Digital Preservation
  The Saga Continues
SCAPE




http://www.dpconline.org/

‘Digital Preservation: what I
wish I knew before I started’
 • It won’t do itself

 • It won’t go away

 • Don’t wait for perfection

Digital preservation makes bleak reading …
Let’s restate the problem …
•Digital stuff has value. It is an asset.
•It has potential and creates new opportunities.
•Use gives rise to direct and indirect outcomes.
...but...

•Deployment depends on software, hardware and people.
•Software, hardware and people change.
...therefore...

•Access is not guaranteed without (some) action
•Value, opportunity, impact not guaranteed
Key responses
1. Migration
Changing the format of a file to ensure
the information content can be read
2. Emulation
Intervening in the operating system to
ensure that old software can function
and information content can be read
3. Hardware preservation
Maintaining access to data and processes
by maintaining the physical computing
environment including hardware and
peripherals.
4. etc
An active research and development field:
new solutions and approaches continue to
emerge, e.g. virtualisation for preservation

Access and long term use
depends on the
configuration of hardware
and software and the
capacity of the operator.

Change is not a bug.


Technology continues to
change creating the
conditions for obsolescence.

Need to become a learning
institution


Storage media have a short life
and storage devices are subject
to obsolescence.

Be mobile and format neutral

Digital preservation systems
are subject to the same
obsolescence as the objects
they safeguard.

Standards and modularity

Digital resources are intolerant
of gaps in preservation.

Ongoing process

The problems are more
subtle than we realised a
decade ago…

e.g. file format
obsolescence

Changing file formats?
Conformant containers?
Units of information?

How to pick a winner ...

beyond and potentially over-writing the criteria ...
repository managers should align the recognition and
weighting of criteria with a clear preservation strategy
that articulates the purpose of the repository and the
needs of its designated community;



Todd, M 2009 ‘File formats for preservation’, DPC Technology Watch Report
02/09, online at http://www.dpconline.org/advice/technology-watch-reports.html
How to pick a winner
...
                       You ain’t seen nothing yet

                       Data growth on 3 axes
                       •volume
                       •complexity
                       •expectation


 ... it’s not going to be about obsolescence so
 much as workflow and capacity


Digital Preservation as a ‘discipline’

                                  Daunting challenge
                                  Decade of research and development
                                  Replete with jargon and acronyms
                                  Turf war between professions?
                                  A whole new barrier
      Courtesy NASA/JPL-Caltech




The last decade has shown definitively that using
fancy words is not the same as solving problems

How much does it cost?
                    Lifecycle costs of digital objects
                    vs
                    Lifecycle costs of books
                    vs
                    Lifecycle costs of museum objects
                    vs
                    Lifecycle costs of archives
                    vs
                    Lifecycle costs of the historic environment
Getting started…

                  www.dpconline.org
The reality?



You don’t need to understand
or do all of this.


... and it doesn’t all have to exist at the same time


The reality?



Get started now
not later
        Preservation Lifecycle

Identification            DROID, FIDO, FILE, FITS, TIKA…



       Characterisation            JHOVE, JPYLYZER, exiftool, FITS…



                 Risk Assessment       Knowledge + Policy + Risk = Continue



                            Planning            Plato



                                     Action             Migration, Emulation
             This Training
Identification



           DROID            FITS
           FIDO             TIKA
           FILE

                 Characterisation



                 JHOVE,             Exiftool
                 JPYLYZER           FITS
So you have dug a hole?
Stage 2

• What did you find?

• Is it worth preserving?

• What are the problems?
Aim of Training
            Time to get married
1.   Luis Bravo               1.   Jose Carvalho (SDUM)
2.   Jose Casanova            2.   Carlos Duarte
3.   Vitor Fernandes          3.   Luis Ferreira
4.   Sebastien Leroux         4.   Cristiana Freitas
5.   Joao Pereira             5.   Claire Johnson
6.   Rui Rodrigues            6.   Anthony Laerdahl
7.   Carlos Velentim          7.   Helena Medeiros
8.   Jose Carvalho (Papiro)   8.   Antonio Rodrigues
9.   Omar Coelho              9.   Cidalia Ferreira
        Column 1                    Column 2
              Getting Started (1)
•   Wifi Network = SMS, password = Sarmento1881127
•   Download VirtualBox (if you don’t have it)
•   Start VirtualBox
•   Plug in the USB memory key
•   Open the memory key folder and double-click the extension
    pack file to install it (follow the instructions at this point)
•   Return to VirtualBox
•   From the main menu (File), select “Import Appliance”
•   Browse to the memory key and select the only file
•   Wait for this to import
•   Once done you can safely remove the key.
          Getting Started (2)
• Once done, click the machine and press the
  Settings button (it may be in the right-click menu)
• Click shared folders
• Click add
• Add a shared folder (e.g. your desktop or
  downloads folder)
• Tick auto-mount!
• Click OK to return to the main screen
• Start the machine
• Wait..
         Getting Started (3)
• Password is training.
• Ignore update manager if it appears
• Press the top-left Ubuntu home button and
  type terminal (select and run the app)
• Type: cd /media/sf_Desktop (where Desktop is
  the folder you shared previously) and press
  enter
• Type: fido *
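To give a feel for what an identification tool does when it is pointed at a folder, here is a minimal sketch of signature-based identification. It is not how FIDO itself is implemented: FIDO and DROID draw on the much larger PRONOM signature registry, whereas the `SIGNATURES` table and the function names below are assumptions made purely for illustration.

```python
# Hypothetical signature table holding a handful of well-known magic numbers.
# Real identification tools use far richer signatures (offsets, patterns, priorities).
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_bytes(header: bytes) -> str:
    """Return a format name for the leading bytes of a file, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path) -> str:
    """Read the first few bytes of a file and match them against SIGNATURES."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(16))
```

Calling `identify_file` on each file in the shared folder would give a rough first pass. Note that a ZIP signature alone cannot distinguish a plain archive from formats packaged in a ZIP container, which is one reason real tools look deeper than the first bytes.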
                Bundle or Not?
• Pros
  – Single Input/Output
  – Consistent
  – Easy
• Cons
  – Out of date
  – Doesn’t Scale
             Questions (1)
• What tool would you use?
           Training


  Keeping Control - Scalable
Environments for Identification
     and Characterisation
                     Aims
This training course will cover elements dealing
  with scalable identification, characterisation
and validation of large collections of varying file
 types. Users will be introduced to a number of
 tools designed for each of these purposes and
 involved in problem solving scenarios. Further,
   users will be required to evaluate the use of
    scalable and cloud based technologies in
    developing solutions for given scenarios.
        Learning Outcomes (1)
• Distinguish between different file types and
  identify the requirements for characterising
  each.

• Carry out a number of identification,
  characterisation, and duplication detection
  experiments on example files.
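One common way to approach a duplication detection experiment is to compare checksums: files with identical content produce identical digests. The sketch below assumes exact, byte-for-byte duplicates are what is being looked for; the function names and the choice of SHA-256 are illustrative assumptions, not something prescribed by the training.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root by digest and return groups with more than one file."""
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[sha256_of(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

This only finds exact copies. Two renderings of the same image in different formats would not match, which is where characterisation data becomes useful.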
        Learning Outcomes (2)
• Critically evaluate characterisation and
  identification tools and assess their
  advantages and disadvantages when used in
  different scenarios.
Learning Outcomes (3)


• Conduct an in-depth analysis of large volumes
  of identification and characterisation data and
  find representative sample records suitable for
  preservation planning experiments.
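As a rough illustration of finding representative samples, the sketch below groups identification records by format and picks the record of median size from each group. This is only one possible reading of "representative"; the record fields (`path`, `format`, `size`) and the function name are hypothetical and do not correspond to the output of any particular tool.

```python
from collections import defaultdict


def representative_samples(records):
    """Pick one record per format: the one with the median file size.

    `records` is assumed to be a list of dicts with hypothetical keys
    'path', 'format' and 'size' (bytes).
    """
    by_format = defaultdict(list)
    for record in records:
        by_format[record["format"]].append(record)

    samples = {}
    for fmt, group in by_format.items():
        ordered = sorted(group, key=lambda r: r["size"])
        samples[fmt] = ordered[len(ordered) // 2]  # middle element by size
    return samples
```

In practice the grouping key could combine several characterisation properties (format version, creating application, validity) rather than format alone.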
         Learning Outcomes (4)
• Compare and contrast the differences in
  running characterisation and identification
  tools both stand-alone and within workflows.

• Envisage a system that combines workflows
  with identification, characterisation and
  validation tools to suit a variety of scenarios.
     Our Last Commitment


Slides will be available Monday!
Thank You


     Franz San Galli
           Next Time…


Building Applications Infrastructures
         for Action Services

     London, September 2013
              (wet)
               Then….


Critical Path: Effective Evidence Based
         Preservation Planning

     Denmark, November 2013
              (cold)
                    Tonight
   www.goo.gl/q6wKB
7:15pm Eleven Bar
@ Hotel Fundador

   Free Beer*


       Our Table
                * 1 Free Beer subject to completion of online survey!

Digital Preservation - The Saga Continues - SCAPE Training event, Guimarães 2012

 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 

Recently uploaded (20)

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 

Digital Preservation - The Saga Continues - SCAPE Training event, Guimarães 2012

  • 1. Digital Preservation The Saga Continues
  • 3. SCAPE ‘Digital Preservation: what I wish I knew before I started’ • It won’t do itself • It won’t go away • Don’t wait for perfection
  • 5. SCAPE Let’s restate the problem … •Digital stuff has value. It is an asset. •It has potential and creates new opportunities. •Use gives rise to direct and indirect outcomes. ...but... •Deployment depends on software, hardware and people. •Software, hardware and people change. ...therefore... •Access is not guaranteed without (some) action •Value, opportunity, impact not guaranteed
  • 6. SCAPE Key responses 1. Migration Changing the format of a file to ensure the information content can be read 2. Emulation Intervening in the operating system to ensure that old software can function and information content can be read 3. Hardware preservation Maintaining access to data and processes by maintaining the physical computing environment including hardware and peripherals. 4. etc Research and development field, new solutions and new approaches continue to emerge, eg virtualisation for preservation
  • 7. SCAPE Access and long term use depends on the configuration of hardware and software and the capacity of the operator. Change is not a bug.
  • 8. SCAPE Technology continues to change creating the conditions for obsolescence. Need to become a learning institution
  • 9. SCAPE Storage media have a short life and storage devices are subject to obsolescence. Be mobile and format neutral
  • 10. SCAPE Digital preservation systems are subject to the same obsolescence as the objects they safeguard. Standards and modularity
  • 11. SCAPE Digital resources are intolerant of gaps in preservation. Ongoing process
  • 12. SCAPE The problems are more subtle than we realised a decade ago… e.g. file format obsolescence Changing file formats? Conformant containers? Units of information?
  • 13. SCAPE How to pick a winner ... beyond and potentially over-writing the criteria ... repository managers should align the recognition and weighting of criteria with a clear preservation strategy that articulates the purpose of the repository and the needs of its designated community; Todd, M 2009 ‘File formats for preservation’, DPC Technology Watch Report 02/09, online at http://www.dpconline.org/advice/technology-watch-reports.html
  • 14. SCAPE How to pick a winner ... You ain’t seen nothing yet Data growth on 3 axes •volume •complexity •expectation ... it’s not going to be about obsolescence so much as workflow and capacity
  • 15. SCAPE Digital Preservation as a ‘discipline’ Daunting challenge Decade of research and development Replete with jargon and acronyms Turf war between professions? A whole new barrier Courtesy NASA/JPL-Caltech The last decade has shown definitively that using fancy words is not the same as solving problems
  • 16. SCAPE How much does it cost? Lifecycle costs of digital objects vs Lifecycle costs of books vs Lifecycle costs of museum objects vs Lifecycle costs of archives vs Lifecycle costs of historic environment
  • 17. SCAPE Getting started… www.dpconline.org
  • 18. SCAPE The reality? You don’t need to understand or do all of this. ... and it doesn’t all have to exist at the same time
  • 20. SCAPE Preservation Lifecycle Identification DROID, FIDO, FILE, FITS, TIKA… Characterisation JHOVE, JPYLYZER, exiftool, FITS… Risk Assessment Knowledge + Policy + Risk = Continue Planning Plato Action Migration, Emulation
  • 21. SCAPE This Training Identification DROID FITS FIDO TIKA FILE Characterisation JHOVE, Exiftool JPYLYZER FITS
  • 22. So you have dug a hole?
  • 23. Stage 2 • What did you find? • Is it worth preserving? • What are the problems?
  • 25. SCAPE Time to get married Column 1: 1. Luis Bravo 2. Jose Casanova 3. Vitor Fernandes 4. Sebastien Leroux 5. Joao Pereira 6. Rui Rodrigues 7. Carlos Velentim 8. Jose Carvalho (Papiro) 9. Omar Coelho Column 2: 1. Jose Carvalho (SDUM) 2. Carlos Duarte 3. Luis Ferreira 4. Cristiana Freitas 5. Claire Johnson 6. Anthony Laerdahl 7. Helena Medeiros 8. Antonio Rodrigues 9. Cidalia Ferreira
  • 26. SCAPE Getting Started (1) • Wifi Network = SMS, password = Sarmento1881127 • Download VirtualBox (if you don’t have it) • Start VirtualBox • Plug in the USB memory key • Open the memory key folder and double click the extension pack file to install it (follow instructions at this point) • Return to VirtualBox: • From the main menu (File), select “Import Appliance” • Browse to the memory key and select the only file • Wait for this to import • Once done you can safely remove the key.
  • 27. SCAPE Getting Started (2) • Once done, click the machine and press the settings button (it may be in the right-click menu) • Click shared folders • Click add • Add a shared folder (e.g. your desktop or downloads folder) • Tick auto-mount! • Click OK to return to the main screen • Start the machine • Wait..
  • 28. SCAPE Getting Started (3) • Password is training. • Ignore update manager if it appears • Press the top left Ubuntu home button and type terminal (select and run the app) • Type: cd /media/sf_Desktop (where Desktop is the folder you shared previously) and press enter • Type: fido *
  • 29. SCAPE Bundle or Not? • Pros – Single Input/Output – Consistent – Easy • Cons – Out of date – Doesn’t Scale
  • 30. SCAPE Questions (1) • What tool would you use?
  • 31. SCAPE Training Keeping Control - Scalable Environments for Identification and Characterisation
  • 32. SCAPE Aims This training course will cover elements dealing with scalable identification, characterisation and validation of large collections of varying file types. Users will be introduced to a number of tools designed for each of these purposes and involved in problem solving scenarios. Further, users will be required to evaluate the use of scalable and cloud based technologies in developing solutions for given scenarios.
  • 33. SCAPE Learning Outcomes (1) • Distinguish between different file types and identify the requirements for characterising each. • Carry out a number of identification, characterisation, and duplication detection experiments on example files.
  • 34. SCAPE Learning Outcomes (2) • Critically evaluate characterisation and identification tools and assess their advantages and disadvantages when used in different scenarios.
  • 35. Learning Outcomes (3) • Conduct an in-depth analysis of large volumes of identification and characterisation data and find representative sample records suitable for preservation planning experiments.
  • 36. SCAPE Learning Outcomes (4) • Compare and contrast the differences in running characterisation and identification tools both stand-alone and within workflows. • Envisage a system that combines workflows with identification, characterisation and validation tools to suit a variety of scenarios.
  • 37. SCAPE Our Last Commitment Slides will be available Monday!
  • 38. SCAPE Thank You Franz San Galli
  • 41. SCAPE Next Time… Building Applications Infrastructures for Action Services London, September 2013 (wet)
  • 42. SCAPE Then…. Critical Path: Effective Evidence Based Preservation Planning Denmark, November 2013 (cold)
  • 43. SCAPE Tonight www.goo.gl/q6wKB 7:15pm Eleven Bar @ Hotel Fundador Free Beer* Our Table * 1 Free Beer subject to completion of online survey!
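
The identification and duplication detection exercises named in the slides above can be approximated with a few lines of Python. This is only a minimal sketch, not how DROID, FIDO or FITS work internally: the `SIGNATURES` table, `identify`, `identify_file` and `find_duplicates` are hypothetical names introduced for illustration, and the table covers a handful of well-known leading byte sequences, whereas the real tools draw on the far larger PRONOM registry.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Hypothetical, deliberately tiny signature table (assumption for this sketch).
# Each entry maps a leading byte sequence ("magic number") to a format label.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(data: bytes) -> str:
    """Return a format label based on the first bytes, or 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_file(path) -> str:
    """Identify a file by reading only its first 16 bytes."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))


def find_duplicates(paths):
    """Group files with identical content, compared by SHA-256 digest."""
    groups = defaultdict(list)
    for path in paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        groups[digest].append(str(path))
    # Keep only digests shared by more than one file.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Calling `identify_file` on each file in the shared folder gives a rough feel for what `fido *` reports, and `find_duplicates` shows why checksums rather than file names are used to spot duplicates. Note that a ZIP signature only identifies the container; formats packaged inside ZIP need a closer look at the contents.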