Luis Faria, lfaria@keep.pt
KEEP SOLUTIONS, www.keep-solutions.com
SCAPE webinar
July 26, 2014
Tools for uncovering preservation risks in your large repositories
Why do we need monitoring?
Factors that affect the repository:
• Format obsolescence
• Emerging technology
• Consumer trends
• New standards
• Organisation mission
• Bit rot
• Resource capability
• System availability
• Security breach
• Economical limitations
• Social and political factors
• Producer trends
• Organisation policies
These factors bring both risks and opportunities.
This work was partially supported by the SCAPE Project.
The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).
Preservation monitoring survey
181 valid participants
What descriptions fit your organization?
• Other: 5.41%
• Data intensive industry: 0.77%
• Non affiliated: 1.54%
• Big data science: 1.93%
• Digital preservation vendor: 2.32%
• Research funder: 2.70%
• Large enterprise: 2.70%
• Publisher or content producer: 5.02%
• Small or medium enterprise: 7.34%
• Local government institution: 9.27%
• National government institution: 15.83%
• Memory institution or content holder: 26.64%
• University: 28.57%
Preservation monitoring survey
Risk | Importance (normalized mean) | Monitoring | Not monitoring | Uncertain or no answer
File corruption | 92% | 41% | 18% | 41%
Backup failure | 89% | 51% | 9% | 40%
Staff not enough or adequate | 78% | 41% | 18% | 41%
Software platform obsolescence | 77% | 40% | 13% | 46%
Hardware platform obsolescence | 76% | 44% | 12% | 44%
Lack of context information | 76% | 23% | 24% | 53%
Incorrect action results | 75% | 27% | 22% | 51%
Consumers misalignment | 74% | 17% | 25% | 58%
Outdated preservation plans | 69% | 28% | 25% | 47%
Producers misalignment | 68% | 25% | 19% | 55%
Content not aligned with policies | 64% | 30% | 23% | 46%
Tools for uncovering preservation risks
Content → FITS → C3PO → Scout
• FITS: FITS output (XML)
• C3PO: File characteristics distribution (graphs and drill-down analysis)
• Scout: File and world properties throughout time and notifications
FITS - File Information Tool Set
• http://fitstool.org
• Characterization
• Identification
• Feature extraction
• Validation
• Support for:
  • DROID
  • JHove
  • Apache Tika
  • ADL Tool
  • Exiftool
  • FFIdent
  • File Utility (windows port)
  • NLNZ Metadata Extractor
  • OIS Audio, File and XML Information

• https://github.com/keeps/fits/tree/keeps
• Developed by KEEPS
• Added support for:
  • FIDO
  • Microsoft Office
  • Adobe Illustrator
  • Corel Draw
  • Email (EML)
  • Autocad (DWG)
  • Shapefile
  • RTF, TXT
  • Databases (DBML)
FITS - File Information Tool Set
• Demonstration
• Download from http://fitstool.org
• Execute for a file
    ./fits.sh -i file.png
• Execute for a directory
    ./fits.sh -r -i source_directory/ -o output_directory/
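Once FITS has written its XML output, the identified format can be read back with a few lines of Python. This is a minimal sketch, not part of FITS itself. The namespace URI and the identification/identity layout are assumed from typical FITS output and should be checked against your FITS version. The sample document is hand-written for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for FITS output documents; verify against your FITS version.
FITS_NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Hand-written sample shaped like FITS output (illustrative, not real tool output).
SAMPLE_FITS_XML = """
<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <identification>
    <identity format="Portable Network Graphics" mimetype="image/png"/>
  </identification>
</fits>
""".strip()


def read_identities(fits_xml: str) -> list:
    """Return (format, mimetype) pairs found in one FITS output document."""
    root = ET.fromstring(fits_xml)
    identities = root.findall("fits:identification/fits:identity", FITS_NS)
    return [(e.get("format", ""), e.get("mimetype", "")) for e in identities]


if __name__ == "__main__":
    for fmt, mime in read_identities(SAMPLE_FITS_XML):
        print(fmt, mime)
```

To process a real run, read each file in the output directory and pass its text to read_identities.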
FITS performance
• https://github.com/keeps/fits-testing
• 3 to 6 seconds per file
• 12 TB - a year
  • http://www.openplanetsfoundation.org/blogs/2013-01-09-year-fits
• Other options for scalability:
  • Fido
  • Apache Tika
  • Nanite
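To see what 3 to 6 seconds per file means for a large collection, a rough back-of-envelope estimate helps. The sketch below assumes the per-file time is constant and that work splits evenly across parallel workers. Both are simplifications, and the collection size of one million files is a hypothetical example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_days(n_files: int, seconds_per_file: float, workers: int = 1) -> float:
    """Estimated wall-clock days, assuming perfectly even parallel scaling."""
    if n_files < 0 or seconds_per_file < 0 or workers < 1:
        raise ValueError("n_files and seconds_per_file must be >= 0, workers >= 1")
    return n_files * seconds_per_file / workers / SECONDS_PER_DAY


if __name__ == "__main__":
    n_files = 1_000_000  # hypothetical collection size
    for seconds_per_file in (3.0, 6.0):
        days = estimate_days(n_files, seconds_per_file)
        print(f"{seconds_per_file:.0f} s/file: about {days:.1f} days on one worker")
```

Even at the lower bound, a single worker needs weeks for a million files, which is why the faster identification tools listed above matter at scale.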
C3PO - Clever, Crafty Content Profile of Objects
• http://ifs.tuwien.ac.at/imp/c3po
• Web application
• Content characteristics aggregation
• Drill-down analysis
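The idea behind aggregation and drill-down can be illustrated in a few lines. This is a sketch of the concept only, not C3PO code. The records and property names are hypothetical stand-ins for characteristics extracted by a tool such as FITS.

```python
from collections import Counter

# Hypothetical per-file characteristics (illustrative values only).
RECORDS = [
    {"format": "PDF", "mimetype": "application/pdf", "valid": True},
    {"format": "PDF", "mimetype": "application/pdf", "valid": False},
    {"format": "PNG", "mimetype": "image/png", "valid": True},
]


def distribution(records, prop):
    """Aggregate: count how many records carry each value of one property."""
    return Counter(r.get(prop) for r in records)


def drill_down(records, **filters):
    """Drill down: keep only the records matching every given property value."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


if __name__ == "__main__":
    print(distribution(RECORDS, "format"))
    pdfs = drill_down(RECORDS, format="PDF")
    print(distribution(pdfs, "valid"))
```

Aggregating first and then filtering on one value mirrors the workflow of looking at a format distribution graph and clicking into a single bar.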
C3PO install
• Download binaries at:
  • http://dl.bintray.com/peshkira/c3po/
• Install MongoDB:
  • http://www.mongodb.org/
• Install Apache Tomcat:
  • http://tomcat.apache.org/
• Put the C3PO web app in Apache Tomcat
  • Remove the ROOT directory from webapps and rename the C3PO web app to ROOT.war
• Start Apache Tomcat and connect to:
  • http://localhost:8080/
• Usage guide:
  • https://github.com/peshkira/c3po/wiki/Usage-Guide
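The Tomcat step above can be scripted. The sketch below is one way to do it, assuming Tomcat is stopped while it runs. The function name and the example paths are hypothetical and depend on your installation and on the name of the downloaded WAR file.

```python
import shutil
from pathlib import Path


def deploy_as_root(war_file: Path, webapps_dir: Path) -> Path:
    """Replace Tomcat's ROOT web app with the given WAR file.

    Removes webapps/ROOT if present and copies the WAR to webapps/ROOT.war.
    Returns the path of the deployed WAR.
    """
    root_dir = webapps_dir / "ROOT"
    if root_dir.is_dir():
        shutil.rmtree(root_dir)
    target = webapps_dir / "ROOT.war"
    shutil.copyfile(war_file, target)
    return target


# Example call (hypothetical paths, adjust to your installation):
# deploy_as_root(Path("c3po.war"), Path("/opt/tomcat/webapps"))
```

After starting Tomcat again, the application should answer at the root URL of the server.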
C3PO performance
Dataset: Statsbiblioteket (Denmark)
• Size: 440M files (12 TB)
• Process time: 388h (16 days) / 50h for XML report
• Average time: 2.5s per 1000 files
• The web application is limited to 2.5 million FITS files
Scout: a preservation watch system
Monitors aspects of the world to detect preservation risks and opportunities
Scout
• Inputs: Content, Policies, Web, Registries, Human knowledge
• Output: Risk notification
Information Sources
• Format registries & software catalogues
• Digital repositories & web archives
• Organizational objectives
• Experiments
• Simulation
• Human knowledge
Current information sources
• Repository content and events
• SCAPE Policy model
• PRONOM
• Web semantic extraction
• Web page renderability experiments
Define triggers
• Notify me when there are tools that can render the format X.
Define triggers
Simple query with templates
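A trigger built from a template can be thought of as a parameterised condition plus a message. The sketch below illustrates that idea with an in-memory model. It does not show Scout's actual query language or API, and every class and function name in it is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Tool:
    name: str
    renders: frozenset  # names of the formats this tool can render


@dataclass
class Trigger:
    condition: Callable[[List[Tool]], bool]
    message: str


def tools_rendering(fmt: str) -> Callable[[List[Tool]], bool]:
    """Template: true when at least one known tool can render `fmt`."""
    return lambda tools: any(fmt in tool.renders for tool in tools)


def evaluate(triggers: Iterable[Trigger], tools: Iterable[Tool],
             notify: Callable[[str], None]) -> int:
    """Check every trigger against the known tools; return how many fired."""
    known = list(tools)
    fired = 0
    for trigger in triggers:
        if trigger.condition(known):
            notify(trigger.message)
            fired += 1
    return fired


if __name__ == "__main__":
    trigger = Trigger(tools_rendering("X"), "There are tools that can render format X.")
    evaluate([trigger], [Tool("ViewerA", frozenset({"X"}))], print)
```

The notify callback stands in for whichever delivery channel is configured, so the same trigger can feed email or an HTTP push.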
Receive notifications
Email
HTTP Push API
There are tools that can render format X.
Interfaces
Web page
REST API
How to be a part of Scout
• Check out:
  • Site: http://openplanets.github.io/scout/
  • Report: http://www.scape-project.eu/deliverable/d12-2-final-version-of-the-preservation-watch-component
  • Demo: http://scout.scape.keep.pt
• Integrate your content
• Contribute information (soon)
• Use the Scout form for manual input of knowledge
Roadmap
• User support
• More trigger templates
• More adaptors
• KrakeN / Propminer
• Software catalogues
• Other format registries
• Other experiments information sources
• Manual input (human knowledge)
• Simulation

 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 

Dernier (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Tools for Uncovering Preservation Risks in Large Repositories

  • 1. Luis Faria, lfaria@keep.pt, KEEP SOLUTIONS, www.keep-solutions.com. SCAPE webinar, July 26, 2014. Tools for uncovering preservation risks in your large repositories
  • 2. Why do we need monitoring? Diagram of a repository surrounded by the factors that affect it: format obsolescence; emerging technology; consumer trends; new standards; organisation mission; bit rot; resource capability; system availability; security breach; economical limitations; social and political factors; producer trends; organisation policies.
  • 3. Why do we need monitoring? Same diagram as slide 2, with the labels "Risks" and "Opportunities" added.
  • 4. Preservation monitoring survey (181 valid participants). What descriptions fit your organization?
      • University: 28.57%
      • Memory institution or content holder: 26.64%
      • National government institution: 15.83%
      • Local government institution: 9.27%
      • Small or medium enterprise: 7.34%
      • Publisher or content producer: 5.02%
      • Large enterprise: 2.70%
      • Research funder: 2.70%
      • Digital preservation vendor: 2.32%
      • Big data science: 1.93%
      • Non affiliated: 1.54%
      • Data intensive industry: 0.77%
      • Other: 5.41%
    This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).
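  • 5. Preservation monitoring survey: importance of each risk (normalized mean) and whether participants monitor it. Values are paired with labels in chart order.
      • File corruption: importance 92%; monitoring 41%; not monitoring 18%; uncertain or no answer 41%
      • Backup failure: importance 89%; monitoring 51%; not monitoring 9%; uncertain or no answer 40%
      • Staff not enough or adequate: importance 78%; monitoring 41%; not monitoring 18%; uncertain or no answer 41%
      • Software platform obsolescence: importance 77%; monitoring 40%; not monitoring 13%; uncertain or no answer 46%
      • Hardware platform obsolescence: importance 76%; monitoring 44%; not monitoring 12%; uncertain or no answer 44%
      • Lack of context information: importance 76%; monitoring 23%; not monitoring 24%; uncertain or no answer 53%
      • Incorrect action results: importance 75%; monitoring 27%; not monitoring 22%; uncertain or no answer 51%
      • Consumers misalignment: importance 74%; monitoring 17%; not monitoring 25%; uncertain or no answer 58%
      • Outdated preservation plans: importance 69%; monitoring 28%; not monitoring 25%; uncertain or no answer 47%
      • Producers misalignment: importance 68%; monitoring 25%; not monitoring 19%; uncertain or no answer 55%
      • Content not aligned with policies: importance 64%; monitoring 30%; not monitoring 23%; uncertain or no answer 46%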
  • 6. Tools for uncovering preservation risks. Pipeline: Content → FITS → C3PO → Scout.
      • FITS: FITS output (XML)
      • C3PO: file characteristics distribution (graphs and drill-down analysis)
      • Scout: file and world properties throughout time, and notifications
  • 7. FITS - File Information Tool Set
      • http://fitstool.org
      • Characterization: identification, feature extraction, validation
      • Support for: DROID; JHove; Apache Tika; ADL Tool; Exiftool; FFIdent; File Utility (Windows port); NLNZ Metadata Extractor; OIS Audio, File and XML Information
      • https://github.com/keeps/fits/tree/keeps
      • Developed by KEEPS
      • Added support for: FIDO; Microsoft Office; Adobe Illustrator; Corel Draw; Email (EML); Autocad (DWG); Shapefile; RTF, TXT; Databases (DBML)
  • 8. FITS - File Information Tool Set: demonstration
      • Download from http://fitstool.org
      • Execute for a file: ./fits.sh -i file.png
      • Execute for a directory: ./fits.sh -r -i source_directory/ -o output_directory/
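The two invocations on this slide can be scripted. The following is a minimal sketch, not part of the presentation: it uses only the `-i`, `-r` and `-o` flags shown above, the function names `build_fits_command` and `run_fits` are hypothetical, and it assumes `fits.sh` is in the working directory and prints its XML to standard output when no `-o` is given.

```python
import subprocess
from typing import List, Optional


def build_fits_command(fits_script: str, source: str,
                       output: Optional[str] = None,
                       recursive: bool = False) -> List[str]:
    """Build the argument list for one FITS run (flags as shown on the slide)."""
    command = [fits_script]
    if recursive:
        command.append("-r")           # process a directory recursively
    command += ["-i", source]          # input file or directory
    if output is not None:
        command += ["-o", output]      # output file or directory
    return command


def run_fits(command: List[str]) -> str:
    """Run FITS and return its standard output; raises if FITS exits with an error."""
    completed = subprocess.run(command, check=True,
                               capture_output=True, text=True)
    return completed.stdout
```

For example, `build_fits_command("./fits.sh", "source_directory/", "output_directory/", recursive=True)` reproduces the directory command above as an argument list that can be passed to `run_fits`.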
  • 9. FITS performance
      • https://github.com/keeps/fits-testing
      • 3 to 6 seconds per file
      • 12 TB: a year
      • http://www.openplanetsfoundation.org/blogs/2013-01-09-year-fits
      • Other options for scalability: Fido; Apache Tika; Nanite
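To see what 3 to 6 seconds per file means for a large collection, a rough estimate helps. This sketch is an illustration only: the function names are hypothetical, and the `workers` parameter assumes perfect parallel scaling, which real deployments will not reach.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_days(file_count: int, seconds_per_file: float, workers: int = 1) -> float:
    """Estimated wall-clock days to characterize file_count files."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return file_count * seconds_per_file / workers / SECONDS_PER_DAY


def estimate_range(file_count: int, workers: int = 1) -> tuple:
    """Lower and upper estimate using the 3 s and 6 s per-file figures from the slide."""
    return (estimate_days(file_count, 3.0, workers),
            estimate_days(file_count, 6.0, workers))


low, high = estimate_range(1_000_000)
print(f"1,000,000 files on one worker: {low:.1f} to {high:.1f} days")
```

Under these assumptions, one million files take roughly 35 to 69 days on a single worker, which is why the slide points to faster alternatives for large repositories.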
  • 10. C3PO - Clever, Crafty Content Profile of Objects
      • http://ifs.tuwien.ac.at/imp/c3po
      • Web application
      • Content characteristics aggregation
      • Drill-down analysis
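The idea of aggregating content characteristics can be illustrated with a few lines of code. This is a sketch of the concept, not C3PO's implementation: it assumes each FITS output document carries an `identity` element with a `format` attribute in the FITS output namespace, and the sample documents are simplified stand-ins built by a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterable

# Namespace assumed for FITS output documents.
FITS_NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}


def sample_fits_xml(format_name: str, mimetype: str) -> str:
    """Build a simplified, illustrative FITS-like document (not real tool output)."""
    return (
        '<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">'
        "<identification>"
        f'<identity format="{format_name}" mimetype="{mimetype}"/>'
        "</identification>"
        "</fits>"
    )


def format_distribution(documents: Iterable[str]) -> Counter:
    """Count how many documents report each format (first identity element only)."""
    counts: Counter = Counter()
    for xml_text in documents:
        root = ET.fromstring(xml_text)
        identity = root.find(".//fits:identity", FITS_NS)
        name = identity.get("format", "unknown") if identity is not None else "unknown"
        counts[name] += 1
    return counts


docs = [
    sample_fits_xml("Portable Network Graphics", "image/png"),
    sample_fits_xml("Portable Network Graphics", "image/png"),
    sample_fits_xml("Portable Document Format", "application/pdf"),
]
for name, count in format_distribution(docs).most_common():
    print(name, count)
```

A distribution like this is the starting point for drill-down analysis: once the counts exist, any single format can be selected and its files examined further.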
  • 11. C3PO install
      • Download binaries at: http://dl.bintray.com/peshkira/c3po/
      • Install mongodb: http://www.mongodb.org/
      • Install Apache Tomcat: http://tomcat.apache.org/
      • Put C3PO web app in Apache Tomcat: remove the ROOT dir for webapps and rename the C3PO web app to ROOT.war
      • Start Apache Tomcat and connect to: http://localhost:8080/
      • Usage guide: https://github.com/peshkira/c3po/wiki/Usage-Guide
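The "remove ROOT and rename to ROOT.war" step can be sketched as a small script. This is an illustration under assumptions: the function name `deploy_as_root` is hypothetical, the paths depend on where Tomcat and the downloaded C3PO WAR file live on your machine, and Tomcat should be stopped before running it.

```python
import shutil
from pathlib import Path


def deploy_as_root(war_file: Path, webapps_dir: Path) -> Path:
    """Replace Tomcat's default ROOT application with the given WAR file."""
    root_dir = webapps_dir / "ROOT"
    if root_dir.is_dir():
        shutil.rmtree(root_dir)          # remove the default ROOT web app
    target = webapps_dir / "ROOT.war"
    shutil.copyfile(war_file, target)    # deploy the WAR under the name ROOT.war
    return target
```

After the copy, starting Tomcat unpacks ROOT.war and serves the application at the server root, which is why the slide then connects to http://localhost:8080/ directly.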
  • 12. C3PO performance. Dataset: Statsbiblioteket (Denmark)
      • Size: 440M files (12 TB)
      • Process time: 388 h (16 days) / 50 h for XML report
      • Average time: 2.5 s per 1000 files
      • Web application has a limit of 2.5 million FITS files
  • 13. Scout: a preservation watch system. Monitors aspects of the world to detect preservation risks and opportunities. Diagram: Content, Policies, Web, Registries and Human knowledge around Scout, which produces risk notifications.
  • 14. Information sources
      • Format registries & software catalogues
      • Digital repositories & web archives
      • Organizational objectives
      • Experiments
      • Simulation
      • Human knowledge
  • 15. Current information sources
      • Repository content and events
      • SCAPE Policy model
      • PRONOM
      • Web semantic extraction
      • Web page renderability experiments
  • 16. Define triggers
      • Notify me when there are tools that can render the format X.
  • 18. Receive notifications
      • Channels: Email; HTTP Push API
      • Example notification: There are tools that can render format X.
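The trigger and notification slides describe a simple pattern: a question is checked against what is currently known, and a message is sent when the answer becomes yes. The sketch below illustrates that pattern only. It is not Scout's API: the `Trigger` class, the `renderers_exist` helper, and the dictionary used as a knowledge base are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical knowledge base: format name -> names of tools known to render it.
KnowledgeBase = Dict[str, Set[str]]


@dataclass
class Trigger:
    description: str                              # message sent when the trigger fires
    condition: Callable[[KnowledgeBase], bool]    # question asked of the knowledge base
    notify: Callable[[str], None]                 # delivery channel (email, HTTP push, ...)

    def evaluate(self, knowledge: KnowledgeBase) -> bool:
        """Check the condition and notify if it holds; return whether it fired."""
        fired = self.condition(knowledge)
        if fired:
            self.notify(self.description)
        return fired


def renderers_exist(format_name: str) -> Callable[[KnowledgeBase], bool]:
    """Condition: at least one tool is known to render the given format."""
    return lambda knowledge: len(knowledge.get(format_name, set())) > 0


sent: List[str] = []  # stands in for an email or HTTP push channel
trigger = Trigger("There are tools that can render format X.",
                  renderers_exist("X"), sent.append)

trigger.evaluate({"X": set()})                    # nothing known yet: no notification
trigger.evaluate({"X": {"hypothetical-viewer"}})  # a tool appears: notification sent
print(sent)
```

Keeping the condition and the delivery channel as separate callables mirrors the split between the two slides: the question is defined once, and the same trigger can notify through different channels.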
  • 20. How to be a part of Scout
      • Check out:
          • Site: http://openplanets.github.io/scout/
          • Report: http://www.scape-project.eu/deliverable/d12-2-final-version-of-the-preservation-watch-component
          • Demo: http://scout.scape.keep.pt
      • Integrate your content
      • Contribute with information (soon)
      • Use Scout form for manual input of knowledge
  • 21. Roadmap: user support; more trigger templates; more adaptors; KrakeN / Propminer; software catalogues; other format registries; other experiments information sources; manual input (human knowledge); simulation.
  • 22. Luis Faria, lfaria@keep.pt, KEEP SOLUTIONS, www.keep-solutions.com. SCAPE webinar, July 26, 2014. Tools for uncovering preservation risks in large repositories