Automatic Preservation Watch
Using Information Extraction on the Web
Luis Faria, lfaria@keep.pt
KEEP SOLUTIONS, www.keep-solutions.com
Alan Akbik, Barbara Sierman, Marcel Ras, Miguel Ferreira, José Carlos Ramalho
iPRES 2013
Lisbon, September 2, 2013
Why do we need monitoring?
Repository
• Format obsolescence
• Emerging technology
• Consumer trends
• New standards
• Organisation mission
• Bit rot
• Resource capability
• System availability
• Security breach
• Economic limitations
• Social and political factors
• Producer trends
• Organisation policies
Risks
Opportunities
Survey on: Risk Assessment
[Pie chart: 60% and 40%; legend: “Yes but manual and ad hoc”, “None”]
Scout: a preservation watch system
Monitors aspects of the world to detect preservation risks and opportunities
This work was partially supported by the SCAPE Project.
The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).
Information Sources
• Format registries & software catalogues
• Digital repositories & web archives
• Organizational objectives
• Experiments
• Simulation
• Human knowledge
Currently supported information sources
• PRONOM
• Repository content and events
• Web archive content
• Web archive renderability experiments
• SCAPE Policy model
Define triggers
• Notify me when there are tools that can render the format X.
Define triggers
Simple query with templates
Receive notifications
• Email
• HTTP Push API
There are tools that can render format X.
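A trigger of this kind can be sketched as a templated question checked against collected facts. This is only an illustration and not Scout's actual query language or API: the `FACTS` triples, the `renders` property, and the function `check_render_trigger` are all hypothetical names.

```python
from string import Template

# Hypothetical facts as (subject, property, value) triples; illustrative only.
FACTS = [
    ("ToolA", "renders", "X"),
    ("ToolB", "renders", "Y"),
]

# Notification text built from a template, as in "format X" on the slide.
MESSAGE = Template("There are tools that can render format $fmt.")


def check_render_trigger(facts, fmt, notify):
    """Call notify(message) if any fact says a tool renders fmt; return the tools."""
    tools = [s for (s, p, v) in facts if p == "renders" and v == fmt]
    if tools:
        notify(MESSAGE.substitute(fmt=fmt))
    return tools


# Here the "notification channel" is just a list; email or HTTP push would
# be other callables passed as notify.
sent = []
tools = check_render_trigger(FACTS, "X", sent.append)
```

Passing the notifier as a callable keeps the condition separate from the delivery channel.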
Automatic Watch Limitations
Machine readable data
• Explicitly and formally specified information
• Controlled vocabulary
• Ontology
• All instances use same structure and set of values
Case study: e-Depot coverage
[Chart: Publishers and Titles per publisher plotted against % of journal titles (40% to 100%); callout: 97% publishers, 1-10 titles]
e-journal coverage questions
• Which publisher provides which journal titles
• Publisher changes:
  • Ceases to provide journal
  • Transfers journal to other publisher(s)
  • Publishers merge
• Journal changes:
  • Name changes
  • ISSN changes
  • Ceased to exist
Where is this information?
“In 1991, two years before the merger with Reed, Elsevier
acquired Pergamon Press in the UK.”
“The Asia-Europe Foundation (ASEF) sold the Asia Europe
Journal and transferred the copyright to its long-time partner
Springer.”
“Acta Chirurgica Iugoslavica is available free of charge as an
Open Access journal on the Internet.”
On the publisher's website!
Not machine readable!
Information Extraction
• Extract structured information from unstructured data
• Pattern-based information extraction
• Some training and supervision may be needed
“[X] acquired [Y]”
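A pattern such as “[X] acquired [Y]” can be approximated with a regular expression. The sketch below is a simplification and not the extraction system used in the experiment: it assumes that X and Y are runs of capitalised words, and the names `NAME`, `ACQUIRED` and `extract_acquisitions` are made up for illustration.

```python
import re

# Assumption: an entity name is one or more capitalised words.
NAME = r"[A-Z][\w-]*(?:\s+[A-Z][\w-]*)*"
ACQUIRED = re.compile(rf"(?P<x>{NAME})\s+acquired\s+(?P<y>{NAME})")


def extract_acquisitions(sentence):
    """Return (X, Y) pairs for every '[X] acquired [Y]' match in the sentence."""
    return [(m.group("x"), m.group("y")) for m in ACQUIRED.finditer(sentence)]


# Sentence quoted earlier in the slides.
sentence = ("In 1991, two years before the merger with Reed, Elsevier "
            "acquired Pergamon Press in the UK.")
pairs = extract_acquisitions(sentence)
print(pairs)
```

The capitalised-word heuristic is deliberately naive: names containing lowercase words (for example “of” or “and”) would be cut short.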
Experiment
1. Data acquisition and pre-processing
2. Relation discovery
3. Information extraction
4. Validation of results
1. Data acquisition and pre-processing
• Focused crawler with seed words (12,000 entries)
• Publisher names
• Journal titles
➡ 500,000 Web pages
• Pre-process with NLP tools
➡ 18 million sentences
➡ 8 GB
2. Relation discovery
Prominent pattern                       Rank
[X] journal of [Y]                      1
[X] published by [Y]                    2
[X] journal on [Y]                      3
[X] journal published by [Y]            4
[X] available as [Y] journal            5
PubMed [X] [Y]                          9
[X] science proceedings of [Y]          25
[X] subscription available to [Y]       30
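One way such a ranking could be produced is to count the text that links already-known entity pairs and sort by frequency. The slides do not say how the ranks were computed, so frequency counting is an assumption here; `discover_patterns` and the toy sentences and seed pairs are hypothetical.

```python
from collections import Counter


def discover_patterns(sentences, seed_pairs):
    """Count the text linking a known X to a known Y; rank patterns by frequency."""
    counts = Counter()
    for sentence in sentences:
        for x, y in seed_pairs:
            i = sentence.find(x)
            # Only look for Y after the end of X.
            j = sentence.find(y, i + len(x)) if i != -1 else -1
            if i == -1 or j == -1:
                continue
            # Collapse whitespace in the linking text between X and Y.
            link = " ".join(sentence[i + len(x):j].split())
            if link:
                counts[f"[X] {link} [Y]"] += 1
    return counts.most_common()


# Toy data with placeholder names, not taken from the experiment.
sentences = [
    "Journal A published by Publisher P",
    "Journal B published by Publisher Q",
    "Journal C journal of Publisher R",
]
seed_pairs = [
    ("Journal A", "Publisher P"),
    ("Journal B", "Publisher Q"),
    ("Journal C", "Publisher R"),
]
ranked = discover_patterns(sentences, seed_pairs)
```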
3. Information extraction
2,000 journal titles
500 journal-publisher attributions
4. Validation of results
[Two pie charts; legend: Should add, Existing, False-positives]
Journal titles in eDepot: 4%, 10%, 86%
Title-publisher in the Keepers registry: 15%, 50%, 35%
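Part of this validation could be automated by comparing extracted titles with a known list: matches are “Existing”, and the remainder still needs human review to separate “Should add” from false positives. The sketch below assumes a simple case- and whitespace-insensitive match; `normalise`, `split_by_registry` and the placeholder titles are hypothetical and do not reflect how the experiment was actually validated.

```python
def normalise(title):
    """Case-fold and collapse whitespace so trivially different spellings match."""
    return " ".join(title.casefold().split())


def split_by_registry(extracted, registry):
    """Split extracted titles into those already known and those needing review."""
    known = {normalise(t) for t in registry}
    existing = [t for t in extracted if normalise(t) in known]
    to_review = [t for t in extracted if normalise(t) not in known]
    return existing, to_review


# Placeholder titles; membership in the registry is made up.
registry = ["Journal A"]
extracted = ["journal  A", "Journal B"]
existing, to_review = split_by_registry(extracted, registry)
```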
False-positives
• Detecting boundaries of titles and publisher names
• Using abbreviations on titles and publisher names
• Technical problems like encoding
“European Journal of Nuclear Medicine and Molecular Imaging”
IAAE - “International Association of Agricultural Economists”
“├ó╦å┼buda University”
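Encoding problems of this kind typically arise when bytes written in one encoding are decoded with another. The sketch below reproduces the effect with an assumed pair of encodings (UTF-8 read as cp1252); the exact chain behind the garbled example above is not known, and it may involve other code pages.

```python
original = "Óbuda University"

# Bytes written as UTF-8 but wrongly decoded as cp1252 produce mojibake.
garbled = original.encode("utf-8").decode("cp1252")

# If the wrong decoder is known, reversing the two steps restores the text.
repaired = garbled.encode("cp1252").decode("utf-8")

print(garbled)
print(repaired)
```

A garbled leading character also shifts where a name appears to start, which may be why such strings end up as spurious publisher names.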
Conclusions
• We need data to support digital preservation
• Explicitly and formally specified for automation
• Registries tend to be incomplete and outdated
• Information Extraction Technologies can help
• Still, some supervision may be needed
Send us your use cases!
Preservation Watch: What risks to monitor?
Luis Faria, lfaria@keep.pt
Information Extraction: What to extract from the web?
Alan Akbik, alan.akbik@tu-berlin.de
Thank you, questions?
• Scout - a preservation watch system
• Site: http://openplanets.github.io/scout/
• Demo: http://scout.scape.keep.pt
• SCAPE Planning and Watch suite iPRES poster
• http://bit.ly/scape-pw
• SCAPE
• http://www.scape-project.eu
Integrating the Fedora based DOMS repository with Hadoop, SCAPE Information D...
 
Control policy formulation
Control policy formulationControl policy formulation
Control policy formulation
 
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
 
TAVERNA Components - Semantically annotated and sharable units of functionality
TAVERNA Components - Semantically annotated and sharable units of functionalityTAVERNA Components - Semantically annotated and sharable units of functionality
TAVERNA Components - Semantically annotated and sharable units of functionality
 
PDF/A-3 for preservation. Notes on embedded files and JPEG2000
PDF/A-3 for preservation. Notes on embedded files and JPEG2000PDF/A-3 for preservation. Notes on embedded files and JPEG2000
PDF/A-3 for preservation. Notes on embedded files and JPEG2000
 
Scalable Preservation Workflows
Scalable Preservation WorkflowsScalable Preservation Workflows
Scalable Preservation Workflows
 
Quality assurance for document image collections in digital preservation
Quality assurance for document image collections in digital preservation Quality assurance for document image collections in digital preservation
Quality assurance for document image collections in digital preservation
 
Digital Preservation Policies - SCAPE
Digital Preservation Policies - SCAPEDigital Preservation Policies - SCAPE
Digital Preservation Policies - SCAPE
 
Matchbox tool. Quality control for digital collections – SCAPE Training event...
Matchbox tool. Quality control for digital collections – SCAPE Training event...Matchbox tool. Quality control for digital collections – SCAPE Training event...
Matchbox tool. Quality control for digital collections – SCAPE Training event...
 
Characterisation - 101. An introduction to the identification and characteris...
Characterisation - 101. An introduction to the identification and characteris...Characterisation - 101. An introduction to the identification and characteris...
Characterisation - 101. An introduction to the identification and characteris...
 

Recently uploaded

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Recently uploaded (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Automatic Preservation Watch

  • 1. Automatic Preservation Watch: Using Information Extraction on the Web. Luis Faria (lfaria@keep.pt), KEEP SOLUTIONS, www.keep-solutions.com. Alan Akbik, Barbara Sierman, Marcel Ras, Miguel Ferreira, José Carlos Ramalho. iPRES 2013, Lisbon, September 2, 2013.
  • 2. Why do we need monitoring? Factors surrounding the repository: format obsolescence, emerging technology, consumer trends, new standards, organisation mission, bit rot, resource capability, system availability, security breach, economic limitations, social and political factors, producer trends, organisation policies.
  • 3. Why do we need monitoring? The same factors surround the repository, now labelled as risks and opportunities.
  • 4. Survey on risk assessment (chart): the responses are "Yes, but manual and ad hoc" and "None", with shares of 60% and 40%.
  • 5. Scout: a preservation watch system. Monitors aspects of the world to detect preservation risks and opportunities. This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).
  • 6. Information sources: format registries & software catalogues; digital repositories & web archives; organizational objectives; experiments; simulation; human knowledge.
  • 7. Currently supported information sources: PRONOM; repository content and events; web archive content; web archive renderability experiments; SCAPE Policy model.
  • 8. Define triggers. Example: "Notify me when there are tools that can render the format X."
  • 10. Receive notifications: Email; HTTP Push API. Example: "There are tools that can render format X."
  • 11. Automatic watch limitations. Machine-readable data is required: explicit and formally specified information; controlled vocabulary; ontology; all instances use the same structure and set of values.
  • 12. Case study: e-Depot coverage. [Chart: titles per publisher; axes: publishers, % of journal titles. Annotation: 97% of publishers, 1-10 titles.]
  • 13. e-journal coverage questions: Which publisher provides which journal titles? Publisher changes: ceases to provide journal; transfers journal to other publisher(s); publishers merge. Journal changes: name changes; ISSN changes; ceased to exist.
  • 14. Where is this information? “In 1991, two years before the merger with Reed, Elsevier acquired Pergamon Press in the UK.” “The Asia-Europe Foundation (ASEF) sold the Asia Europe Journal and transferred the copyright to its long-time partner Springer.” “Acta Chirurgica Iugoslavica is available free of charge as an Open Access journal on the Internet.” On the publisher website!
  • 15. Where is this information? On the publisher website, but not machine readable!
  • 16. Information Extraction: extract structured information from unstructured data; pattern-based information extraction; some training and supervision may be needed. Example pattern: “[X] acquired [Y]”.
  • 17. Experiment: 1. Data acquisition and pre-processing; 2. Relation discovery; 3. Information extraction; 4. Validation of results.
  • 18. 1. Data acquisition and pre-processing: focused crawler with seed words (12,000 entries: publisher names and journal titles), yielding 500,000 Web pages. Pre-processing with NLP tools yields 18 million sentences (8 GB).
  • 19. 2. Relation discovery. Prominent patterns by rank: [X] journal of [Y] (rank 1); [X] published by [Y] (rank 2); [X] journal on [Y] (rank 3); [X] journal published by [Y] (rank 4); [X] available as [Y] journal (rank 5); PubMed [X] [Y] (rank 9); [X] science proceedings of [Y] (rank 25); [X] subscription available to [Y] (rank 30).
  • 21. 3. Information extraction: 2,000 journal titles; 500 journal-publisher attributions.
  • 22. 4. Validation of results. [Two charts.] Journal titles in eDepot: 4%, 10%, 86%. Title-publisher in the Keepers registry: 15%, 50%, 35%. Legend: should add; existing; false-positives.
  • 23. False-positives: detecting boundaries of titles and publisher names; use of abbreviations in titles and publisher names; technical problems like encoding. Examples: “European Journal of Nuclear Medicine and Molecular Imaging”; IAAE - “International Association of Agricultural Economists”; “├ó╦å┼buda University”.
  • 24. Conclusions: We need data to support digital preservation, and it must be explicit and formally specified for automation. Registries tend to be incomplete and outdated. Information Extraction technologies can help. Still, some supervision may be needed.
  • 25. Send us your use cases! Contacts: Alan Akbik (alan.akbik@tu-berlin.de), Luis Faria (lfaria@keep.pt). Preservation Watch: what risks to monitor? Information Extraction: what to extract from the web?
  • 26. Thank you, questions? Scout, a preservation watch system: site http://openplanets.github.io/scout/, demo http://scout.scape.keep.pt. SCAPE Planning and Watch suite iPRES poster: http://bit.ly/scape-pw. SCAPE: http://www.scape-project.eu
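
The pattern-based extraction idea from the slides ("[X] acquired [Y]", applied to sentences gathered from publisher websites) can be illustrated with a minimal sketch. This is not the tooling used in the experiment, which the slides do not describe. It assumes, purely for illustration, that an entity is a run of capitalised words, that sentences can be split on end punctuation, and that seed words are matched as case-insensitive substrings. The names `split_sentences`, `compile_pattern` and `extract` are hypothetical.

```python
import re

# Assumption: an entity is a run of capitalised words (a crude stand-in for NLP tooling).
ENTITY = r"[A-Z][\w&-]*(?:\s+[A-Z][\w&-]*)*"


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def compile_pattern(template):
    """Turn a template such as "[X] acquired [Y]" into a compiled regex."""
    parts = []
    for token in template.split():
        if token in ("[X]", "[Y]"):
            name = token[1].lower()  # "[X]" -> group "x", "[Y]" -> group "y"
            parts.append(rf"(?P<{name}>\b{ENTITY})")
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts))


def extract(template, text, seeds):
    """Return (X, Y) pairs from sentences that mention at least one seed word."""
    pattern = compile_pattern(template)
    seeds_lower = [seed.lower() for seed in seeds]
    pairs = []
    for sentence in split_sentences(text):
        if not any(seed in sentence.lower() for seed in seeds_lower):
            continue  # skip sentences unrelated to the seed list
        pairs.extend((m.group("x"), m.group("y")) for m in pattern.finditer(sentence))
    return pairs


# Two of the example sentences quoted in the slides.
TEXT = (
    "In 1991, two years before the merger with Reed, Elsevier acquired "
    "Pergamon Press in the UK. Acta Chirurgica Iugoslavica is available "
    "free of charge as an Open Access journal on the Internet."
)

print(extract("[X] acquired [Y]", TEXT, seeds=["Elsevier"]))
# prints [('Elsevier', 'Pergamon Press')]
```

A capitalisation heuristic like this is prone to the boundary and abbreviation false-positives listed on the false-positives slide, which is consistent with the conclusion that some supervision may still be needed.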