Avoiding Zombies in Archival Replay Using ServiceWorker
Sawood Alam, Mat Kelly, Michele C. Weigle, and Michael L. Nelson
Web Science and Digital Libraries Research Group
Old Dominion University, Norfolk, VA, 23529
@ibnesayeed
@WebSciDL
Supported in part by NSF III 1526700
WADL 2017, June 22-23, 2017, Toronto, Ontario, Canada
Sawood Alam <@ibnesayeed>
2008 Memento Seen in 2017
● https://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
2008 Memento Seen in 2012
● http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
XenLand @ Alpha Centauri
Zombies in Archive
Zombies in Archive
<img src="http://xenland.alpha/images/map.png">
// Is rewritten on replay to become:
<img src="http://archive.example.org/1998/http://xenland.alpha/images/map.png">
// URLs constructed by JavaScript are harder to rewrite on replay, e.g.:
var base = 'http://xenland.alpha';
var imgdir = '/images/';
var img = document.createElement('img');
img.src = base + imgdir + 'ruler.png';
document.getElementById('ruler').appendChild(img);
//=>> http://xenland.alpha/images/ruler.png
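The contrast above can be sketched with a deliberately naive attribute rewriter. This is an illustration only, not how any particular replay system works: `REPLAY_PREFIX` and `rewriteAttributes` are hypothetical names, and the prefix simply mirrors the archive.example.org/1998/ example.

```javascript
// Hypothetical replay prefix, mirroring the example URL in the slide.
const REPLAY_PREFIX = 'http://archive.example.org/1998/';

// Naive rewriter: prefix absolute URLs that appear literally in src/href attributes.
function rewriteAttributes(markup) {
  return markup.replace(
    /\b(src|href)="(https?:\/\/[^"]+)"/g,
    (match, attr, url) => `${attr}="${REPLAY_PREFIX}${url}"`,
  );
}

const staticMarkup = '<img src="http://xenland.alpha/images/map.png">';
const scriptSource = "img.src = base + imgdir + 'ruler.png';";

// The literal attribute is found and rewritten.
console.log(rewriteAttributes(staticMarkup));
// The script never contains the full URL as text, so nothing is rewritten.
console.log(rewriteAttributes(scriptSource) === scriptSource);
```

The script assembles its URL only at run time, so a text-level rewriter has nothing to match, and the request goes to the live web unless something intercepts it later.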
Replay URL Resolution & Rewriting
Reference type   Example                                      Resolution after relocation
Relative path    images/logo.png                              Potentially correct
Absolute path    /public/images/logo.png                      Potentially incorrect
Absolute URL     http://example.com/public/images/logo.png    Potentially live leakage
http://example.com/public/index.html
...
<img src="/public/images/logo.png">
...
http://archive.example.org/<datetime>/http://example.com/public/index.html
...
<img src="/<datetime>/http://example.com/public/images/logo.png">
...
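The three outcomes in the table can be reproduced with the standard `URL` constructor, which resolves a reference against a base the way a browser does. The 14-digit timestamp below is an assumption standing in for `<datetime>`; nothing is rewritten here, so this shows what happens to an unmodified page served from the archive.

```javascript
// Hypothetical replay URL; the slide writes the timestamp as <datetime>.
const mementoUrl =
  'http://archive.example.org/20170622000000/http://example.com/public/index.html';

const references = {
  relativePath: 'images/logo.png',
  absolutePath: '/public/images/logo.png',
  absoluteUrl: 'http://example.com/public/images/logo.png',
};

// Resolve each reference with the memento URL as the base.
const resolved = Object.fromEntries(
  Object.entries(references).map(([kind, ref]) => [kind, new URL(ref, mementoUrl).href]),
);

console.log(resolved);
```

The relative path stays inside the archive under the same timestamp. The absolute path lands on the archive's origin but loses the timestamp and original host. The absolute URL points straight at the live site.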
Avoiding Zombies
● Ahead-of-time rendering and JS execution
○ http://archive.is/
● Archival replay proxy
○ https://github.com/ikreymer/pywb/wiki/Pywb-Proxy-Mode-Usage
● Browser extension
○ MementoFox (deprecated)
● JS override
○ wombat.js in PyWB
● ServiceWorker
ServiceWorker
● New web API (still a working draft)
● A standalone JavaScript file
● Persists in the browser independent of the window
● Acts as a proxy
● Installed by a web page under its domain at a specific path (called scope)
● Intercepts all requests in scope
○ Resources under the scope path (at any depth)
○ Secondary resource requests originated from any resource under scope
● Allows modification in request and response
● Primarily used in web applications for offline access and notification support
● Requires HTTPS
● Growing browser support (73.61% as of June 8, 2017)
● http://caniuse.com/#feat=serviceworkers
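A minimal sketch of the registration and interception steps listed above, under stated assumptions: the `/replay/sw.js` path, the `/replay/` scope, and the `controlsPage` helper are all hypothetical, and `controlsPage` only approximates the browser's own scope matching. The page-side and worker-side parts are shown together for brevity and are guarded so each runs only in its own environment.

```javascript
// Approximate scope check: a page is controlled when it shares the scope's
// origin and its path starts with the scope path (at any depth).
function controlsPage(pageUrl, scopeUrl) {
  const page = new URL(pageUrl);
  const scope = new URL(scopeUrl);
  return page.origin === scope.origin && page.pathname.startsWith(scope.pathname);
}

// Page side: install the worker under a scope (browsers only, over HTTPS).
if (typeof navigator !== 'undefined' && 'serviceWorker' in navigator) {
  navigator.serviceWorker
    .register('/replay/sw.js', { scope: '/replay/' })
    .catch((err) => console.error('ServiceWorker registration failed:', err));
}

// Worker side: every request from a controlled page raises a fetch event,
// and the handler may answer with a modified request or response.
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    event.respondWith(fetch(event.request)); // plain pass-through proxy
  });
}
```

Once a page is controlled, its secondary requests reach the fetch handler regardless of where they point, which is what makes the worker usable as a proxy.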
reconstructive.js
● https://github.com/oduwsdl/reconstructive
● A ServiceWorker script written for archival replay
● Plug-in for web archives or Memento aggregators
● Intercepts all network requests originated from a memento
● Reroutes requests to an archive (prevents live leakage & incorrect references)
● Optionally rewrites the content to add banner & to fix hyperlinks
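The rerouting idea can be sketched as follows. This is not the reconstructive.js code or its API: `ARCHIVE_ORIGIN`, `parseMemento`, and `reroute` are hypothetical names, and the URL layout (archive origin, then a 4 to 14 digit datetime, then the original URL) is an assumption modeled on the examples in this deck.

```javascript
// Assumed layout: <ARCHIVE_ORIGIN>/<datetime>/<original URL>
const ARCHIVE_ORIGIN = 'http://archive.example.org';
const MEMENTO_PATH = /^\/(\d{4,14})\/(https?:\/\/.+)$/;

// Split a memento URL into its datetime and original URL, or return null.
function parseMemento(url) {
  const u = new URL(url);
  if (u.origin !== ARCHIVE_ORIGIN) return null;
  const m = (u.pathname + u.search).match(MEMENTO_PATH);
  return m ? { datetime: m[1], original: m[2] } : null;
}

// Decide where a request issued by a memento should really go.
function reroute(requestUrl, referrer) {
  if (parseMemento(requestUrl)) return requestUrl; // already points at a memento
  const page = parseMemento(referrer);
  if (!page) return requestUrl; // not issued by a memento, leave untouched
  const req = new URL(requestUrl);
  let original = requestUrl; // absolute URL that would leak to the live web
  if (req.origin === ARCHIVE_ORIGIN) {
    // An absolute path was resolved against the archive's origin;
    // resolve it against the original page instead.
    original = new URL(req.pathname + req.search, page.original).href;
  }
  return `${ARCHIVE_ORIGIN}/${page.datetime}/${original}`;
}

// Worker side (browsers only): answer each request with its rerouted version.
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    event.respondWith(fetch(reroute(event.request.url, event.request.referrer)));
  });
}
```

Because the decision is made per request at fetch time, URLs assembled by JavaScript are handled the same way as URLs written in the markup.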
Zombies, No More!
● https://github.com/oduwsdl/ipwb
Rewriting Mementos is Expensive
Original capture (without any rewriting)
In our experiment over 500 home pages we observed:
● One-fifth mean data overhead
● One-third mean time overhead
15% more data in twice the time
Archival Capture Replay Test Suite (ACRTS)
reconstructive.js
● https://ibnesayeed.github.io/acrts/
Reconstruction Winners: PyWB & reconstructive.js
A. OpenWayback
B. PyWB
C. Memento Reconstruct
D. Memento for Chrome
E. reconstructive.js
Future Work
● Use “Prefer” header for original content (when archives support it)
● Add a customizable archival banner
● Add click handler for lazy rewriting of hyperlinks
● Handle archived ServiceWorkers
● Write a 404-combat ServiceWorker script for webmasters
● http://ws-dl.blogspot.co.uk/2016/08/2016-08-15-mementos-in-raw-take-two.html
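For the first item, a small sketch of how a rerouted request might carry a `Prefer` header (RFC 7240) asking for the unaltered capture. The token `original-content` and the helper name `rawFetchOptions` are assumptions for illustration. The value an archive actually honors, if any, would have to come from that archive.

```javascript
// Hypothetical preference token; archives would need to agree on the real value.
const RAW_PREFERENCE = 'original-content';

// Build fetch() options that ask the archive for the capture without rewriting.
function rawFetchOptions(extraHeaders = {}) {
  return { headers: { ...extraHeaders, Prefer: RAW_PREFERENCE } };
}

// Possible use inside a fetch handler (hypothetical URL):
//   fetch('https://archive.example.org/1998/http://xenland.alpha/', rawFetchOptions());
console.log(rawFetchOptions({ Accept: 'text/html' }));
```

An archive that does not recognize the preference can simply ignore it, so a worker would still need to cope with rewritten responses.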
Conclusions
● reconstructive.js => no zombies!
● Rerouting instead of rewriting (lazy rewriting)
● Mean overhead reduction
○ one-fifth data
○ one-third time
● 73.61% (and growing) browser support for ServiceWorker
○ http://caniuse.com/#feat=serviceworkers
● reconstructive.js
○ https://github.com/oduwsdl/reconstructive
● Archival Capture Replay Test Suite
○ https://ibnesayeed.github.io/acrts/

 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Joonhun Lee
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxRizalinePalanog2
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 

Dernier (20)

Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 

Avoiding Zombies in Archival Replay Using ServiceWorker

  • 1. Avoiding Zombies in Archival Replay Using ServiceWorker
    Sawood Alam, Mat Kelly, Michele C. Weigle, and Michael L. Nelson
    Web Science and Digital Libraries Research Group
    Old Dominion University, Norfolk, VA, 23529
    @ibnesayeed @WebSciDL
    Supported in part by NSF III 1526700
    WADL 2017, June 22-23, 2017, Toronto, Ontario, Canada
  • 2. 2008 Memento Seen in 2017
    ● https://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
  • 3. 2008 Memento Seen in 2012
    ● http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
  • 6. Zombies in Archive
    <img src="http://xenland.alpha/images/map.png">
    // Is rewritten on replay to become:
    <img src="http://archive.example.org/1998/http://xenland.alpha/images/map.png">
    // URLs constructed by JavaScript are harder to rewrite on replay, e.g.:
    var base = 'http://xenland.alpha';
    var imgdir = '/images/';
    var img = document.createElement('img');
    img.src = base + imgdir + 'ruler.png';
    document.getElementById('ruler').appendChild(img);
    //=>> http://xenland.alpha/images/ruler.png
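To make the gap on this slide concrete, here is a minimal sketch of attribute-level rewriting. The helper `rewriteAttributes` and the sample page are hypothetical, and the archive prefix is the one from the slide. This is not how any particular replay system is implemented. It only illustrates that a rewriter working on markup never sees a URL that is assembled at runtime.

```javascript
// Hypothetical archive prefix, taken from the slide's example.
const ARCHIVE_PREFIX = 'http://archive.example.org/1998/';

// Naive rewriter: prefixes absolute URLs found in src/href attributes.
function rewriteAttributes(html, prefix) {
  return html.replace(
    /\b(src|href)="(https?:\/\/[^"]+)"/g,
    (match, attr, url) => `${attr}="${prefix}${url}"`
  );
}

// Hypothetical page mixing a static reference with a JS-built one.
const page = [
  '<img src="http://xenland.alpha/images/map.png">',
  '<script>',
  "var base = 'http://xenland.alpha';",
  "img.src = base + '/images/' + 'ruler.png';",
  '</script>',
].join('\n');

const rewritten = rewriteAttributes(page, ARCHIVE_PREFIX);
console.log(rewritten);
```

The static `<img>` reference gains the archive prefix, while the string pieces inside the script are left untouched, so the browser would still request `ruler.png` from the live web.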
  • 7. Replay URL Resolution & Rewriting
    Reference type   Example                                      Resolution after relocation
    Relative path    images/logo.png                              Potentially correct
    Absolute path    /public/images/logo.png                      Potentially incorrect
    Absolute URL     http://example.com/public/images/logo.png    Potentially live leakage

    http://example.com/public/index.html
      ...
      <img src="/public/images/logo.png">
      ...
    http://archive.example.org/<datetime>/http://example.com/public/index.html
      ...
      <img src="/<datetime>/http://example.com/public/images/logo.png">
      ...
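The three rows of the table can be reproduced with the standard `URL` constructor, which resolves a reference against a base the same way a browser does. This is a small sketch: the 14-digit datetime and the `resolve` helper are assumptions made for illustration, and the hosts are the slide's example hosts.

```javascript
// Hypothetical URI-M: archive host + 14-digit datetime + original URL.
const originalPage = 'http://example.com/public/index.html';
const memento = 'http://archive.example.org/20170622000000/' + originalPage;

// Resolve a reference against the page it appears in, as a browser would.
function resolve(reference, pageUrl) {
  return new URL(reference, pageUrl).href;
}

const references = {
  relativePath: 'images/logo.png',
  absolutePath: '/public/images/logo.png',
  absoluteUrl: 'http://example.com/public/images/logo.png',
};

// Print where each unrewritten reference points once the page is relocated.
for (const [type, ref] of Object.entries(references)) {
  console.log(type, '->', resolve(ref, memento));
}
```

Only the relative path stays inside the memento's URL space. The absolute path lands on the archive host without the datetime and original host, and the absolute URL goes straight to the live site.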
  • 8. Avoiding Zombies
    ● Ahead-of-time rendering and JS execution
      ○ http://archive.is/
    ● Archival replay proxy
      ○ https://github.com/ikreymer/pywb/wiki/Pywb-Proxy-Mode-Usage
    ● Browser extension
      ○ MementoFox (deprecated)
    ● JS override
      ○ wombat.js in PyWB
    ● ServiceWorker
  • 9. ServiceWorker
    ● New web API (still a working draft)
    ● A standalone JavaScript file
    ● Persists in the browser independent of the window
    ● Acts as a proxy
    ● Installed by a web page under its domain at a specific path (called scope)
    ● Intercepts all requests in scope
      ○ Resources under the scope path (at any depth)
      ○ Secondary resource requests originating from any resource under scope
    ● Allows modification of requests and responses
    ● Primarily used in web applications for offline access and notification support
    ● Requires HTTPS
    ● Growing browser support (73.61% as of June 8, 2017)
      ○ http://caniuse.com/#feat=serviceworkers
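The installation and scope points above can be sketched as follows. The worker file name `/serviceworker.js` and the scope `/memento/` are placeholders, not names taken from the talk, and `isInScope` is a hypothetical helper that mimics the prefix match browsers use to decide which pages a worker controls.

```javascript
// Hypothetical names: the worker file and scope path are placeholders.
const WORKER_FILE = '/serviceworker.js';
const SCOPE = '/memento/';

// A page is controlled by a worker when its URL starts with the scope URL.
function isInScope(pageUrl, scopeUrl) {
  return new URL(pageUrl).href.startsWith(new URL(scopeUrl).href);
}

// Registration only runs where the API is exposed (a browser page over HTTPS).
if (typeof navigator !== 'undefined' && 'serviceWorker' in navigator) {
  navigator.serviceWorker
    .register(WORKER_FILE, { scope: SCOPE })
    .then((reg) => console.log('Registered with scope', reg.scope))
    .catch((err) => console.error('Registration failed', err));
}
```

With this scope, a memento served under `/memento/` at any depth would be controlled by the worker, while other pages on the same host would not.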
  • 10. reconstructive.js
    ● https://github.com/oduwsdl/reconstructive
    ● A ServiceWorker script written for archival replay
    ● Plug-in for web archives or Memento aggregators
    ● Intercepts all network requests originated from a memento
    ● Reroutes requests to an archive (prevents live leakage & incorrect references)
    ● Optionally rewrites the content to add banner & to fix hyperlinks
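The rerouting idea can be illustrated with a short sketch. This is not the reconstructive.js source: the URI-M layout `/memento/<14-digit datetime>/<original URL>`, the pattern `URIM_PATTERN`, and the function `reroute` are all assumptions made for this example. The worker uses the referrer of each request to learn which memento it came from, then points the request back into the archive.

```javascript
// Assumed URI-M layout: <archive origin>/memento/<14-digit datetime>/<original URL>
const URIM_PATTERN = /^(https?:\/\/[^/]+\/memento\/)(\d{14})\/(.+)$/;

// Map a leaked or misresolved request URL back to an archived URL.
function reroute(requestUrl, referrer) {
  if (URIM_PATTERN.test(requestUrl)) return requestUrl; // already archived
  const ref = URIM_PATTERN.exec(referrer);
  if (!ref) return requestUrl; // request did not originate from a memento
  const [, prefix, datetime, originalPage] = ref;
  const req = new URL(requestUrl);
  // An absolute path resolves against the archive host; re-resolve it
  // against the original page. Anything else is a live-web URL.
  const original = req.origin === new URL(prefix).origin
    ? new URL(req.pathname + req.search, originalPage).href
    : req.href;
  return `${prefix}${datetime}/${original}`;
}

// The fetch handler is only attached when running inside a ServiceWorker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    const target = reroute(event.request.url, event.request.referrer);
    if (target !== event.request.url) {
      event.respondWith(fetch(target));
    }
  });
}
```

Because the page content is served as captured and only the requests are redirected, a URL built by JavaScript at runtime is caught the same way as a static one.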
  • 11. Zombies, No More!
    ● https://github.com/oduwsdl/ipwb
  • 12. Rewriting Mementos is Expensive
    Original capture (without any rewriting)
    In our experiment over 500 home pages we observed:
    ● One-fifth mean data overhead
    ● One-third mean time overhead
    15% more data in twice the time
  • 13. Archival Capture Replay Test Suite (ACRTS)
    reconstructive.js
    ● https://ibnesayeed.github.io/acrts/
  • 14. Reconstruction Winners: PyWB & reconstructive.js
    A. OpenWayback
    B. PyWB
    C. Memento Reconstruct
    D. Memento for Chrome
    E. reconstructive.js
  • 15. Future Work
    ● Use “Prefer” header for original content (when archives support it)
      ○ http://ws-dl.blogspot.co.uk/2016/08/2016-08-15-mementos-in-raw-take-two.html
    ● Add a customizable archival banner
    ● Add click handler for lazy rewriting of hyperlinks
    ● Handle archived ServiceWorkers
    ● Write a 404-combat ServiceWorker script for webmasters
  • 16. Conclusions
    ● reconstructive.js => no zombies!
    ● Rerouting instead of rewriting (lazy rewriting)
    ● Mean overhead reduction
      ○ one-fifth data
      ○ one-third time
    ● 73.61% (and growing) browser support for ServiceWorker
      ○ http://caniuse.com/#feat=serviceworkers
    ● reconstructive.js
      ○ https://github.com/oduwsdl/reconstructive
    ● Archival Capture Replay Test Suite
      ○ https://ibnesayeed.github.io/acrts/