Archive What I See Now
Personal Web Archiving with WARCs
Michele C. Weigle, Michael L. Nelson, Mat Kelly, and John Berlin
Web Science and Digital Libraries (WS-DL) Research Group
Old Dominion University
ws-dl.cs.odu.edu • @WebSciDL
http://bit.ly/iipcWAC2017
HD-51670-13 • HK-50181-14
@machawk1
IIPC Web Archiving Conference 2017
June 15, 2017
London, UK
Web Archiving Tools for Web Users
Standard Web archiving tools are difficult for non-IT experts.
“Save Page As” is not suitable for archiving purposes.
Pages are behind authentication.
Pages change quickly, but current state needs archiving.
ARCHIVE WHAT I SEE NOW
Why?
● Allow non-technical users to locally create and replay their own archives
● Preserve the previously unpreserved
more archives → more better
CREATION + ACCESS of personal and private web archives
Goals: Advance Development of 3 Tools
WARCreate
Create a WARC from what you see in your browser
Web Archiving Integration Layer (WAIL)
Replay the WARC using software on your desktop
Your captures never leave your machine
Mink
See how your captures temporally integrate with institutions’ captures
Submit new URIs to Web archives (was to-WAIL in scope?)
WARCreate
● Google Chrome browser extension
● Save WARC files from your browser
● No credentials pass through a 3rd party
● Heavily leverages Chrome webRequest API
● Built in 2012; APIs and libraries have evolved since!
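As a rough illustration of what "saving a WARC file" involves, the sketch below serializes a single WARC/1.0 `response` record in Python. It is not WARCreate's code (WARCreate is a Chrome extension); the function name `build_warc_response_record` is hypothetical, and the payload is assumed to be the raw HTTP response already captured by the browser.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_payload: bytes) -> bytes:
    """Serialize one WARC/1.0 'response' record.

    http_payload is assumed to be the raw HTTP response
    (status line, headers, blank line, body) as captured.
    """
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(http_payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Record layout: header block, blank line, content block, two CRLFs.
    return head.encode("utf-8") + http_payload + b"\r\n\r\n"
```

A full WARC file would typically also start with a `warcinfo` record and pair each response with its `request` record; those are left out here for brevity.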
WARCreate - Recent Advancements
● Three New Modes for Browser-Based Preservation
○ Record Mode - retain buffer as you browse
○ Countdown Mode - preserve a reloading page on an interval
○ Event Mode - preserve the page when it’s automatically reloaded
● Save to local Web archive (e.g., WAIL)
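A minimal sketch of the Countdown Mode idea, assuming it amounts to taking a capture at a fixed interval. The slides do not describe the implementation; `countdown_capture` and the `capture` callable are hypothetical stand-ins for "reload the page and write a WARC".

```python
import time
from typing import Callable, List


def countdown_capture(
    capture: Callable[[], bytes],
    interval_s: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """Call capture() `rounds` times, waiting interval_s between calls."""
    captures: List[bytes] = []
    for i in range(rounds):
        if i > 0:
            # Wait between captures, but not before the first one.
            sleep(interval_s)
        captures.append(capture())
    return captures
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting.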
Web Archiving Integration Layer (WAIL)
● Stand-alone desktop application
● Collection-based Web Archiving
● Includes Heritrix for crawling, OpenWayback for Replay
● Python scripts compiled to OS-native binaries (.app, .exe)
● What to do with WARCs?
● See: How WAIL came about, "Lipstick or Ham"
WAIL - Recent Advancements
● New User Interface
● Ported from Python to Electron
○ Now using Web technologies to archive the Web
● Single archive to collection-based archiving
● OpenWayback to pywb
● Twitter integration
WAIL-Electron Feature Walk-through
WAIL - New User Interface
Screenshots: original one-click interface; new collection-based interface
http://bit.ly/iipcWAC2017
@machawk1
HD-51670-13 • HK-50181-14
IIPC Web Archiving Conference 2017
June 15, 2017
London, UK
http://bit.ly/iipcWAC2017
@machawk1
HD-51670-13 • HK-50181-14
IIPC Web Archiving Conference 2017
June 15, 2017
London, UK
http://bit.ly/iipcWAC2017
@machawk1
HD-51670-13 • HK-50181-14
IIPC Web Archiving Conference 2017
June 15, 2017
London, UK
http://bit.ly/iipcWAC2017
@machawk1
HD-51670-13 • HK-50181-14
IIPC Web Archiving Conference 2017
June 15, 2017
London, UK
http://bit.ly/iipcWAC2017
@machawk1
Mink
● Google Chrome browser extension
● Indicates archival capture count as you browse
● Quickly submit URI to multiple archives from UI
● From Mink(owski Space)
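One way to obtain a capture count like the one Mink displays is to count the `memento` links in a Memento TimeMap (link-format, RFC 7089). The sketch below is an illustration in Python, not Mink's code; `memento_uris` and `count_mementos` are hypothetical names, and link parameters are assumed to be double-quoted as they usually are in TimeMaps.

```python
import re
from typing import List

# One link-value: <URI> followed by ;name="value" parameters.
# Quoted values may contain commas (e.g. HTTP dates), so we do not split on ",".
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
_REL_RE = re.compile(r'rel="([^"]*)"')


def memento_uris(timemap: str) -> List[str]:
    """Return the URIs of all links whose rel contains the token 'memento'."""
    uris = []
    for uri, params in _LINK_RE.findall(timemap):
        rel = _REL_RE.search(params)
        # rel may hold several tokens, e.g. "first memento".
        if rel and "memento" in rel.group(1).split():
            uris.append(uri)
    return uris


def count_mementos(timemap: str) -> int:
    """Number of captures listed in a link-format TimeMap."""
    return len(memento_uris(timemap))
```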
Mink - Recent Advancements
● Enhance interface
○ Add number of archived pages to icon at bottom of page
○ Allow users to set preferences on how to view a large set of mementos
● Communication with a user-specified (or local) archive in addition to the aggregated institutional archives’ results
Mink - Previous Interface
➢ Interface affected by page CSS
➢ Obtrusive on the viewport by default
➢ Haphazard, inconsistent animations
Mink - User Interface Revamp
➢ Interface-on-demand
➢ Shadow DOM, no CSS intrusion
➢ More consistent, intuitive Miller columns for many captures
Mink - Communication with Local Archives
Any Memento-compatible archive
Aggregated TimeMap
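Combining a local archive's captures with aggregated institutional results can be sketched as a merge of memento lists ordered by capture time. This is an assumption about the general approach, not Mink's implementation; `merge_mementos` and the `(HTTP-date, URI)` tuple shape are hypothetical.

```python
from email.utils import parsedate_to_datetime
from typing import Iterable, List, Tuple

Memento = Tuple[str, str]  # (HTTP-date string, memento URI)


def merge_mementos(*sources: Iterable[Memento]) -> List[Memento]:
    """Merge memento lists from several archives, oldest capture first."""
    seen = set()
    merged: List[Memento] = []
    for source in sources:
        for http_date, uri in source:
            # Drop exact duplicates reported by more than one source.
            if uri not in seen:
                seen.add(uri)
                merged.append((http_date, uri))
    merged.sort(key=lambda m: parsedate_to_datetime(m[0]))
    return merged
```

Dates are assumed to be HTTP dates in GMT, as used in the `datetime` attribute of TimeMap links.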
Mink usage GIF, also available at:
https://youtu.be/bGjxofpTgv4
Tools’ Integration
Tools’ Integration: WARCreate→WAIL
● Save WARC directly to local archive (by reference [easier])
○ By-value integration feasibility being investigated a la WASAPI
● Automatically indexed and replayable
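The slides do not say how WAIL ingests a WARC handed over by WARCreate. As one possible sketch, assuming a pywb-style layout where a collection's WARCs live under `collections/<name>/archive/`, a file saved by the browser could be placed into a collection as below. The function name and the choice to copy the file (rather than pass only its path) are assumptions for illustration.

```python
import shutil
from pathlib import Path


def add_warc_to_collection(warc_path: Path, collections_root: Path, collection: str) -> Path:
    """Copy a WARC into <collections_root>/<collection>/archive/ and return its new path."""
    archive_dir = collections_root / collection / "archive"
    # Create the collection's archive directory on first use.
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / warc_path.name
    shutil.copy2(warc_path, dest)
    return dest
```

With pywb's auto-indexing enabled, files placed in that directory can be picked up and indexed without a separate step, which would match the "automatically indexed and replayable" behavior described above.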
Tools’ Integration: Mink→WAIL
Any Memento-compatible archive
Some Future Work
● Decouple Mink from external Memento aggregator
○ Client-side customizable set of archives instead
● WARC replay using browser extensions/apps
● Further integration with other archiving tools in WAIL
○ Re-add Memgator Memento aggregator (removed from Electron version)
● Firefox version of tools
○ XUL→WebExtensions
○ Decouple from Chrome APIs
● Integration with InterPlanetary Wayback (to be discussed in a talk later today)
Acknowledgements
● NEH Grant #s HD-51670-13 • HK-50181-14
● Dr. Liza Potts and WIDE Research Center at Michigan State University
● ODU SEES Travel Grant

Contenu connexe

Tendances

JCDL 2016 Doctoral Consortium - Web Archive Profiling
JCDL 2016 Doctoral Consortium - Web Archive ProfilingJCDL 2016 Doctoral Consortium - Web Archive Profiling
JCDL 2016 Doctoral Consortium - Web Archive ProfilingSawood Alam
 
RDM#2- The Distributed Web
RDM#2- The Distributed WebRDM#2- The Distributed Web
RDM#2- The Distributed WebDavid Dias
 
Node.js Interactive
Node.js InteractiveNode.js Interactive
Node.js InteractiveDavid Dias
 
Intro to HTTP and Node.js
Intro to HTTP and Node.jsIntro to HTTP and Node.js
Intro to HTTP and Node.jsJean-Luc David
 
WebDAV - April 15 2008
WebDAV - April 15 2008WebDAV - April 15 2008
WebDAV - April 15 2008sullis
 
Redis Overview
Redis OverviewRedis Overview
Redis OverviewHoang Long
 
Steam Learn: An introduction to Redis
Steam Learn: An introduction to RedisSteam Learn: An introduction to Redis
Steam Learn: An introduction to Redisinovia
 
Connect your Javascript web app to ownCloud over the WebDAV interface
Connect your Javascript web app to ownCloud over the WebDAV interface Connect your Javascript web app to ownCloud over the WebDAV interface
Connect your Javascript web app to ownCloud over the WebDAV interface Ilian Sapundshiev
 
Leading a Community-Driven Open Source Project
Leading a Community-Driven Open Source ProjectLeading a Community-Driven Open Source Project
Leading a Community-Driven Open Source ProjectVincent Massol
 
Node.JS and WebSockets with Faye
Node.JS and WebSockets with FayeNode.JS and WebSockets with Faye
Node.JS and WebSockets with FayeMatjaž Lipuš
 
Easy Data for PhoneGap apps with PouchDB
Easy Data for PhoneGap apps with PouchDBEasy Data for PhoneGap apps with PouchDB
Easy Data for PhoneGap apps with PouchDBHolly Schinsky
 
Midgard2: Content repository for desktop and the web
Midgard2: Content repository for desktop and the webMidgard2: Content repository for desktop and the web
Midgard2: Content repository for desktop and the webHenri Bergius
 

Tendances (14)

JCDL 2016 Doctoral Consortium - Web Archive Profiling
JCDL 2016 Doctoral Consortium - Web Archive ProfilingJCDL 2016 Doctoral Consortium - Web Archive Profiling
JCDL 2016 Doctoral Consortium - Web Archive Profiling
 
RDM#2- The Distributed Web
RDM#2- The Distributed WebRDM#2- The Distributed Web
RDM#2- The Distributed Web
 
Redis
RedisRedis
Redis
 
Node.js Interactive
Node.js InteractiveNode.js Interactive
Node.js Interactive
 
Offline-First Apps with PouchDB
Offline-First Apps with PouchDB Offline-First Apps with PouchDB
Offline-First Apps with PouchDB
 
Intro to HTTP and Node.js
Intro to HTTP and Node.jsIntro to HTTP and Node.js
Intro to HTTP and Node.js
 
WebDAV - April 15 2008
WebDAV - April 15 2008WebDAV - April 15 2008
WebDAV - April 15 2008
 
Redis Overview
Redis OverviewRedis Overview
Redis Overview
 
Steam Learn: An introduction to Redis
Steam Learn: An introduction to RedisSteam Learn: An introduction to Redis
Steam Learn: An introduction to Redis
 
Connect your Javascript web app to ownCloud over the WebDAV interface
Connect your Javascript web app to ownCloud over the WebDAV interface Connect your Javascript web app to ownCloud over the WebDAV interface
Connect your Javascript web app to ownCloud over the WebDAV interface
 
Leading a Community-Driven Open Source Project
Leading a Community-Driven Open Source ProjectLeading a Community-Driven Open Source Project
Leading a Community-Driven Open Source Project
 
Node.JS and WebSockets with Faye
Node.JS and WebSockets with FayeNode.JS and WebSockets with Faye
Node.JS and WebSockets with Faye
 
Easy Data for PhoneGap apps with PouchDB
Easy Data for PhoneGap apps with PouchDBEasy Data for PhoneGap apps with PouchDB
Easy Data for PhoneGap apps with PouchDB
 
Midgard2: Content repository for desktop and the web
Midgard2: Content repository for desktop and the webMidgard2: Content repository for desktop and the web
Midgard2: Content repository for desktop and the web
 

En vedette

Discovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDMartin Klein
 
A Collaborative, Secure, and Private InterPlanetary Wayback Web Archiving Sys...
A Collaborative, Secure, and Private InterPlanetary Wayback Web Archiving Sys...A Collaborative, Secure, and Private InterPlanetary Wayback Web Archiving Sys...
A Collaborative, Secure, and Private InterPlanetary Wayback Web Archiving Sys...machawk1
 
MS Thesis Defense, Aug 2012 - Visualizing Digital Collections at Archive-It
MS Thesis Defense, Aug 2012 - Visualizing Digital Collections at Archive-ItMS Thesis Defense, Aug 2012 - Visualizing Digital Collections at Archive-It
MS Thesis Defense, Aug 2012 - Visualizing Digital Collections at Archive-ItKalpesh Padia
 
Visualizing Digital Collections at Archive-It - Jcdl 2012
Visualizing Digital Collections at Archive-It - Jcdl 2012Visualizing Digital Collections at Archive-It - Jcdl 2012
Visualizing Digital Collections at Archive-It - Jcdl 2012Kalpesh Padia
 
Dockerize Your Projects - A Brief Introduction to Containerization
Dockerize Your Projects - A Brief Introduction to ContainerizationDockerize Your Projects - A Brief Introduction to Containerization
Dockerize Your Projects - A Brief Introduction to ContainerizationSawood Alam
 

En vedette (6)

Local Memory Project
Local Memory ProjectLocal Memory Project
Local Memory Project
 
Discovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCID
 
A Collaborative, Secure, and Private InterPlanetary Wayback Web Archiving Sys...
A Collaborative, Secure, and Private InterPlanetary Wayback Web Archiving Sys...A Collaborative, Secure, and Private InterPlanetary Wayback Web Archiving Sys...
A Collaborative, Secure, and Private InterPlanetary Wayback Web Archiving Sys...
 
MS Thesis Defense, Aug 2012 - Visualizing Digital Collections at Archive-It
MS Thesis Defense, Aug 2012 - Visualizing Digital Collections at Archive-ItMS Thesis Defense, Aug 2012 - Visualizing Digital Collections at Archive-It
MS Thesis Defense, Aug 2012 - Visualizing Digital Collections at Archive-It
 
Visualizing Digital Collections at Archive-It - Jcdl 2012
Visualizing Digital Collections at Archive-It - Jcdl 2012Visualizing Digital Collections at Archive-It - Jcdl 2012
Visualizing Digital Collections at Archive-It - Jcdl 2012
 
Dockerize Your Projects - A Brief Introduction to Containerization
Dockerize Your Projects - A Brief Introduction to ContainerizationDockerize Your Projects - A Brief Introduction to Containerization
Dockerize Your Projects - A Brief Introduction to Containerization
 

Similaire à Archive What I See Now: Personal Web Archiving with WARCs

Continuous delivery with jenkins pipelines (@devfest Vienna)
Continuous delivery with jenkins pipelines (@devfest Vienna)Continuous delivery with jenkins pipelines (@devfest Vienna)
Continuous delivery with jenkins pipelines (@devfest Vienna)Roman Pickl
 
Whats New in IBM Integration Bus Interconnect 2017
Whats New in IBM Integration Bus Interconnect 2017Whats New in IBM Integration Bus Interconnect 2017
Whats New in IBM Integration Bus Interconnect 2017bthomps1979
 
Updates from Hungary (Jozsef Kovacs)
Updates from Hungary (Jozsef Kovacs)Updates from Hungary (Jozsef Kovacs)
Updates from Hungary (Jozsef Kovacs)EOSC-hub project
 
Archiving Web-Based #musetech for Institutional Memory
Archiving Web-Based #musetech for Institutional MemoryArchiving Web-Based #musetech for Institutional Memory
Archiving Web-Based #musetech for Institutional MemorySamantha Norling
 
Jenkins Pipeline @ Scale. Building Automation Frameworks for Systems Integration
Jenkins Pipeline @ Scale. Building Automation Frameworks for Systems IntegrationJenkins Pipeline @ Scale. Building Automation Frameworks for Systems Integration
Jenkins Pipeline @ Scale. Building Automation Frameworks for Systems IntegrationOleg Nenashev
 
Creating Data Driven HTML5 Applications
Creating Data Driven HTML5 ApplicationsCreating Data Driven HTML5 Applications
Creating Data Driven HTML5 ApplicationsGil Fink
 
Icinga 2010 at CeBIT
Icinga 2010 at CeBITIcinga 2010 at CeBIT
Icinga 2010 at CeBITIcinga
 
Intro to Exhibit Workshop
Intro to Exhibit WorkshopIntro to Exhibit Workshop
Intro to Exhibit WorkshopShawn Day
 
July OpenNTF Webinar - HCL Presents Keep, a new API for Domino
July OpenNTF Webinar - HCL Presents Keep, a new API for DominoJuly OpenNTF Webinar - HCL Presents Keep, a new API for Domino
July OpenNTF Webinar - HCL Presents Keep, a new API for DominoHoward Greenberg
 
SharePoint Framework get started and best practices
SharePoint Framework get started and best practicesSharePoint Framework get started and best practices
SharePoint Framework get started and best practicesGiuliano De Luca
 
The Pink road – Dorothy’s journey through an all pink wonderland
The Pink road – Dorothy’s journey through an all pink wonderlandThe Pink road – Dorothy’s journey through an all pink wonderland
The Pink road – Dorothy’s journey through an all pink wonderlandLetsConnect
 
Urbanesia - Development History
Urbanesia - Development HistoryUrbanesia - Development History
Urbanesia - Development HistoryBatista Harahap
 
Project Pink Note – New Note Editor Based on IBM Docs Technology
Project Pink Note – New Note Editor Based on IBM Docs TechnologyProject Pink Note – New Note Editor Based on IBM Docs Technology
Project Pink Note – New Note Editor Based on IBM Docs TechnologyLetsConnect
 
Social connections14: Super charge your API’s with Reactive streams
Social connections14: Super charge your API’s with Reactive streamsSocial connections14: Super charge your API’s with Reactive streams
Social connections14: Super charge your API’s with Reactive streamsFrank van der Linden
 
Drupal 9 and Backwards Compatibility: Why now is the time to upgrade to Drupal 8
Drupal 9 and Backwards Compatibility: Why now is the time to upgrade to Drupal 8Drupal 9 and Backwards Compatibility: Why now is the time to upgrade to Drupal 8
Drupal 9 and Backwards Compatibility: Why now is the time to upgrade to Drupal 8Angela Byron
 
Building cognitive apps with Watson Work Services
Building cognitive apps with Watson Work ServicesBuilding cognitive apps with Watson Work Services
Building cognitive apps with Watson Work ServicesLetsConnect
 
WordPress performance tuning
WordPress performance tuningWordPress performance tuning
WordPress performance tuningVladimír Smitka
 
Continuous delivery with jenkins pipelines @ devdays
Continuous delivery with jenkins pipelines  @ devdaysContinuous delivery with jenkins pipelines  @ devdays
Continuous delivery with jenkins pipelines @ devdaysRoman Pickl
 
Delivering High Performance Websites with NGINX
Delivering High Performance Websites with NGINXDelivering High Performance Websites with NGINX
Delivering High Performance Websites with NGINXNGINX, Inc.
 
MySQL at Wikipedia: How we do relational data at the Wikimedia Foundation
MySQL at Wikipedia: How we do relational data at the Wikimedia FoundationMySQL at Wikipedia: How we do relational data at the Wikimedia Foundation
MySQL at Wikipedia: How we do relational data at the Wikimedia FoundationJaime Crespo
 

Similaire à Archive What I See Now: Personal Web Archiving with WARCs (20)

Continuous delivery with jenkins pipelines (@devfest Vienna)
Continuous delivery with jenkins pipelines (@devfest Vienna)Continuous delivery with jenkins pipelines (@devfest Vienna)
Continuous delivery with jenkins pipelines (@devfest Vienna)
 
Whats New in IBM Integration Bus Interconnect 2017
Whats New in IBM Integration Bus Interconnect 2017Whats New in IBM Integration Bus Interconnect 2017
Whats New in IBM Integration Bus Interconnect 2017
 
Updates from Hungary (Jozsef Kovacs)
Updates from Hungary (Jozsef Kovacs)Updates from Hungary (Jozsef Kovacs)
Updates from Hungary (Jozsef Kovacs)
 
Archiving Web-Based #musetech for Institutional Memory
Archiving Web-Based #musetech for Institutional MemoryArchiving Web-Based #musetech for Institutional Memory
Archiving Web-Based #musetech for Institutional Memory
 
Jenkins Pipeline @ Scale. Building Automation Frameworks for Systems Integration
Jenkins Pipeline @ Scale. Building Automation Frameworks for Systems IntegrationJenkins Pipeline @ Scale. Building Automation Frameworks for Systems Integration
Jenkins Pipeline @ Scale. Building Automation Frameworks for Systems Integration
 
Creating Data Driven HTML5 Applications
Creating Data Driven HTML5 ApplicationsCreating Data Driven HTML5 Applications
Creating Data Driven HTML5 Applications
 
Icinga 2010 at CeBIT
Icinga 2010 at CeBITIcinga 2010 at CeBIT
Icinga 2010 at CeBIT
 
Intro to Exhibit Workshop
Intro to Exhibit WorkshopIntro to Exhibit Workshop
Intro to Exhibit Workshop
 
July OpenNTF Webinar - HCL Presents Keep, a new API for Domino
July OpenNTF Webinar - HCL Presents Keep, a new API for DominoJuly OpenNTF Webinar - HCL Presents Keep, a new API for Domino
July OpenNTF Webinar - HCL Presents Keep, a new API for Domino
 
SharePoint Framework get started and best practices
SharePoint Framework get started and best practicesSharePoint Framework get started and best practices
SharePoint Framework get started and best practices
 
The Pink road – Dorothy’s journey through an all pink wonderland
The Pink road – Dorothy’s journey through an all pink wonderlandThe Pink road – Dorothy’s journey through an all pink wonderland
The Pink road – Dorothy’s journey through an all pink wonderland
 
Urbanesia - Development History
Urbanesia - Development HistoryUrbanesia - Development History
Urbanesia - Development History
 
Project Pink Note – New Note Editor Based on IBM Docs Technology
Project Pink Note – New Note Editor Based on IBM Docs TechnologyProject Pink Note – New Note Editor Based on IBM Docs Technology
Project Pink Note – New Note Editor Based on IBM Docs Technology
 
Social connections14: Super charge your API’s with Reactive streams
Social connections14: Super charge your API’s with Reactive streamsSocial connections14: Super charge your API’s with Reactive streams
Social connections14: Super charge your API’s with Reactive streams
 
Drupal 9 and Backwards Compatibility: Why now is the time to upgrade to Drupal 8
Drupal 9 and Backwards Compatibility: Why now is the time to upgrade to Drupal 8Drupal 9 and Backwards Compatibility: Why now is the time to upgrade to Drupal 8
Drupal 9 and Backwards Compatibility: Why now is the time to upgrade to Drupal 8
 
Building cognitive apps with Watson Work Services
Building cognitive apps with Watson Work ServicesBuilding cognitive apps with Watson Work Services
Building cognitive apps with Watson Work Services
 
WordPress performance tuning
WordPress performance tuningWordPress performance tuning
WordPress performance tuning
 
Continuous delivery with jenkins pipelines @ devdays
Continuous delivery with jenkins pipelines  @ devdaysContinuous delivery with jenkins pipelines  @ devdays
Continuous delivery with jenkins pipelines @ devdays
 
Delivering High Performance Websites with NGINX
Delivering High Performance Websites with NGINXDelivering High Performance Websites with NGINX
Delivering High Performance Websites with NGINX
 
MySQL at Wikipedia: How we do relational data at the Wikimedia Foundation
MySQL at Wikipedia: How we do relational data at the Wikimedia FoundationMySQL at Wikipedia: How we do relational data at the Wikimedia Foundation
MySQL at Wikipedia: How we do relational data at the Wikimedia Foundation
 

Dernier

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Dernier (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Archive What I See Now: Personal Web Archiving with WARCs

  • 1. Archive What I See Now Personal Web Archiving with WARCs Michele C. Weigle, Michael L. Nelson, Mat Kelly, and John Berlin Web Science and Digital Libraries (WS-DL) Research Group Old Dominion University ws-dl.cs.odu.edu • @WebSciDL http://bit.ly/iipcWAC2017 HD-51670-13 • HK-50181-14 @machawk1 IIPC Web Archiving Conference 2017 June 15, 2017 London, UK
  • 2. Web Archiving Tools for Web Users: Standard Web archiving tools are difficult for non-IT experts. “Save Page As” is not suitable for archiving purposes. Pages are behind authentication. Pages change quickly, but current state needs archiving. ARCHIVE WHAT I SEE NOW
  • 3. Why?
      ● Allow non-technical users to locally create and replay their own archives
      ● Preserve the previously unpreserved
      more archives → more better
  • 4. CREATION + ACCESS of personal and private web archives
  • 5. Goals: Advance Development of 3 Tools
      WARCreate: Create a WARC from what you see in your browser
      Web Archiving Integration Layer (WAIL): Replay the WARC using software on your desktop. Your captures never leave your machine.
      Mink: See how your captures temporally integrate with institutions’. Submit new URIs to Web archives (was to-WAIL in scope?)
  • 7. WARCreate
      ● Google Chrome browser extension
      ● Save WARC files from your browser
      ● No credentials pass through 3rd party
      ● Heavily leverages Chrome webRequest API
      ● Built in ‘12, APIs and libraries have evolved!
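To make "save WARC files" concrete, here is a minimal sketch of what a single WARC/1.0 `response` record looks like when serialized. It is written in Python for illustration only; WARCreate itself is a Chrome extension, and this is not its code. The function name `build_warc_response_record` and the example URI are assumptions introduced here.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_bytes: bytes,
                               when: datetime) -> bytes:
    """Serialize one WARC/1.0 'response' record (hypothetical helper).

    http_bytes is the raw HTTP response: status line, headers, and body.
    """
    warc_date = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {warc_date}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_bytes)}",
    ]
    # A blank line separates the WARC headers from the content block.
    head = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    # Each record is terminated by two CRLFs after the content block.
    return head + http_bytes + b"\r\n\r\n"


# Illustrative captured response; a real capture would come from the browser.
http_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>hello</p>\n"
)
record = build_warc_response_record(
    "http://example.com/",
    http_response,
    datetime(2017, 6, 15, 9, 0, 0, tzinfo=timezone.utc),
)
```

A complete WARC file would normally also carry a `warcinfo` record and matching `request` records; this sketch shows only the per-response framing.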
  • 8. WARCreate - Recent Advancements
      ● Three New Modes for Browser-Based Preservation
        ○ Record Mode - retain buffer as you browse
        ○ Countdown Mode - preserve reloading page on an interval
        ○ Event Mode - preserve page when it’s automatically reloaded
      ● Save to local Web archive (e.g., WAIL)
  • 11. Web Archiving Integration Layer (WAIL)
      ● Stand-alone desktop application
      ● Collection-based Web Archiving
      ● Includes Heritrix for crawling, OpenWayback for Replay
      ● Python scripts compiled to OS-native binaries (.app, .exe)
      ● What to do with WARCs?
      ● See: How WAIL came about, "Lipstick or Ham"
  • 12. WAIL - Recent Advancements
      ● New User Interface
      ● Ported from Python to Electron
        ○ Now using Web technologies to archive the Web
      ● Single archive to collection-based archiving
      ● OpenWayback to pywb
      ● Twitter integration
      WAIL-Electron Feature Walk-through
  • 13. WAIL - New User Interface: Original one-click interface; New collection-based interface
  • 21. Mink
      ● Google Chrome browser extension
      ● Indicates archival capture count as you browse
      ● Quickly submit URI to multiple archives from UI
      ● From Mink(owski Space)
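As an illustration of how a capture count could be derived, the sketch below parses a Memento TimeMap in link format and counts the entries whose `rel` includes `memento`. It is written in Python and is not Mink's code (Mink is a browser extension); the archive host `archive.example`, the sample TimeMap, and the helper name `parse_mementos` are assumptions made for the example.

```python
import re
from email.utils import parsedate_to_datetime

# One link-format entry: <URI> followed by ;name="value" parameters.
# Splitting on commas would not work, because datetimes contain commas.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_mementos(timemap_text):
    """Return a sorted list of (datetime, uri) for each memento entry."""
    mementos = []
    for uri, raw_params in LINK_RE.findall(timemap_text):
        params = dict(PARAM_RE.findall(raw_params))
        # rel may hold several tokens, e.g. "first memento".
        if "memento" in params.get("rel", "").split():
            when = parsedate_to_datetime(params["datetime"])
            mementos.append((when, uri))
    return sorted(mementos)


# Hypothetical TimeMap for http://example.com/ with two captures.
SAMPLE_TIMEMAP = (
    '<http://example.com/>;rel="original",\n'
    '<http://archive.example/timemap/link/http://example.com/>'
    ';rel="self";type="application/link-format",\n'
    '<http://archive.example/web/20170101000000/http://example.com/>'
    ';rel="first memento";datetime="Sun, 01 Jan 2017 00:00:00 GMT",\n'
    '<http://archive.example/web/20170615090000/http://example.com/>'
    ';rel="last memento";datetime="Thu, 15 Jun 2017 09:00:00 GMT"'
)

mementos = parse_mementos(SAMPLE_TIMEMAP)
capture_count = len(mementos)
```

The `original` and `self` links are skipped because they describe the page and the TimeMap itself rather than captures.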
  • 22. Mink - Recent Advancements
      ● Enhance interface
        ○ Add number of archived pages to icon at bottom of page
        ○ Allow users to set preferences on how to view large set of mementos
      ● Communication with user-specified (or local) archive in addition to aggregated institutional archives’ results
  • 23. Mink - Previous Interface
      ➢ Interface affected by page CSS
      ➢ Obtrusive on the viewport by default
      ➢ Haphazard, inconsistent animations
  • 24. Mink - User Interface Revamp
  • 25. Mink - User Interface Revamp
      ➢ Interface-on-demand
      ➢ Shadow DOM, no CSS intrusion
      ➢ More consistent, intuitive Miller columns for many captures
  • 26. Mink - User Interface Revamp
  • 27. Mink - Communication with Local Archives: Any Memento-compatible archive; Aggregated TimeMap
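One way to picture combining a local archive's captures with an aggregated TimeMap is a simple merge of two date-sorted lists. The Python sketch below is an assumption about how such a merge could work, not a description of Mink's implementation; the hosts, the port, and the helper name `merge_mementos` are hypothetical.

```python
import heapq
from datetime import datetime, timezone


def merge_mementos(*sources):
    """Merge date-sorted (datetime, uri) lists, dropping duplicate URIs."""
    seen = set()
    merged = []
    # heapq.merge lazily interleaves inputs that are already sorted.
    for when, uri in heapq.merge(*sources):
        if uri not in seen:
            seen.add(uri)
            merged.append((when, uri))
    return merged


def utc(*args):
    """Shorthand for a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


# Hypothetical captures reported by an external aggregator.
aggregated = [
    (utc(2017, 1, 1), "http://archive.example/web/20170101000000/http://example.com/"),
    (utc(2017, 6, 15, 9), "http://archive.example/web/20170615090000/http://example.com/"),
]
# Hypothetical capture held by a local, Memento-compatible archive.
local = [
    (utc(2017, 3, 10, 12), "http://localhost:8080/web/20170310120000/http://example.com/"),
]

combined = merge_mementos(aggregated, local)
```

Keeping the local list separate until merge time means private captures are only ever combined on the user's machine.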
  • 28. Mink usage GIF, also available at: https://youtu.be/bGjxofpTgv4
  • 29. Tools’ Integration
  • 30. Tools’ Integration: WARCreate→WAIL
      ● Save WARC directly to local archive (by reference [easier])
        ○ By-value integration feasibility being investigated a la WASAPI
      ● Automatically indexed and replayable
  • 31. Tools’ Integration: Mink→WAIL: any Memento-compatible archive
  • 32. Some Future Work
      ● Decouple Mink from external Memento aggregator
        ○ Client-side customizable set of archives instead
      ● WARC replay using browser extensions/apps
      ● Further integration with other archiving tools in WAIL
        ○ Re-add Memgator Memento aggregator (removed from Electron version)
      ● Firefox version of tools
        ○ XUL→WebExtensions
        ○ Decouple from Chrome APIs
      ● Integration with InterPlanetary Wayback (speaking about later today)
  • 33. Acknowledgements
      ● NEH Grant #s HD-51670-13 • HK-50181-14
      ● Dr. Liza Potts and WIDE Research Center at Michigan State University
      ● ODU SEES Travel Grant
  • 34. Archive What I See Now: Personal Web Archiving with WARCs. Michele C. Weigle, Michael L. Nelson, Mat Kelly, and John Berlin. Web Science and Digital Libraries (WS-DL) Research Group, Old Dominion University. ws-dl.cs.odu.edu