SlideShare une entreprise Scribd logo
1  sur  28
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
Blockchain Can Not Be Used To Verify
Replayed Archived Web Pages
Michael L. Nelson
Old Dominion University
Web Science & Digital Libraries Research Group
@WebSciDL, @phonedude_mln
With:
ODU: Michele C. Weigle, Mohamed Aturban
Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein
Supported in part by The Andrew Mellon Foundation
and the National Science Foundation
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
This is not what you think it is…
https://petertodd.org/2017/carbon-dating-the-internet-archive-with-opentimestamps
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
This is not what you think it is…
https://petertodd.org/2017/carbon-dating-the-internet-archive-with-opentimestamps
“…right now you can get timestamps for every book,
movie, song, computer program, legal document,
etc. in the thousands of collections in the archive.
In the future we hope to be able to work with the
Internet Archive to extend this to timestamping
website snapshots…”
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
TL;DR
Web archiving is not file backup.
Backup = prevent, detect, repair changes
Web archiving = continuous changes to replicate the past
Naïve fixity techniques are
not applicable for web archiving.
Since 3rd
party audits are not feasible, as web archives proliferate
verifying web archives will centralize / federate.
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
A simplified workflow of web archiving
$ wget
WARC/1.0
WARC-Type: warcinfo
Content-Type: application/warc-fields
WARC-Date: 2018-11-03T17:20:02Z
WARC-Record-ID: <urn:uuid:6d14bf1d-0ef7-
4f03-9de2-e578d105d3cb>
WARC-Filename: foo.warc.warc.gz
WARC-Block-Digest:
sha1:WWSSYDYY7HTP4JTVOZANSIFPFHUJU64E
Content-Length: 257
software: Wget/1.15 (linux-gnu)
format: WARC File Format 1.0
[much deletia]
1) live web site
https://climate.nasa.gov/vital-signs/carbon-dioxide/
2) Crawled by any of
several archival crawlers 3) Result stored in a WARC File
(like tar or zip, but for Web archives)
4) WARC files are indexed,
served by replay software
(there are several variations of
Wayback Machine)
5) User chooses date of
capture (Memento-Datetime)
6) Page replayed with banner,
rewritten links, etc.
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
(apologies to Peter Arnett)
“In order to save the page,
we had to completely change it”
Yes, some archives (including most versions of Wayback) provide “raw” access,
but modifications can still happen (how/why is beyond the scope of this presentation).
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
I’ve got mad HTML skillz
https://www.cs.odu.edu/~mln/travel.html
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
Same page, archived at IA
https://web.archive.org/web/20181104174441/https://www.cs.odu.edu/~mln/travel.html
Archival Metadata
The banner tells the user the original URL,
which archive the page resides in,
when it was archived, how many copies, etc.
Links are rewritten to point back
into the archive, not the live web.
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
Same page, archived at IA
https://web.archive.org/web/20181104174441/https://www.cs.odu.edu/~mln/travel.html
$ curl -s https://www.cs.odu.edu/~mln/travel.html | head -5
<body bgcolor=white>
<pre>
-January 31-February 1, 2019, NYC, ACM Publications Board Meeting
$ curl -s https://www.cs.odu.edu/~mln/travel.html | wc
585 2361 26471
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
Same page, archived at IA
https://web.archive.org/web/20181104174441/https://www.cs.odu.edu/~mln/travel.html
$ curl -s https://www.cs.odu.edu/~mln/travel.html | head -5
<body bgcolor=white>
<pre>
-January 31-February 1, 2019, NYC, ACM Publications Board Meeting
$ curl -s https://www.cs.odu.edu/~mln/travel.html | wc
585 2361 26471
$ curl -s https://web.archive.org/web/20181104174441/https://www.cs.odu.edu/~mln/travel.html | head -5
<script src="//archive.org/includes/analytics.js?v=cf34f82" type="text/javascript"></script>
<script type="text/javascript">window.addEventListener('DOMContentLoaded',function(){var
v=archive_analytics.values;v.service='wb';v.server_name='wwwb-
app40.us.archive.org’;v.server_ms=208;archive_analytics.send_pageview({});});</script>
<script type="text/javascript" src="/static/js/ait-client-rewrite.js?v=1538596186.0" charset="utf-
8"></script>
<script type="text/javascript">
WB_wombat_Init('https://web.archive.org/web', '20181104174441', 'www.cs.odu.edu');
$ curl -s https://web.archive.org/web/20181104174441/https://www.cs.odu.edu/~mln/travel.html | wc
618 2472 33787
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
Same page, archived at archive.today
http://archive.is/20181104174633/https://www.cs.odu.edu/~mln/travel.html
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
Same page, archived at archive.today
http://archive.is/20181104174633/https://www.cs.odu.edu/~mln/travel.html
$ curl -s http://archive.is/20181104174633/https://www.cs.odu.edu/~mln/travel.html | head -5
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html style="background-
color:#EEEEEE" prefix="og: http://ogp.me/ns# article: http://ogp.me/ns/article#" itemscope
itemtype="http://schema.org/Article"><!--174.109.72.208--><!--curl/7.30.0--><head><meta http-
equiv="Content-Type" content="text/html;charset=utf-8"/><meta name="robots"
content="index,noarchive"/><meta name="viewport" content="device-width=300, initial-scale=1"/><meta
property="twitter:card" content="summary"/><meta property="twitter:site" content="@archiveis"/><meta
property="og:type" content="article"/><meta property="og:site_name" content="archive.is"/><meta
property="og:url" content="http://archive.is/l6QdV" itemprop="url"/><meta property="og:title"
content="https://www.cs.odu.edu/~mln/travel.html"/><meta property="twitter:title"
content="https://www.cs.odu.edu/~mln/travel.html"/><meta property="twitter:description"
content="archived 4 Nov 2018 17:46:33 UTC" itemprop="description"/><meta
property="article:published_time" content="2018-11-04T17:46:33Z" itemprop="dateCreated"/><meta
property="article:modified_time" content="2018-11-04T17:46:33Z" itemprop="dateModified"/><link
rel="image_src"
href="https://archive.is/l6QdV/d7e3acef18a0433590880dfcc26f8e1f5f18f91e/scr.png"/><meta
property="og:image"
content="https://archive.is/l6QdV/d7e3acef18a0433590880dfcc26f8e1f5f18f91e/scr.png"
itemprop="image"/><meta property="twitter:image"
content="https://archive.is/l6QdV/d7e3acef18a0433590880dfcc26f8e1f5f18f91e/scr.png"/><meta
property="twitter:image:src"
content="https://archive.is/l6QdV/d7e3acef18a0433590880dfcc26f8e1f5f18f91e/scr.png"/><meta
property="twitter:image:width" content="1024"/><meta property="twitter:image:height"
content="768"/><link rel="icon" href="//www.google.com/s2/favicons?domain=www.cs.odu.edu"/><link
rel="canonical" href="https://archive.is/l6QdV"/><link rel="bookmark"
href="http://archive.today/20181104174633/https://www.cs.odu.edu/~mln/travel.html"/><title></title><
/head><body style="margin:0;background-color:#EEEEEE"><center><div id="HEADER" style="font-
family:sans-serif;background
[much deletia – you get the point]
$ curl -s http://archive.is/20181104174633/https://www.cs.odu.edu/~mln/travel.html | wc
730 3640 62392
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
If we just had isolated, static pages
(e.g., individual jpegs, pdfs, mp3s)
then there’d be no problem.
But HTML has:
1) links,
2) embedded resources (including iframes), and
3) Javascript, which can modify the HTML.
And HTTP has no “bulk download”,
so you can’t grab an entire site instantaneously.
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
WARC/1.0
WARC-Type: warcinfo
Content-Type: application/warc-fields
WARC-Date: 2018-11-03T17:20:02Z
WARC-Record-ID: <urn:uuid:6d14bf1d-0ef7-4f03-9de2-e578d105d3cb>
WARC-Filename: climate.nasa.gov.warc.gz
WARC-Block-Digest: sha1:WWSSYDYY7HTP4JTVOZANSIFPFHUJU64E
Content-Length: 257
software: Wget/1.15 (linux-gnu)
format: WARC File Format 1.0
conformsTo: http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf
robots: classic
wget-arguments: "--warc-file=climate.nasa.gov" "https://climate.nasa.gov/vital-signs/carbon-dioxide/"
WARC/1.0
WARC-Type: request
WARC-Target-URI: https://climate.nasa.gov/vital-signs/carbon-dioxide/
Content-Type: application/http;msgtype=request
WARC-Date: 2018-11-03T17:20:02Z
WARC-Record-ID: <urn:uuid:e44bc1ea-61a1-4200-b94f-60042456f638>
WARC-IP-Address: 54.230.195.16
WARC-Warcinfo-ID: <urn:uuid:6d14bf1d-0ef7-4f03-9de2-e578d105d3cb>
WARC-Block-Digest: sha1:CLODKYDXCHPVOJMJWHJVT3EJJDKI2RTQ
Content-Length: 141
GET /vital-signs/carbon-dioxide/ HTTP/1.1
User-Agent: Wget/1.15 (linux-gnu)
Accept: */*
Host: climate.nasa.gov
Connection: Keep-Alive
WARC/1.0
WARC-Type: response
WARC-Record-ID: <urn:uuid:5d8861ef-93c5-4d9c-87b8-4f427f963f7c>
WARC-Warcinfo-ID: <urn:uuid:6d14bf1d-0ef7-4f03-9de2-e578d105d3cb>
WARC-Concurrent-To: <urn:uuid:e44bc1ea-61a1-4200-b94f-60042456f638>
WARC-Target-URI: https://climate.nasa.gov/vital-signs/carbon-dioxide/
WARC-Date: 2018-11-03T17:20:02Z
We could hash the
WARC file
$ md5sum climate.nasa.gov.warc.gz
652853fe1bc8cb273cdf73aad8a489ca climate.nasa.gov.warc.gz
But this nasa.gov page contains:
•201 images
•19 Javascript files
•3 CSS files
At a large archive like IA they
could be in multiple WARC files;
worst case is 224 WARC files.
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
We can detect changes in the root HTML
https://ws-dl.blogspot.com/2017/12/2017-12-11-difficulties-in-timestamping.html
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
But what if the change is in
an embedded resource?
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
Clearly we need to render the entire page,
then compute the hash.
Unfortunately, that’s not easy.
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
Load the archived page, get an eagle
https://www.webharvest.gov/congress112th/20130119060624/http://www.fws.gov/
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
Hit “reload”, get a tiger
https://www.webharvest.gov/congress112th/20130119060624/http://www.fws.gov/
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
Hit “reload” again, get a mountain
https://www.webharvest.gov/congress112th/20130119060624/http://www.fws.gov/
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
“Look on my Javascript, ye Mighty, and despair!”
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
Actually, the fws.gov example was super easy;
most changes are much harder to trace.
Mohamed Aturban, unpublished, memento:
http://web.archive.org/web/20130724144801/http://www.cnn.com/
Animated GIF: https://blog.dshr.org/2017/11/keynote-at-pacific-neighborhood.html
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
Temporal violations:
reconstructing pages that never existed on the live web
(examples below are transient; sometimes you get the 1st
image, sometimes the 2nd
image)
embedded in umich.edu memento, archived in perma.cc
2nd
image is compressed (12209 vs. 19448 bytes); 2nd
image modified in 2017-03, but replayed in a 2017-01 page
embedded in copybogger.com memento, archived in archive.org
2nd
image modified in 2017-12, but replayed in a 2017-11 page; blackout for privacy
Temporal violations: https://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
1 WARC file, 2 Wayback Machines, 3 Browsers
= 6 different replays
http://wayback.archive-it.org/all/20130106140348/http://www.harvard.edu/
http://web.archive.org/web/20130106140348/http://www.harvard.edu/
see also. https://ws-dl.blogspot.com/2016/12/2016-12-20-archiving-pages-with.html
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
Archive URIMs With at least two hashes
----------------------------------------------------------------
webharvest.gov 712 712 (100%)
archive.is 1396 1364 (97.70%)
vefsafn.is 1589 739 (46.50%)
archive-it.org 1383 815 (58.92%)
stanford.edu 1222 831 (68.00%)
internetmemory.org 979 979 (100%)
nationalarchives.gov.uk 994 972 (97.78%)
archive.bibalex.org 199 177 (88.94%)
bac-lac.gc.ca 351 351 (100%)
proni.gov.uk 469 129 (27.50%)
www.webarchive.org.uk 349 329 (94.26%)
www.webcitation.org 1585 828 (52.23%)
veebiarhiiv.digar.ee 488 308 (63.11%)
webarchive.loc.gov 1594 526 (32.99%)
arquivo.pt 1569 1563 (99.61%)
web.archive.org 1566 1334 (85.18%)
perma-archives.org 182 180 (98.90%)
----------------------------------------------------------------
16627 12137 (72.99%)
Data from 35 downloads over an 11 month period (2017-11 – 2018-10), Mohamed Aturban (in preparation)
(apologies to Heraclitus)
You cannot replay twice the same archived page
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
A metaphor for replaying archived web pages
https://www.youtube.com/watch?v=ekO3Z3XWa0Q
https://en.wikipedia.org/wiki/Monty_Python_and_the_Holy_Grail
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
King of Swamp Castle: live web/ground truth
Guard: archival replay
https://www.youtube.com/watch?v=ekO3Z3XWa0Q
https://en.wikipedia.org/wiki/Monty_Python_and_the_Holy_Grail
$ echo "Make sure the prince doesn't leave this room until I come and get him." | md5
57facbb2734d36cb823f4230cc07b888
$ echo "Not to leave the room even if you come and get him." | md5
3ba0a2359d63f43cbe9e11fb5a179b8d
$ echo "Until you come and get him, we're not to enter the room." | md5
ade3539aaa8a6d8724193e9a37f3ca6d
$ echo "We don't need to do anything apart from just stop him entering the room." | md5
ea812f5b997aa42a8f293bd1ee536fd0
$ echo "Oh yes, we'll keep him in here, obviously. But if he had to leave, and we went with him..." | md5
55d184b77d99eed6367535ef3c05d7aa
$ echo "Oh, yes of course. I thought you meant him! You know it seemed a bit daft me having to guard him
when he's a guard." | md5
Symposium on Blockchain and Trusted Repositories, 2018-11-05,
@phonedude_mln, @WebSciDL
Archival replay & blockchain:
building a castle in a swamp
• Fixity checks only work when it’s clear what to hash
– Hash only the root HTML and modifications are possible via embedded
resources (false negatives)
– Recursively hash all embedded resources and you’ll rarely get the
same hash (false positives)
• There is increasing incentive to attack existing archives and
create networks of fake archives
– http://bit.ly/Weaponized-Web-Archives
• We are investigating archive-aware hashing functions
– Vagaries of replay won’t be “fixed”; we need to adapt our hashing
• Central vs. distributed?
– Ask yourself: “is this a winner-take-all market?” (hello, FANG)
– https://blog.dshr.org/2018/09/blockchain-solves-preservation.html

Contenu connexe

Tendances

Linked Data Patterns
Linked Data PatternsLinked Data Patterns
Linked Data PatternsLeigh Dodds
 
Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Mat Kelly
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Michael Nelson
 
Linked Open Data for Archives
Linked Open Data for ArchivesLinked Open Data for Archives
Linked Open Data for ArchivesCliff Landis
 
Linked Data and Discovery with Steve Meyer
Linked Data and Discovery with Steve MeyerLinked Data and Discovery with Steve Meyer
Linked Data and Discovery with Steve MeyerWiLS
 
Linked Data: turning the web into a context graph
Linked Data: turning the web into a context graphLinked Data: turning the web into a context graph
Linked Data: turning the web into a context graphLeigh Dodds
 
21st Century Archival Appraisal
21st Century Archival Appraisal21st Century Archival Appraisal
21st Century Archival AppraisalArian Ravanbakhsh
 
Detecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesYasmin AlNoamany, PhD
 
To the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationTo the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationMartin Klein
 
DHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository InteroperabilityDHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository InteroperabilityAccess Innovations, Inc.
 
Discovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDMartin Klein
 
Promises and Pitfalls: Linked Data, Privacy, and Library Catalogs
Promises and Pitfalls: Linked Data, Privacy, and Library CatalogsPromises and Pitfalls: Linked Data, Privacy, and Library Catalogs
Promises and Pitfalls: Linked Data, Privacy, and Library CatalogsEmily Nimsakont
 
Life after MARC: Experimenting with Cataloging Tools of the Future
Life after MARC: Experimenting with Cataloging Tools of the FutureLife after MARC: Experimenting with Cataloging Tools of the Future
Life after MARC: Experimenting with Cataloging Tools of the FutureEmily Nimsakont
 
Open Data and Data Journalism
Open Data and Data JournalismOpen Data and Data Journalism
Open Data and Data JournalismIrina Radchenko
 
What is Linked Data, and What Does It Mean for Libraries?
What is Linked Data, and What Does It Mean for Libraries?What is Linked Data, and What Does It Mean for Libraries?
What is Linked Data, and What Does It Mean for Libraries?Emily Nimsakont
 
Linked Data and Libraries: What? Why? How?
Linked Data and Libraries: What? Why? How?Linked Data and Libraries: What? Why? How?
Linked Data and Libraries: What? Why? How?Emily Nimsakont
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live WebMartin Klein
 
Life after MARC: Cataloging Tools of the Future
Life after MARC: Cataloging Tools of the FutureLife after MARC: Cataloging Tools of the Future
Life after MARC: Cataloging Tools of the FutureEmily Nimsakont
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web ResourcesMartin Klein
 

Tendances (20)

Linked Data Patterns
Linked Data PatternsLinked Data Patterns
Linked Data Patterns
 
Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
 
Linked Open Data for Archives
Linked Open Data for ArchivesLinked Open Data for Archives
Linked Open Data for Archives
 
Linked Data and Discovery with Steve Meyer
Linked Data and Discovery with Steve MeyerLinked Data and Discovery with Steve Meyer
Linked Data and Discovery with Steve Meyer
 
Linked Data: turning the web into a context graph
Linked Data: turning the web into a context graphLinked Data: turning the web into a context graph
Linked Data: turning the web into a context graph
 
21st Century Archival Appraisal
21st Century Archival Appraisal21st Century Archival Appraisal
21st Century Archival Appraisal
 
Detecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web Archives
 
To the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationTo the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly Communication
 
DHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository InteroperabilityDHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository Interoperability
 
Discovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCID
 
Promises and Pitfalls: Linked Data, Privacy, and Library Catalogs
Promises and Pitfalls: Linked Data, Privacy, and Library CatalogsPromises and Pitfalls: Linked Data, Privacy, and Library Catalogs
Promises and Pitfalls: Linked Data, Privacy, and Library Catalogs
 
Life after MARC: Experimenting with Cataloging Tools of the Future
Life after MARC: Experimenting with Cataloging Tools of the FutureLife after MARC: Experimenting with Cataloging Tools of the Future
Life after MARC: Experimenting with Cataloging Tools of the Future
 
Open Data and Data Journalism
Open Data and Data JournalismOpen Data and Data Journalism
Open Data and Data Journalism
 
What is Linked Data, and What Does It Mean for Libraries?
What is Linked Data, and What Does It Mean for Libraries?What is Linked Data, and What Does It Mean for Libraries?
What is Linked Data, and What Does It Mean for Libraries?
 
Linked Data and Libraries: What? Why? How?
Linked Data and Libraries: What? Why? How?Linked Data and Libraries: What? Why? How?
Linked Data and Libraries: What? Why? How?
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live Web
 
GODORT SLDTF 2009 Meeting Outline
GODORT SLDTF 2009 Meeting OutlineGODORT SLDTF 2009 Meeting Outline
GODORT SLDTF 2009 Meeting Outline
 
Life after MARC: Cataloging Tools of the Future
Life after MARC: Cataloging Tools of the FutureLife after MARC: Cataloging Tools of the Future
Life after MARC: Cataloging Tools of the Future
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web Resources
 

Similaire à Blockchain Not Able to Verify Replayed Archived Web Pages

Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesMichael Nelson
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingMichael Nelson
 
Readying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web BundlesReadying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web BundlesSawood Alam
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingMichael Nelson
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Michael Nelson
 
Bridging the Gap Between Print and Digital Environment
Bridging the Gap Between Print and Digital EnvironmentBridging the Gap Between Print and Digital Environment
Bridging the Gap Between Print and Digital EnvironmentAnita Riley
 
Representing the world: How web users become web thinkers and web makers
Representing the world: How web users become web thinkers and web makersRepresenting the world: How web users become web thinkers and web makers
Representing the world: How web users become web thinkers and web makersjudell
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Michael Nelson
 
MementoMap Framework for Flexible and Adaptive Web Archive Profiling
MementoMap Framework for Flexible and Adaptive Web Archive ProfilingMementoMap Framework for Flexible and Adaptive Web Archive Profiling
MementoMap Framework for Flexible and Adaptive Web Archive ProfilingSawood Alam
 
Informal presentation about RES
Informal presentation about RESInformal presentation about RES
Informal presentation about RESChristophe Guéret
 
Stream Reasoning : Where We Got So Far
Stream Reasoning: Where We Got So FarStream Reasoning: Where We Got So Far
Stream Reasoning : Where We Got So FarEmanuele Della Valle
 
HKU Data Curation MLIM7350 Class 10
HKU Data Curation MLIM7350 Class 10HKU Data Curation MLIM7350 Class 10
HKU Data Curation MLIM7350 Class 10Scott Edmunds
 
Lodlam presentation v1.0 final al20151104
Lodlam presentation v1.0 final al20151104Lodlam presentation v1.0 final al20151104
Lodlam presentation v1.0 final al20151104Asa Letourneau
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Artefactual Systems - AtoM
 
Summarize Your Archival Holdings With MementoMap
Summarize Your Archival Holdings With MementoMapSummarize Your Archival Holdings With MementoMap
Summarize Your Archival Holdings With MementoMapSawood Alam
 
Introduction to Omeka
Introduction to OmekaIntroduction to Omeka
Introduction to OmekaShawn Day
 
Presenting Your Digital Research
Presenting Your Digital ResearchPresenting Your Digital Research
Presenting Your Digital ResearchShawn Day
 
Web Archives and the dream of the Personal Search Engine
Web Archives and the dream of the Personal Search EngineWeb Archives and the dream of the Personal Search Engine
Web Archives and the dream of the Personal Search EngineArjen de Vries
 
Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011 Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011 Jane Stevenson
 
Metadata / Linked Data
Metadata / Linked DataMetadata / Linked Data
Metadata / Linked DataRichard Wallis
 

Similaire à Blockchain Not Able to Verify Replayed Archived Web Pages (20)

Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web Archiving
 
Readying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web BundlesReadying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web Bundles
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
 
Bridging the Gap Between Print and Digital Environment
Bridging the Gap Between Print and Digital EnvironmentBridging the Gap Between Print and Digital Environment
Bridging the Gap Between Print and Digital Environment
 
Representing the world: How web users become web thinkers and web makers
Representing the world: How web users become web thinkers and web makersRepresenting the world: How web users become web thinkers and web makers
Representing the world: How web users become web thinkers and web makers
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
 
MementoMap Framework for Flexible and Adaptive Web Archive Profiling
MementoMap Framework for Flexible and Adaptive Web Archive ProfilingMementoMap Framework for Flexible and Adaptive Web Archive Profiling
MementoMap Framework for Flexible and Adaptive Web Archive Profiling
 
Informal presentation about RES
Informal presentation about RESInformal presentation about RES
Informal presentation about RES
 
Stream Reasoning : Where We Got So Far
Stream Reasoning: Where We Got So FarStream Reasoning: Where We Got So Far
Stream Reasoning : Where We Got So Far
 
HKU Data Curation MLIM7350 Class 10
HKU Data Curation MLIM7350 Class 10HKU Data Curation MLIM7350 Class 10
HKU Data Curation MLIM7350 Class 10
 
Lodlam presentation v1.0 final al20151104
Lodlam presentation v1.0 final al20151104Lodlam presentation v1.0 final al20151104
Lodlam presentation v1.0 final al20151104
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
 
Summarize Your Archival Holdings With MementoMap
Summarize Your Archival Holdings With MementoMapSummarize Your Archival Holdings With MementoMap
Summarize Your Archival Holdings With MementoMap
 
Introduction to Omeka
Introduction to OmekaIntroduction to Omeka
Introduction to Omeka
 
Presenting Your Digital Research
Presenting Your Digital ResearchPresenting Your Digital Research
Presenting Your Digital Research
 
Web Archives and the dream of the Personal Search Engine
Web Archives and the dream of the Personal Search EngineWeb Archives and the dream of the Personal Search Engine
Web Archives and the dream of the Personal Search Engine
 
Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011 Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011
 
Metadata / Linked Data
Metadata / Linked DataMetadata / Linked Data
Metadata / Linked Data
 

Plus de Michael Nelson

Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035Michael Nelson
 
Uncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pagesUncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pagesMichael Nelson
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsMichael Nelson
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsMichael Nelson
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Michael Nelson
 
Summarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesSummarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesMichael Nelson
 
We Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesWe Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesMichael Nelson
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptMichael Nelson
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple ArchivesMichael Nelson
 
Combining Storytelling and Web Archives
Combining Storytelling and Web ArchivesCombining Storytelling and Web Archives
Combining Storytelling and Web ArchivesMichael Nelson
 
@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015Michael Nelson
 
Evaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesEvaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesMichael Nelson
 
When Should I Make Preservation Copies of Myself?
When Should I Make Preservation Copies of Myself?�When Should I Make Preservation Copies of Myself?�
When Should I Make Preservation Copies of Myself?Michael Nelson
 
Assessing the Quality of Web Archives
Assessing the Quality of Web ArchivesAssessing the Quality of Web Archives
Assessing the Quality of Web ArchivesMichael Nelson
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web ArchivesMichael Nelson
 
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench ToolEvaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench ToolMichael Nelson
 
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Michael Nelson
 
Who and What Links to the Internet Archive
Who and What Links to the Internet ArchiveWho and What Links to the Internet Archive
Who and What Links to the Internet ArchiveMichael Nelson
 
On the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeOn the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeMichael Nelson
 
Profiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageProfiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageMichael Nelson
 

Plus de Michael Nelson (20)

Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035
 
Uncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pagesUncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pages
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed Originals
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed Originals
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
 
Summarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesSummarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniques
 
We Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesWe Need Multiple, Independent Web Archives
We Need Multiple, Independent Web Archives
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple Archives
 
Combining Storytelling and Web Archives
Combining Storytelling and Web ArchivesCombining Storytelling and Web Archives
Combining Storytelling and Web Archives
 
@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015
 
Evaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesEvaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived Pages
 
When Should I Make Preservation Copies of Myself?
When Should I Make Preservation Copies of Myself?�When Should I Make Preservation Copies of Myself?�
When Should I Make Preservation Copies of Myself?
 
Assessing the Quality of Web Archives
Assessing the Quality of Web ArchivesAssessing the Quality of Web Archives
Assessing the Quality of Web Archives
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web Archives
 
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench ToolEvaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
 
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
 
Who and What Links to the Internet Archive
Who and What Links to the Internet ArchiveWho and What Links to the Internet Archive
Who and What Links to the Internet Archive
 
On the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeOn the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over Time
 
Profiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageProfiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content Language
 

Dernier

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Blockchain Not Able to Verify Replayed Archived Web Pages

  • 1. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL Blockchain Can Not Be Used To Verify Replayed Archived Web Pages Michael L. Nelson Old Dominion University Web Science & Digital Libraries Research Group @WebSciDL, @phonedude_mln With: ODU: Michele C. Weigle, Mohamed Aturban Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein Supported in part by The Andrew Mellon Foundation and the National Science Foundation
  • 2. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL This is not what you think it is… https://petertodd.org/2017/carbon-dating-the-internet-archive-with-opentimestamps
  • 3. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL This is not what you think it is… https://petertodd.org/2017/carbon-dating-the-internet-archive-with-opentimestamps “…right now you can get timestamps for every book, movie, song, computer program, legal document, etc. in the thousands of collections in the archive. In the future we hope to be able to work with the Internet Archive to extend this to timestamping website snapshots…”
  • 4. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL TL;DR Web archiving is not file backup. Backup = prevent, detect, repair changes Web archiving = continuous changes to replicate the past Naïve fixity techniques are not applicable for web archiving. Since 3rd party audits are not feasible, as web archives proliferate verifying web archives will centralize / federate.
  • 5. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL A simplified workflow of web archiving $ wget WARC/1.0 WARC-Type: warcinfo Content-Type: application/warc-fields WARC-Date: 2018-11-03T17:20:02Z WARC-Record-ID: <urn:uuid:6d14bf1d-0ef7- 4f03-9de2-e578d105d3cb> WARC-Filename: foo.warc.warc.gz WARC-Block-Digest: sha1:WWSSYDYY7HTP4JTVOZANSIFPFHUJU64E Content-Length: 257 software: Wget/1.15 (linux-gnu) format: WARC File Format 1.0 [much deletia] 1) live web site https://climate.nasa.gov/vital-signs/carbon-dioxide/ 2) Crawled by any of several archival crawlers 3) Result stored in a WARC File (like tar or zip, but for Web archives) 4) WARC files are indexed, served by replay software (there are several variations of Wayback Machine) 5) User chooses date of capture (Memento-Datetime) 6) Page replayed with banner, rewritten links, etc.
  • 6. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL (apologies to Peter Arnett) “In order to save the page, we had to completely change it” Yes, some archives (including most versions of Wayback) provide “raw” access, but modifications can still happen (how/why is beyond the scope of this presentation).
  • 7. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL I’ve got mad HTML skillz https://www.cs.odu.edu/~mln/travel.html
  • 8. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL Same page, archived at IA https://web.archive.org/web/20181104174441/https://www.cs.odu.edu/~mln/travel.html Archival Metadata The banner tells the user the original URL, which archive the page resides in, when it was archived, how many copies, etc. Links are rewritten to point back into the archive, not the live web.
  • 9. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL Same page, archived at IA https://web.archive.org/web/20181104174441/https://www.cs.odu.edu/~mln/travel.html $ curl -s https://www.cs.odu.edu/~mln/travel.html | head -5 <body bgcolor=white> <pre> -January 31-February 1, 2019, NYC, ACM Publications Board Meeting $ curl -s https://www.cs.odu.edu/~mln/travel.html | wc 585 2361 26471
  • 10. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL Same page, archived at IA https://web.archive.org/web/20181104174441/https://www.cs.odu.edu/~mln/travel.html $ curl -s https://www.cs.odu.edu/~mln/travel.html | head -5 <body bgcolor=white> <pre> -January 31-February 1, 2019, NYC, ACM Publications Board Meeting $ curl -s https://www.cs.odu.edu/~mln/travel.html | wc 585 2361 26471 $ curl -s https://web.archive.org/web/20181104174441/https://www.cs.odu.edu/~mln/travel.html | head -5 <script src="//archive.org/includes/analytics.js?v=cf34f82" type="text/javascript"></script> <script type="text/javascript">window.addEventListener('DOMContentLoaded',function(){var v=archive_analytics.values;v.service='wb';v.server_name='wwwb- app40.us.archive.org’;v.server_ms=208;archive_analytics.send_pageview({});});</script> <script type="text/javascript" src="/static/js/ait-client-rewrite.js?v=1538596186.0" charset="utf- 8"></script> <script type="text/javascript"> WB_wombat_Init('https://web.archive.org/web', '20181104174441', 'www.cs.odu.edu'); $ curl -s https://web.archive.org/web/20181104174441/https://www.cs.odu.edu/~mln/travel.html | wc 618 2472 33787
  • 11. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL Same page, archived at archive.today http://archive.is/20181104174633/https://www.cs.odu.edu/~mln/travel.html
  • 12. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL Same page, archived at archive.today http://archive.is/20181104174633/https://www.cs.odu.edu/~mln/travel.html $ curl -s http://archive.is/20181104174633/https://www.cs.odu.edu/~mln/travel.html | head -5 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html style="background- color:#EEEEEE" prefix="og: http://ogp.me/ns# article: http://ogp.me/ns/article#" itemscope itemtype="http://schema.org/Article"><!--174.109.72.208--><!--curl/7.30.0--><head><meta http- equiv="Content-Type" content="text/html;charset=utf-8"/><meta name="robots" content="index,noarchive"/><meta name="viewport" content="device-width=300, initial-scale=1"/><meta property="twitter:card" content="summary"/><meta property="twitter:site" content="@archiveis"/><meta property="og:type" content="article"/><meta property="og:site_name" content="archive.is"/><meta property="og:url" content="http://archive.is/l6QdV" itemprop="url"/><meta property="og:title" content="https://www.cs.odu.edu/~mln/travel.html"/><meta property="twitter:title" content="https://www.cs.odu.edu/~mln/travel.html"/><meta property="twitter:description" content="archived 4 Nov 2018 17:46:33 UTC" itemprop="description"/><meta property="article:published_time" content="2018-11-04T17:46:33Z" itemprop="dateCreated"/><meta property="article:modified_time" content="2018-11-04T17:46:33Z" itemprop="dateModified"/><link rel="image_src" href="https://archive.is/l6QdV/d7e3acef18a0433590880dfcc26f8e1f5f18f91e/scr.png"/><meta property="og:image" content="https://archive.is/l6QdV/d7e3acef18a0433590880dfcc26f8e1f5f18f91e/scr.png" itemprop="image"/><meta property="twitter:image" content="https://archive.is/l6QdV/d7e3acef18a0433590880dfcc26f8e1f5f18f91e/scr.png"/><meta property="twitter:image:src" content="https://archive.is/l6QdV/d7e3acef18a0433590880dfcc26f8e1f5f18f91e/scr.png"/><meta property="twitter:image:width" content="1024"/><meta property="twitter:image:height" content="768"/><link rel="icon" href="//www.google.com/s2/favicons?domain=www.cs.odu.edu"/><link rel="canonical" href="https://archive.is/l6QdV"/><link rel="bookmark" href="http://archive.today/20181104174633/https://www.cs.odu.edu/~mln/travel.html"/><title></title>< /head><body style="margin:0;background-color:#EEEEEE"><center><div id="HEADER" style="font- family:sans-serif;background [much deletia – you get the point] $ curl -s http://archive.is/20181104174633/https://www.cs.odu.edu/~mln/travel.html | wc 730 3640 62392
  • 13. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL If we just had isolated, static pages (e.g., individual jpegs, pdfs, mp3s) then there’d be no problem. But HTML has: 1) links, 2) embedded resources (including iframes), and 3) Javascript, which can modify the HTML. And HTTP has no “bulk download”, so you can’t grab an entire site instantaneously.
  • 14. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL WARC/1.0 WARC-Type: warcinfo Content-Type: application/warc-fields WARC-Date: 2018-11-03T17:20:02Z WARC-Record-ID: <urn:uuid:6d14bf1d-0ef7-4f03-9de2-e578d105d3cb> WARC-Filename: climate.nasa.gov.warc.gz WARC-Block-Digest: sha1:WWSSYDYY7HTP4JTVOZANSIFPFHUJU64E Content-Length: 257 software: Wget/1.15 (linux-gnu) format: WARC File Format 1.0 conformsTo: http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf robots: classic wget-arguments: "--warc-file=climate.nasa.gov" "https://climate.nasa.gov/vital-signs/carbon-dioxide/" WARC/1.0 WARC-Type: request WARC-Target-URI: https://climate.nasa.gov/vital-signs/carbon-dioxide/ Content-Type: application/http;msgtype=request WARC-Date: 2018-11-03T17:20:02Z WARC-Record-ID: <urn:uuid:e44bc1ea-61a1-4200-b94f-60042456f638> WARC-IP-Address: 54.230.195.16 WARC-Warcinfo-ID: <urn:uuid:6d14bf1d-0ef7-4f03-9de2-e578d105d3cb> WARC-Block-Digest: sha1:CLODKYDXCHPVOJMJWHJVT3EJJDKI2RTQ Content-Length: 141 GET /vital-signs/carbon-dioxide/ HTTP/1.1 User-Agent: Wget/1.15 (linux-gnu) Accept: */* Host: climate.nasa.gov Connection: Keep-Alive WARC/1.0 WARC-Type: response WARC-Record-ID: <urn:uuid:5d8861ef-93c5-4d9c-87b8-4f427f963f7c> WARC-Warcinfo-ID: <urn:uuid:6d14bf1d-0ef7-4f03-9de2-e578d105d3cb> WARC-Concurrent-To: <urn:uuid:e44bc1ea-61a1-4200-b94f-60042456f638> WARC-Target-URI: https://climate.nasa.gov/vital-signs/carbon-dioxide/ WARC-Date: 2018-11-03T17:20:02Z We could hash the WARC file $ md5sum climate.nasa.gov.warc.gz 652853fe1bc8cb273cdf73aad8a489ca climate.nasa.gov.warc.gz But this nasa.gov page contains: •201 images •19 Javascript files •3 CSS files At a large archive like IA they could be in multiple WARC files; worst case is 224 WARC files.
  • 15. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL We can detect changes in the root HTML https://ws-dl.blogspot.com/2017/12/2017-12-11-difficulties-in-timestamping.html
  • 16. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL But what if the change is in an embedded resource?
  • 17. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL Clearly we need to render the entire page, then compute the hash. Unfortunately, that’s not easy.
  • 18. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL Load the archived page, get an eagle https://www.webharvest.gov/congress112th/20130119060624/http://www.fws.gov/
  • 19. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL Hit “reload”, get a tiger https://www.webharvest.gov/congress112th/20130119060624/http://www.fws.gov/
  • 20. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL Hit “reload” again, get a mountain https://www.webharvest.gov/congress112th/20130119060624/http://www.fws.gov/
  • 21. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL “Look on my Javascript, ye Mighty, and despair!”
  • 22. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL Actually, the fws.gov example was super easy; most changes are much harder to trace. Mohamed Aturban, unpublished, memento: http://web.archive.org/web/20130724144801/http://www.cnn.com/ Animated GIF: https://blog.dshr.org/2017/11/keynote-at-pacific-neighborhood.html
  • 23. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL Temporal violations: reconstructing pages that never existed on the live web (examples below are transient; sometimes you get the 1st image, sometimes the 2nd image) embedded in umich.edu memento, archived in perma.cc 2nd image is compressed (12209 vs. 19448 bytes); 2nd image modified in 2017-03, but replayed in a 2017-01 page embedded in copybogger.com memento, archived in archive.org 2nd image modified in 2017-12, but replayed in a 2017-11 page; blackout for privacy Temporal violations: https://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
  • 24. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL 1 WARC file, 2 Wayback Machines, 3 Browsers = 6 different replays http://wayback.archive-it.org/all/20130106140348/http://www.harvard.edu/ http://web.archive.org/web/20130106140348/http://www.harvard.edu/ see also. https://ws-dl.blogspot.com/2016/12/2016-12-20-archiving-pages-with.html
  • 25. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL Archive URIMs With at least two hashes ---------------------------------------------------------------- webharvest.gov 712 712 (100%) archive.is 1396 1364 (97.70%) vefsafn.is 1589 739 (46.50%) archive-it.org 1383 815 (58.92%) stanford.edu 1222 831 (68.00%) internetmemory.org 979 979 (100%) nationalarchives.gov.uk 994 972 (97.78%) archive.bibalex.org 199 177 (88.94%) bac-lac.gc.ca 351 351 (100%) proni.gov.uk 469 129 (27.50%) www.webarchive.org.uk 349 329 (94.26%) www.webcitation.org 1585 828 (52.23%) veebiarhiiv.digar.ee 488 308 (63.11%) webarchive.loc.gov 1594 526 (32.99%) arquivo.pt 1569 1563 (99.61%) web.archive.org 1566 1334 (85.18%) perma-archives.org 182 180 (98.90%) ---------------------------------------------------------------- 16627 12137 (72.99%) Data from 35 downloads over an 11 month period (2017-11 – 2018-10), Mohamed Aturban (in preparation) (apologies to Heraclitus) You cannot replay twice the same archived page
  • 26. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL A metaphor for replaying archived web pages https://www.youtube.com/watch?v=ekO3Z3XWa0Q https://en.wikipedia.org/wiki/Monty_Python_and_the_Holy_Grail
  • 27. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL King of Swamp Castle: live web/ground truth Guard: archival replay https://www.youtube.com/watch?v=ekO3Z3XWa0Q https://en.wikipedia.org/wiki/Monty_Python_and_the_Holy_Grail $ echo "Make sure the prince doesn't leave this room until I come and get him." | md5 57facbb2734d36cb823f4230cc07b888 $ echo "Not to leave the room even if you come and get him." | md5 3ba0a2359d63f43cbe9e11fb5a179b8d $ echo "Until you come and get him, we're not to enter the room." | md5 ade3539aaa8a6d8724193e9a37f3ca6d $ echo "We don't need to do anything apart from just stop him entering the room." | md5 ea812f5b997aa42a8f293bd1ee536fd0 $ echo "Oh yes, we'll keep him in here, obviously. But if he had to leave, and we went with him..." | md5 55d184b77d99eed6367535ef3c05d7aa $ echo "Oh, yes of course. I thought you meant him! You know it seemed a bit daft me having to guard him when he's a guard." | md5
  • 28. Symposium on Blockchain and Trusted Repositories, 2018-11-05, @phonedude_mln, @WebSciDL Archival replay & blockchain: building a castle in a swamp • Fixity checks only work when it’s clear what to hash – Hash only the root HTML and modifications are possible via embedded resources (false negatives) – Recursively hash all embedded resources and you’ll rarely get the same hash (false positives) • There is increasing incentive to attack existing archives and create networks of fake archives – http://bit.ly/Weaponized-Web-Archives • We are investigating archive-aware hashing functions – Vagaries of replay won’t be “fixed”; we need to adapt our hashing • Central vs. distributed? – Ask yourself: “is this a winner-take-all market?” (hello, FANG) – https://blog.dshr.org/2018/09/blockchain-solves-preservation.html