Va Tech CS Seminar, 2018-11-02, @phonedude_mln, @WebSciDL
Weaponized Web Archives:
Provenance Laundering of Short Order Evidence
Michael L. Nelson
Old Dominion University
Web Science & Digital Libraries Research Group
@WebSciDL, @phonedude_mln
With:
ODU: Michele C. Weigle, Mohamed Aturban, John Berlin, Sawood Alam, Plinio Vargas
Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein
Supported in part by The Andrew Mellon Foundation.
Opinions expressed are those of the presenter.
based on a 2018-03-23 presentation at the National Forum on Ethics and Archiving the Web
TL;DR
• We are on the cusp of a “Photoshop” moment for synthesizing convincing audio/video
• Web archives will be weaponized to:
– alter trustworthy content
– obfuscate provenance of untrustworthy content
web archives
https://imgur.com/gallery/akeVeiq
background:
what’s a web archive?
http://web.archive.org/web/*/http://www.odu.edu/
also: http://whatdiditlooklike.mementoweb.org/tagged/odu.edu
what was here?
we’ll likely never know…
(ok, xkcd gives us an idea…)
Sure, go ahead and archive www.odu.edu -- but what about archiving all your Facebook posts, tweets, instagrams, check-ins, etc.?
“Why are they putting all that online?”
“And it’s easy to deride this sort of thing as self-absorbed publishing – why would anyone put such drivel out in public? It’s simple. They’re not talking to you. We misread these seemingly inane posts because we’re so unused to seeing written material in public that isn’t intended for us.”
Clay Shirky, 2008, p. 85
We have semi-private discussions in public spaces all the time…
https://www.nytimes.com/2017/09/19/us/politics/isnt-that-the-trump-lawyer-a-reporters-accidental-scoop.html
https://well.blogs.nytimes.com/2013/06/21/how-the-hum-of-a-coffee-shop-can-boost-creativity/
Even though we know others can eavesdrop – maybe we even want that –
if they whipped out their iPhone and started recording us, it might change our behavior.
See also: gevulot, agoras, and exomemory in “The Quantum Thief”
https://en.wikipedia.org/wiki/The_Quantum_Thief
Given enough time,
our private/public performances become art
https://archive.org/details/prelingerhomemovies
https://genius.com/Dj-shadow-letter-from-home-lyrics
https://www.youtube.com/watch?v=MIR62rreRKY
https://www.youtube.com/watch?v=fKjg1HfZfPM#t=2m46s
https://www.sinecurebooks.com/shop/enjoy-the-experience-bundle/
personally
identifiable
information!
as the web archiving community, we are constantly asking ourselves:
“Are we creating tools that aid the surveillance state?”
Spoiler alert: Yes.
Our attitude about the surveillance state is contextual.
https://www.cbsnews.com/pictures/boston-marathon-bombing-iconic-images/
http://www.boston.com/metrodesk/2013/04/20/boston-police-commissioner-edward-davis-says-releasing-photos-was-turning-point-boston-marathon-bomb-probe/sojcZNcTCGah8UYBnRuk9O/story.html
Boston Marathon Bombing, 2013
https://twitter.com/charliespiering/status/976430395964215296
The surveillance state also surveils the (other) state.
https://www.theguardian.com/uk-news/2018/sep/05/planes-trains-and-fake-names-the-trail-left-by-skripal-suspects
“Planes, trains and fake names: the trail left by Skripal suspects”
https://www.cnn.com/2018/10/22/middleeast/saudi-operative-jamal-khashoggi-clothes/index.html
“Surveillance footage shows Saudi 'body double' in Khashoggi's clothes after he was killed, Turkish source says”
Meanwhile, we happily pay monthly service fees to be surveilled!
https://www.citiusminds.com/blog/home-automation-with-smart-speakers-amazon-echo-vs-google-home-vs-apple-homepod/
https://twitter.com/mtdukes/status/974281625348558848
“Quis custodiet ipsos custodes?”
A: Social media.
https://twitter.com/WiredUK/status/958084308924760065
https://twitter.com/vicenews/status/670059493581959168
We don’t feel too bad when we archive accounts that later prove to be trolls / sockpuppets / sybils
https://twitter.com/safety_refinery/status/934982022078042112
https://ws-dl.blogspot.com/2018/01/2018-01-02-link-to-web-archives-not.html
https://twitter.com/documentnow/status/964882665982722048
https://news.docnow.io/blacktivists-in-the-archive-71c807aa247e
https://github.com/fivethirtyeight/russian-troll-tweets/
https://blog.twitter.com/official/en_us/topics/company/2018/enabling-further-research-of-information-operations-on-twitter.html
Nor do we feel bad for holding public figures / organizations accountable
https://twitter.com/landlibrarian/status/975910915135754240
https://twitter.com/IEEEhistory/status/960358528987942912
http://archive.is/xh58B
But our attitude is different when those organizations explicitly monitor us
https://twitter.com/pierce/status/980860438119301120
https://twitter.com/LSJNews/status/979017806116245504
We can & should discuss our role in surveillance,
but realize Facebook et al. are operating as designed
(and before you say “I’m too cool for Facebook”, remember that Facebook owns Instagram)
https://twitter.com/zeynep/status/975076957485457408
https://twitter.com/Pinboard/status/975013825010458624
see also: data as toxic waste http://idlewords.com/talks/haunted_by_data.htm
as the web archiving community, we should be asking ourselves:
“Can we authenticate web content?”
Spoiler alert: Yes. A bit.
Stewart Brand, The Media Lab: Inventing the Future at MIT, 1987, p. 201
“We cannot accept this photograph in evidence”
http://www.politifact.com/florida/statements/2018/mar/27/blog-posting/david-hogg-not-school-during-shooting-s-fake-news/
https://twitter.com/acnwala/status/977982456296034304
Granted, we’ve had obvious, cut-n-paste / mashup “evidence” for a long time…
Victorian Photo Collage
https://www.metmuseum.org/exhibitions/listings/2010/victorian-photocollage
“The Flying Saucer” (1956)
https://en.wikipedia.org/wiki/The_Flying_Saucer_(song)
https://www.youtube.com/watch?v=XCrn6QXvHLg
Brian Williams Raps ‘Gin & Juice’
https://www.youtube.com/watch?v=XlGLhYFrv6w
Crude techniques = humor,
sophisticated techniques = deception;
Brand’s prediction of “any day now” is now
Synthesizing Obama: Learning Lip Sync from Audio
SIGGRAPH 2017
https://grail.cs.washington.edu/projects/AudioToObama/
Face2Face: Real-time Face Capture and Reenactment
of RGB Videos, CVPR 2016
http://niessnerlab.org/projects/thies2016face.html
see also: https://www.youtube.com/watch?v=pkkph4JhrCg
Clumsy, “collage/flying saucer/gin & juice” techniques are already effective on social media
We are completely unprepared for advanced, SIGGRAPH/CVPR techniques
“Surely web archives can be used to establish priority and authenticity?”
Let’s look at some examples.
cf. https://gizmodo.com/how-archivists-could-stop-deepfakes-from-rewriting-hist-1829666009
Neo-Nazis and “Black Panther”
Relationship Status: It’s Complicated
http://knowyourmeme.com/photos/1338390-black-panther
https://twitter.com/TamikaDMallory/status/964701120194019328
See also: https://www.snopes.com/fact-check/mexican-police-caravan-photos/
nydailynews.com provides screenshots, but not links to the tweets…
http://www.nydailynews.com/entertainment/movies/trolls-lying-assaults-black-panther-showings-article-1.3824901
@AsianWifeHaver and @DSA_Boi_Pucci are not on the live web…
$ curl -I https://twitter.com/AsianWifeHaver
HTTP/1.1 302 Found
cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
content-length: 103
content-type: text/html;charset=utf-8
date: Sat, 17 Mar 2018 22:09:27 GMT
expires: Tue, 31 Mar 1981 05:00:00 GMT
last-modified: Sat, 17 Mar 2018 22:09:27 GMT
location: https://twitter.com/account/suspended
$ curl -I https://twitter.com/DSA_Boi_Pucci
HTTP/1.1 404 Not Found
cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
content-length: 6329
content-security-policy: [deletia]
content-type: text/html;charset=utf-8
date: Sat, 17 Mar 2018 22:14:22 GMT
expires: Tue, 31 Mar 1981 05:00:00 GMT
last-modified: Sat, 17 Mar 2018 22:14:22 GMT
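A minimal Python sketch of the check above, using only the standard library: `head_status` sends a HEAD request without following redirects (roughly `curl -I` without `-L`), and `classify_account` reads the result the way the output above is read. The helper names and the "suspended" rule are hypothetical and based solely on the 2018 responses shown; Twitter's current responses may well differ.

```python
import urllib.error
import urllib.request
from typing import Optional


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects, like `curl -I` without `-L`."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx response


def head_status(url: str, timeout: float = 10.0) -> tuple[int, Optional[str]]:
    """HEAD `url`; return (status code, Location header or None)."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        # Both the unfollowed 3xx and genuine 4xx/5xx responses end up here.
        return err.code, err.headers.get("Location")


def classify_account(status: int, location: Optional[str]) -> str:
    """Interpret a profile-URL response the way the 2018 output above is read."""
    redirects = {301, 302, 303, 307, 308}
    if status in redirects and location and location.rstrip("/").endswith("/account/suspended"):
        return "suspended"
    if status == 404:
        return "not found"
    if 200 <= status < 300:
        return "live"
    return "unknown"
```

Usage (requires network access): `classify_account(*head_status("https://twitter.com/DSA_Boi_Pucci"))`. Keeping the classifier separate from the network call makes it easy to apply to saved `curl -I` transcripts like the ones above.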
…nor are they in the Internet Archive
note: this exists only
because of the redirection
to the “suspended” page
http://web.archive.org/web/*/https://twitter.com/AsianWifeHaver
http://web.archive.org/web/*/https://twitter.com/DSA_Boi_Pucci
Can’t find @DSA_Boi_Pucci in any archive
Typical archive URI construction:
archive.example.org/SomeString/CNN.com/travel
web.archive.org/web/*/twitter.com/DSA_Boi_Pucci
wayback.archive-it.org/all/*/twitter.com/DSA_Boi_Pucci
perma-archives.org/warc/twitter.com/DSA_Boi_Pucci
archive.is/twitter.com/DSA_Boi_Pucci
www.webarchive.org.uk/wayback/archive/twitter.com/DSA_Boi_Pucci
wayback.vefsafn.is/wayback/twitter.com/DSA_Boi_Pucci
arquivo.pt/wayback/twitter.com/DSA_Boi_Pucci
for a full list of public web archives, see: http://labs.mementoweb.org/aggregator_config/archivelist.xml
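The construction is mechanical enough to script. This sketch (hypothetical helper `lookup_uris`) only appends a scheme-less URI-R to the prefixes listed above; it does not query any archive, and those 2018-era prefixes may have changed since. The aggregator archive list linked above is the more complete source.

```python
# Lookup prefixes copied from the list above, keyed by archive host.
# They reflect 2018 URI conventions and may have changed since.
LOOKUP_PREFIXES = {
    "web.archive.org": "web.archive.org/web/*/",
    "wayback.archive-it.org": "wayback.archive-it.org/all/*/",
    "perma-archives.org": "perma-archives.org/warc/",
    "archive.is": "archive.is/",
    "www.webarchive.org.uk": "www.webarchive.org.uk/wayback/archive/",
    "wayback.vefsafn.is": "wayback.vefsafn.is/wayback/",
    "arquivo.pt": "arquivo.pt/wayback/",
}


def strip_scheme(uri_r: str) -> str:
    """Drop a leading https:// or http://, as in the examples above."""
    for scheme in ("https://", "http://"):
        if uri_r.startswith(scheme):
            return uri_r[len(scheme):]
    return uri_r


def lookup_uris(uri_r: str) -> dict[str, str]:
    """Build one lookup URI per archive by prefixing the scheme-less URI-R."""
    bare = strip_scheme(uri_r)
    return {host: prefix + bare for host, prefix in LOOKUP_PREFIXES.items()}


for host, uri in lookup_uris("https://twitter.com/DSA_Boi_Pucci").items():
    print(uri)
```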
At this point, the absence of evidence
means I cannot prove that
@DSA_Boi_Pucci:
1) ever existed
or
2) was not created by nydailynews.com
What if we checked these archives?
breitbart.com/wayback/*/twitter.com/DSA_Boi_Pucci
infowars.com/web/*/twitter.com/DSA_Boi_Pucci
iluv.aynrand.org/*/twitter.com/DSA_Boi_Pucci
InternetResearchAgency.ru/twitter.com/DSA_Boi_Pucci
What if they all said “nydailynews.com test account”?
Would you trust the results?
Our entire national web preservation strategy is predicated on Brewster Kahle (IA) “not being evil”™
If he is leading a 20+ year sleeper cell, we’re doomed.
Web archives with international implications:
Malaysia Airlines Flight 17 (MH17)
http://web.archive.org/web/20140717152222/http://vk.com/strelkov_info
http://www.csmonitor.com/World/Europe/2014/0717/Web-evidence-points-to-pro-Russia-rebels-in-downing-of-MH17-video
http://www.newyorker.com/magazine/2015/01/26/cobweb
(not really archived as well as we’d like)
Ed and I Discuss Who Has What…
https://twitter.com/phonedude_mln/status/490171976389238784
Remember MH17?
https://twitter.com/phonedude_mln/status/490171976389238784
Alex is now 404.
Would multiple archives have convinced him?
https://twitter.com/quicknquiet
Do we really have “a perfect tool to produce ‘evidence’ of any kind”?
Segal’s Law, restated for web archives:
The person with an archive knows what the page looked like.
The person with two archives is never sure.
(apologies to Notorious B.I.G.)
“Mo Archives, Mo Problems”
Why? Because they’ll rarely agree.
Even a single archive is an unreliable witness:
zombies, temporal violations, and attacks
Zombies: live web “leaking” into an archived page
http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
this page is from 2008
this ad is from 2012 (when this screenshot was taken)
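One rough way to spot zombies, sketched in Python: given every URL a browser actually fetched while replaying a memento (e.g., exported from its network log; that input is an assumption), flag any http(s) request that did not go to the archive's own host. `find_zombies` is a hypothetical name, and the check is a simplification: it catches live-web leakage, not content the archive itself served incorrectly.

```python
from urllib.parse import urlsplit


def find_zombies(archive_host: str, loaded_urls: list[str]) -> list[str]:
    """Return http(s) URLs fetched during replay that bypassed the archive.

    `loaded_urls` is assumed to be every URL the browser requested while
    replaying one memento (e.g., exported from its network log).
    """
    archive_host = archive_host.lower()
    zombies = []
    for url in loaded_urls:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            continue  # data:, blob:, etc. are not fetched from a web server
        host = parts.hostname or ""
        if host != archive_host and not host.endswith("." + archive_host):
            zombies.append(url)  # live web leaking into the archived page
    return zombies


replayed = [
    "https://web.archive.org/web/20080101000000/http://example.com/",
    "https://web.archive.org/web/20080101000000im_/http://example.com/logo.jpg",
    "https://ads.example.net/banner.js",
    "data:image/png;base64,AAAA",
]
print(find_zombies("web.archive.org", replayed))  # → ['https://ads.example.net/banner.js']
```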
Temporal violations: reconstructing legitimately
archived resources into a page that never existed
http://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
https://ws-dl.blogspot.com/2018/04/2018-04-24-why-we-need-multiple-web.html
text (2004-12) says rain, image (2005-09) is clear
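A deliberately naive sketch of a temporal-coherence check: compare each embedded resource's capture datetime (Wayback-style 14-digit timestamps) with the root page's and flag large gaps. The tolerance, the helper names, and the exact day values in the example (the slide gives only 2004-12 and 2005-09) are assumptions; the linked posts describe a more careful analysis, and a large gap alone does not prove the page never looked that way.

```python
from datetime import datetime, timedelta


def parse_14digit(ts: str) -> datetime:
    """Parse a Wayback-style YYYYMMDDhhmmss timestamp."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def suspicious_embeds(root_ts: str, embeds: dict[str, str],
                      tolerance: timedelta = timedelta(days=1)) -> list[tuple[str, int]]:
    """Return (embedded URI, gap in days) for captures far from the root capture.

    Naive heuristic: any gap larger than `tolerance`, in either direction, is
    flagged for a human to look at; it is not proof of a temporal violation.
    """
    root = parse_14digit(root_ts)
    flagged = []
    for uri, ts in embeds.items():
        gap = parse_14digit(ts) - root
        if abs(gap) > tolerance:
            flagged.append((uri, gap.days))
    return flagged


# Hypothetical example: text captured 2004-12, radar image captured 2005-09
print(suspicious_embeds("20041201000000", {"radar.jpg": "20050901000000"}))  # → [('radar.jpg', 274)]
```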
Directly attacking the archive
(in this case, via orphaned live web resources; “zombie attack”)
Lerner, Kohno, Roesner, 2017
https://doi.org/10.1145/3133956.3134042
see also: Cushman & Kreymer http://labs.rhizome.org/presentations/security.html
page is from 2011, iframe content is from 2017 (when screenshot was taken)
Based on feedback from Lerner et al., IA has changed its playback
(specifically, with a Content-Security-Policy HTTP response header)
But playback remains problematic…
(apologies to Peter Arnett)
“In order to save the page, we had to completely change it”
let’s look at four common scenarios
1) JavaScript does not run correctly from the archive
http://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html
https://ws-dl.blogspot.com/2018/04/2018-05-01-high-fidelity-ms-thesis-to.html
This is cnn.com not replaying; it hasn’t replayed correctly since November 1, 2016
2) Archived page doesn’t match live web experience
https://web.archive.org/web/20180302184025/https:/twitter.com/Emma4Change
http://ws-dl.blogspot.com/2018/03/2018-03-21-cookies-are-why-your.html
“only a ‘crisis actor’ would tweet in Slovak!”
Now imagine she gets fed up, deletes her account, and then someone applies the “abandoned acct / archive” attack Justin Littman described:
https://gwu-libraries.github.io/sfm-ui/posts/2017-11-06-vulnerabilities
(apologies to Heraclitus)
3) You cannot replay twice the same archived page
Mohamed Aturban, unpublished, memento:
http://web.archive.org/web/20130724144801/http://www.cnn.com/
Animated GIF: https://blog.dshr.org/2017/11/keynote-at-pacific-neighborhood.html
4) Archives are not magic web sites; they have the same problems as regular web sites
Archive                    URI-Ms   With at least two hashes
----------------------------------------------------------------
webharvest.gov                712     712 (100%)
archive.is                   1396    1364 (97.70%)
vefsafn.is                   1589     739 (46.50%)
archive-it.org               1383     815 (58.92%)
stanford.edu                 1222     831 (68.00%)
internetmemory.org            979     979 (100%)
nationalarchives.gov.uk       994     972 (97.78%)
archive.bibalex.org           199     177 (88.94%)
bac-lac.gc.ca                 351     351 (100%)
proni.gov.uk                  469     129 (27.50%)
www.webarchive.org.uk         349     329 (94.26%)
www.webcitation.org          1585     828 (52.23%)
veebiarhiiv.digar.ee          488     308 (63.11%)
webarchive.loc.gov           1594     526 (32.99%)
arquivo.pt                   1569    1563 (99.61%)
web.archive.org              1566    1334 (85.18%)
perma-archives.org            182     180 (98.90%)
----------------------------------------------------------------
Total                       16627   12137 (72.99%)
Data from 35 downloads over an 11 month period (2017-11 – 2018-10), Mohamed Aturban (in preparation)
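A minimal sketch of how a column like "with at least two hashes" could be computed: download each URI-M repeatedly, hash every response body, and count the URI-Ms whose downloads did not all hash the same. SHA-256 and the in-memory input are assumptions for illustration; the slide does not say which hash function was used or exactly what was hashed.

```python
import hashlib


def distinct_hashes(downloads: list[bytes]) -> int:
    """Number of distinct SHA-256 digests across repeated downloads of one URI-M."""
    return len({hashlib.sha256(body).hexdigest() for body in downloads})


def changed_summary(per_urim: dict[str, list[bytes]]) -> tuple[int, int, float]:
    """Return (URI-Ms, URI-Ms with >= 2 distinct hashes, percentage)."""
    total = len(per_urim)
    changed = sum(1 for bodies in per_urim.values() if distinct_hashes(bodies) >= 2)
    pct = 100.0 * changed / total if total else 0.0
    return total, changed, pct


# Hypothetical: four URI-Ms, each downloaded a few times
sample = {
    "m1": [b"<html>a</html>", b"<html>a</html>"],
    "m2": [b"<html>a</html>", b"<html>b</html>"],  # replay changed between downloads
    "m3": [b"<html>x</html>"],
    "m4": [b"p", b"q", b"p"],
}
print(changed_summary(sample))  # → (4, 2, 50.0)
```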
How can we differentiate between “normal” archive playback modification and deception?
If the tweets or accts are deleted, we don’t know.
If I embed fake tweets in another page, then archive that page, only an expert can tell
the fake tweets don’t come from twitter.com (& fake archives will lie!)
And it is not in Twitter’s (perceived) self-interest to help, cf.:
https://techcrunch.com/2018/01/03/why-twitter-wont-remove-trumps-nuclear-war-tweet/
https://www.vox.com/2018/10/29/18037880/twitter-may-remove-like-button
https://www.bloomberg.com/news/articles/2018-10-27/twitter-apologizes-for-ignoring-apparent-threat-in-tweet
These might have been swapped -- but how can you tell for sure?
Inserting fakes into real archives
Here’s an actual page in the IA “proving”
Brian Williams released “Gin and Juice” in 1992, a full year before Snoop Dogg.
John Berlin, MS Thesis, 2018
https://www.youtube.com/watch?v=k3QTcJZdFfs
(actual URI-R & URI-M have also been obscured in the video to hide the technique)
The content is clearly fake, but imagine replacing:
1) “1992” with a more believable “2016”,
2) the fake domain with “bbc.com”, and
3) Brian Williams rapping with a synthesized Trump or Obama speech.
“That will never happen!
…right?”
This isn’t just hypothetical…
The opening salvo in what is and isn’t a “deepfake”:
https://twitter.com/AaronBlake/status/1035124642456002565
https://twitter.com/realDonaldTrump/status/1035120511259500544
https://news.vice.com/en_us/article/ne5x3d/trump-lester-holt-james-comey-nbc
The May 2017 NBC interview was not archived until August 2018 (and even then, the video itself was not archived)
https://www.nbcnews.com/nightly-news/video/pres-trump-s-extended-exclusive-interview-with-lester-holt-at-the-white-house-941854787582?v=raila
https://web.archive.org/web/*/https://www.nbcnews.com/nightly-news/video/pres-trump-s-extended-exclusive-interview-with-lester-holt-at-the-white-house-941854787582?v=raila
https://web.archive.org/web/20180825094239/https://www.nbcnews.com/nightly-news/video/pres-trump-s-extended-exclusive-interview-with-lester-holt-at-the-white-house-941854787582?v=raila
Clicking through to the video reveals a loop of a postal carrier slipping on ice, not the Lester Holt interview.
Now convince “Alex” that:
1) the live web nbc.com video has not been modified
2) the IA archive failure is not suspicious / convenient
3) the archived “copy” in infowars.com is not authentic
Blockchain to the rescue!!!
<lasers>
<sirens>
<disco-thumping-soundtrack>
nope.
https://www.multichain.com/blog/2015/11/avoiding-pointless-blockchain-project/
https://eprint.iacr.org/2017/375.pdf
Instead, let’s use web archives to monitor web archives.
Step 1: Push to multiple archives
web.archive.org/web/20180321/eaw.rhizome.org
wayback.archive-it.org/all/20180321/eaw.rhizome.org
arquivo.pt/wayback/20180321/eaw.rhizome.org
archive.is/20180321/eaw.rhizome.org
eaw.rhizome.org
Step 2: Compute fixity, publish fixity “manifest” at a well-known location
manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
manifest.org/20180322/wayback.archive-it.org/all/20180321/eaw.rhizome.org
manifest.org/20180322/arquivo.pt/wayback/20180321/eaw.rhizome.org
manifest.org/20180322/archive.is/20180321/eaw.rhizome.org
It’s understood that archived HTML is continuously rewritten, so only compute fixity on things that
should not change, like JPEGs and certain original HTTP response headers.
This example assumes the existence of a well-known server manifest.org.
Actual URIs can be a bit more complex using “Trusty URIs”:
http://ws-dl.blogspot.com/2017/01/2017-01-15-summary-of-trusty-uris.html
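A sketch of Step 2 in Python, under stated assumptions: SHA-256 digests, the JSON manifest layout, and the list of "stable" original headers are all illustrative choices, not a published format, and manifest.org is the slide's own hypothetical server. Only caller-supplied stable resources (e.g., JPEG bytes) are hashed, never the rewritten HTML. Trusty URIs are not implemented here.

```python
import hashlib
import json

# Hypothetical choice of "should not change" original headers; which headers
# are actually stable is an open question the slides do not settle.
STABLE_HEADERS = ("content-type", "last-modified", "etag")


def fixity_manifest(urim: str, observed: str, resources: dict[str, bytes],
                    orig_headers: dict[str, str]) -> dict:
    """Build a manifest over stable parts of a memento (e.g., JPEG bytes)."""
    lowered = {k.lower(): v for k, v in orig_headers.items()}
    return {
        "uri-m": urim,
        "observed": observed,  # when *we* computed fixity, not when it was archived
        "resources": {
            uri: "sha256:" + hashlib.sha256(body).hexdigest()
            for uri, body in sorted(resources.items())
        },
        "original-headers": {h: lowered[h] for h in STABLE_HEADERS if h in lowered},
    }


def manifest_uri(urim: str, observed: str) -> str:
    """Well-known location on the hypothetical manifest.org server."""
    return f"manifest.org/{observed}/{urim}"


def serialize(manifest: dict) -> str:
    """Canonical JSON, so identical manifests serialize to identical text."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":"))


urim = "web.archive.org/web/20180321/eaw.rhizome.org"
m = fixity_manifest(urim, "20180322", {"logo.jpg": b"\xff\xd8 example jpeg bytes"},
                    {"Content-Type": "text/html", "Set-Cookie": "session=1"})
print(manifest_uri(urim, "20180322"))
print(serialize(m))
```

Volatile headers such as `Set-Cookie` are simply left out, which mirrors the slide's point that fixity should only cover things that should not change.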
Step 3: Wondering about veracity of an archived page?
Check manifest.org and recompute fixity.
manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
web.archive.org/web/20180321/eaw.rhizome.org
what if manifest.org is down?
or possibly hacked?
We can’t know archive.org did not alter contents on ingest (20180321),
but we can verify that it has not changed since our observation (20180322)
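Step 3 as a sketch: recompute digests for the resources a manifest lists and compare. The manifest layout (`{"resources": {uri: "sha256:<hex>"}}`) and the name `verify_against_manifest` are assumptions for illustration. As the slide says, a match only shows nothing changed since the observation date; it says nothing about what happened at ingest.

```python
import hashlib


def verify_against_manifest(manifest: dict, resources: dict[str, bytes]) -> dict[str, str]:
    """Recompute SHA-256 for each resource named in the manifest and compare.

    Assumed manifest layout: {"resources": {uri: "sha256:<hex digest>"}}.
    """
    report = {}
    for uri, expected in manifest["resources"].items():
        body = resources.get(uri)
        if body is None:
            report[uri] = "missing"  # could not re-download it
        elif "sha256:" + hashlib.sha256(body).hexdigest() == expected:
            report[uri] = "match"  # unchanged since the manifest's observation
        else:
            report[uri] = "mismatch"  # changed since observation (or tampered with)
    return report


jpeg = b"\xff\xd8 example jpeg bytes"
manifest = {"resources": {"logo.jpg": "sha256:" + hashlib.sha256(jpeg).hexdigest()}}
print(verify_against_manifest(manifest, {"logo.jpg": jpeg}))  # → {'logo.jpg': 'match'}
```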
Step 4: Push manifest to multiple archives
web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
arquivo.pt/wayback/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
archive.is/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
Now the 20180322 version of the manifest of archive.org’s
memento of rhizome.org is in four different archives.
The URIs are ugly, but the bottom line is an attacker would have to hack a
majority of 5 domains (manifest.org + 4 archives)
Can repeat for manifests of mementos of rhizome.org in archive-it.org, arquivo.pt, archive.is, etc.
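Step 4 in miniature: a sketch (hypothetical helper names) that expands one manifest URI into the five copies discussed above, reusing the URI patterns from the slides, and works out how many copies an attacker would need to control for a simple majority. It only builds URIs; actually pushing the manifest to each archive is out of scope here.

```python
# URI patterns following the slides; {ts} is when the manifest was archived.
ARCHIVE_PATTERNS = {
    "web.archive.org": "web.archive.org/web/{ts}/{uri}",
    "wayback.archive-it.org": "wayback.archive-it.org/all/{ts}/{uri}",
    "arquivo.pt": "arquivo.pt/wayback/{ts}/{uri}",
    "archive.is": "archive.is/{ts}/{uri}",
}


def manifest_copies(manifest_uri: str, archived_on: str) -> list[str]:
    """The original manifest plus one archived copy per archive."""
    return [manifest_uri] + [
        pattern.format(ts=archived_on, uri=manifest_uri)
        for pattern in ARCHIVE_PATTERNS.values()
    ]


def majority_needed(n_copies: int) -> int:
    """Smallest number of copies that forms a strict majority."""
    return n_copies // 2 + 1


copies = manifest_copies(
    "manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org", "20180323")
for c in copies:
    print(c)
print("attacker must control", majority_needed(len(copies)), "of", len(copies))
```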
Wondering about veracity of an archived page?
Check all copies of manifest.org and take a majority vote
manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
web.archive.org/web/20180321/eaw.rhizome.org
Caveat 1: If I can hack the rhizome.org page at archive.org, I can probably hack the
fixity info there too, so we really have 4 copies not 5.
web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
arquivo.pt/wayback/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
archive.is/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
Caveat 2: archive.org and archive-it.org are not independent,
so we really have 3 copies not 5. (yes, this is very similar to textual criticism)
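A sketch of the majority vote that also applies both caveats, under assumed rules that are not spelled out in the slides: copies run by the same operator count as one vote, and the operator hosting the memento under test can be excluded entirely. Applied to the rhizome.org example, the five copies shrink to three voters (manifest.org, arquivo.pt, archive.is). Missing copies (e.g., a 404) cast no vote, and `None` means no majority, i.e., "maybe it's ok, maybe it's not." The `sha256:aa`/`sha256:bb` digests are placeholders.

```python
from collections import Counter
from typing import Optional


def majority_digest(votes: dict[str, Optional[str]],
                    operator_of: dict[str, str],
                    exclude_operator: Optional[str] = None) -> Optional[str]:
    """Majority vote over manifest copies, one vote per independent operator.

    votes: host -> digest it reports for the memento (None if the copy 404s).
    operator_of: host -> organization running it; unlisted hosts are independent.
    exclude_operator: operator hosting the memento under test (caveat 1).
    """
    per_operator: dict[str, str] = {}
    for host, digest in votes.items():
        if digest is None:
            continue  # no copy, no vote
        operator = operator_of.get(host, host)
        if operator == exclude_operator:
            continue
        per_operator.setdefault(operator, digest)  # caveat 2: first copy per operator counts
    if not per_operator:
        return None  # no fixity information at all
    digest, count = Counter(per_operator.values()).most_common(1)[0]
    return digest if count > len(per_operator) / 2 else None


operator_of = {"web.archive.org": "Internet Archive",
               "wayback.archive-it.org": "Internet Archive"}
votes = {"manifest.org": "sha256:aa", "web.archive.org": "sha256:bb",
         "wayback.archive-it.org": "sha256:bb", "arquivo.pt": "sha256:aa",
         "archive.is": "sha256:aa"}
print(majority_digest(votes, operator_of, exclude_operator="Internet Archive"))  # → sha256:aa
```

Collapsing by operator is the code analogue of the textual-criticism point above: two copies that descend from the same source are one witness, not two.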
No fixity information?
Maybe it’s ok, maybe it’s not.
infowars.com/web/20180321/eaw.rhizome.org
404
404
404
404
404
or perhaps fixity was computed and stored at infowars.com;
you have to decide if you trust that site.
see also: https://www.youtube.com/watch?v=EY15lj-7_lc
http://ws-dl.blogspot.com/2017/12/2017-12-11-difficulties-in-timestamping.html
Conclusions
• See Melanie Ehrenkranz’s article for good news:
• https://gizmodo.com/how-archivists-could-stop-deepfakes-from-rewriting-hist-1829666009
• I, however, bring mostly bad news:
– The web will be the primary vector for increasingly
sophisticated disinformation
– Web archives can be used to forge or obscure the
provenance of this information
– Vagaries of archive playback mean naïve fixity approaches will not work
– an archive is not always a reliable witness
– archives are vulnerable to attack from the pages they crawl
– “Fake” archives are easy to set up & proliferate
– Brian Williams (1992) is the OG, not Snoop Dogg (1993)

Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed Originals
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
 
Bridging the Gap Between Print and Digital Environment
Bridging the Gap Between Print and Digital EnvironmentBridging the Gap Between Print and Digital Environment
Bridging the Gap Between Print and Digital Environment
 
MD 400 Introduction
MD 400 IntroductionMD 400 Introduction
MD 400 Introduction
 
Feb Institute Virtual Game
Feb Institute Virtual GameFeb Institute Virtual Game
Feb Institute Virtual Game
 
SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration)
SHARI(StoryGraph Hypercane ArchiveNow Raintale Integration)SHARI(StoryGraph Hypercane ArchiveNow Raintale Integration)
SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration)
 
Supercomputer Earth: The Future of Civilization (& Africa\'s part in it)
Supercomputer Earth: The Future of Civilization (& Africa\'s part in it)Supercomputer Earth: The Future of Civilization (& Africa\'s part in it)
Supercomputer Earth: The Future of Civilization (& Africa\'s part in it)
 
Web2 0storytelling 2009
Web2 0storytelling 2009Web2 0storytelling 2009
Web2 0storytelling 2009
 
Preparing for the Impact of Web 3.0
Preparing for the Impact of Web 3.0 Preparing for the Impact of Web 3.0
Preparing for the Impact of Web 3.0
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
 
Web2 0storytelling 2009
Web2 0storytelling 2009Web2 0storytelling 2009
Web2 0storytelling 2009
 
Storytelling With Web Archives
Storytelling With Web ArchivesStorytelling With Web Archives
Storytelling With Web Archives
 
Coolhunting - ThinkVis 2012
Coolhunting - ThinkVis 2012 Coolhunting - ThinkVis 2012
Coolhunting - ThinkVis 2012
 
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
 
“Big Brother Is You, Watching.” Week3 Participation Literacy
“Big Brother Is You, Watching.” Week3 Participation Literacy“Big Brother Is You, Watching.” Week3 Participation Literacy
“Big Brother Is You, Watching.” Week3 Participation Literacy
 
Digital Discipleship
Digital DiscipleshipDigital Discipleship
Digital Discipleship
 
Willamette digital humanities seminar 2009, part 1
Willamette digital humanities seminar 2009, part 1Willamette digital humanities seminar 2009, part 1
Willamette digital humanities seminar 2009, part 1
 
WS-DL’s Work towards Enabling Personal Use of Web Archives
WS-DL’s Work towards Enabling Personal Use of Web ArchivesWS-DL’s Work towards Enabling Personal Use of Web Archives
WS-DL’s Work towards Enabling Personal Use of Web Archives
 
Why the social web is here to stay (and what to do about it)
Why the social web is here to stay (and what to do about it)Why the social web is here to stay (and what to do about it)
Why the social web is here to stay (and what to do about it)
 
What’s New and Exciting in Library Makerspaces
What’s New and Exciting in Library MakerspacesWhat’s New and Exciting in Library Makerspaces
What’s New and Exciting in Library Makerspaces
 

Plus de Michael Nelson

Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035Michael Nelson
 
Uncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pagesUncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pagesMichael Nelson
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsMichael Nelson
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Michael Nelson
 
Summarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesSummarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesMichael Nelson
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingMichael Nelson
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptMichael Nelson
 
Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesMichael Nelson
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple ArchivesMichael Nelson
 
Evaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesEvaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesMichael Nelson
 
When Should I Make Preservation Copies of Myself?
When Should I Make Preservation Copies of Myself?�When Should I Make Preservation Copies of Myself?�
When Should I Make Preservation Copies of Myself?Michael Nelson
 
Assessing the Quality of Web Archives
Assessing the Quality of Web ArchivesAssessing the Quality of Web Archives
Assessing the Quality of Web ArchivesMichael Nelson
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web ArchivesMichael Nelson
 
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench ToolEvaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench ToolMichael Nelson
 
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Michael Nelson
 
Who and What Links to the Internet Archive
Who and What Links to the Internet ArchiveWho and What Links to the Internet Archive
Who and What Links to the Internet ArchiveMichael Nelson
 
On the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeOn the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeMichael Nelson
 
Profiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageProfiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageMichael Nelson
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingMichael Nelson
 
More Archives, More Better
More Archives, More Better More Archives, More Better
More Archives, More Better Michael Nelson
 

Plus de Michael Nelson (20)

Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035
 
Uncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pagesUncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pages
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed Originals
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
 
Summarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesSummarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniques
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web Archiving
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
 
Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web Archives
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple Archives
 
Evaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesEvaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived Pages
 
When Should I Make Preservation Copies of Myself?
When Should I Make Preservation Copies of Myself?�When Should I Make Preservation Copies of Myself?�
When Should I Make Preservation Copies of Myself?
 
Assessing the Quality of Web Archives
Assessing the Quality of Web ArchivesAssessing the Quality of Web Archives
Assessing the Quality of Web Archives
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web Archives
 
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench ToolEvaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
 
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
 
Who and What Links to the Internet Archive
Who and What Links to the Internet ArchiveWho and What Links to the Internet Archive
Who and What Links to the Internet Archive
 
On the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeOn the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over Time
 
Profiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageProfiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content Language
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
 
More Archives, More Better
More Archives, More Better More Archives, More Better
More Archives, More Better
 

Dernier

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Dernier (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Va Tech CS Seminar Web Archives Weaponized Evidence

  • 1. Va Tech CS Seminar, 2018-11-02, @phonedude_mln, @WebSciDL Weaponized Web Archives: Provenance Laundering of Short Order Evidence Michael L. Nelson Old Dominion University Web Science & Digital Libraries Research Group @WebSciDL, @phonedude_mln With: ODU: Michele C. Weigle, Mohamed Aturban, John Berlin, Sawood Alam, Plinio Vargas Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein Supported in part by The Andrew Mellon Foundation. Opinions expressed are those of the presenter. based on a 2018-03-23 presentation at the National Forum on Ethics and Archiving the Web
  • 2. TL;DR • We are on the cusp of a “Photoshop” moment for synthesizing convincing audio/video • Web archives will be weaponized to: – alter trustworthy content – obfuscate provenance of untrustworthy content web archives https://imgur.com/gallery/akeVeiq
  • 3. background: what’s a web archive?
  • 4.
  • 5. http://web.archive.org/web/*/http://www.odu.edu/ also: http://whatdiditlooklike.mementoweb.org/tagged/odu.edu
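The Wayback Machine listing above is one human-readable view of a URI's captures. Archives that implement the Memento protocol (RFC 7089) can also expose the same list as a machine-readable TimeMap in link format. The sketch below is a minimal parser under stated assumptions: the TimeMap text has already been fetched (for the Wayback Machine, a URL of the form http://web.archive.org/web/timemap/link/http://www.odu.edu/ is commonly used, but treat that as an assumption), attribute values are double-quoted, and the `SAMPLE` TimeMap for example.com is hypothetical, not real capture data. The function name `parse_timemap` is illustrative only.

```python
import re
from datetime import datetime
from typing import List, Tuple

# One link-format entry: <URI> followed by ;attr="value" pairs.
# Assumes double-quoted attribute values, as Memento TimeMaps typically use;
# this is not a complete RFC 6690 parser.
ENTRY_RE = re.compile(r'<([^>]+)>((?:\s*;\s*[\w-]+="[^"]*")*)')
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')

# Memento datetimes use the HTTP date format, e.g. "Sat, 17 Mar 2018 22:09:27 GMT".
HTTP_DATE = "%a, %d %b %Y %H:%M:%S GMT"


def parse_timemap(text: str) -> List[Tuple[str, datetime]]:
    """Return (memento URI, Memento-Datetime) pairs, oldest first."""
    mementos = []
    for uri, attrs in ENTRY_RE.findall(text):
        fields = dict(ATTR_RE.findall(attrs))
        # rel may combine values, e.g. "first memento" or "prev memento".
        if "memento" in fields.get("rel", "").split() and "datetime" in fields:
            mementos.append((uri, datetime.strptime(fields["datetime"], HTTP_DATE)))
    return sorted(mementos, key=lambda m: m[1])


# Hypothetical TimeMap for example.com; these are NOT real captures.
SAMPLE = """<http://example.com/>; rel="original",
<http://web.archive.org/web/timemap/link/http://example.com/>; rel="self"; type="application/link-format",
<http://web.archive.org/web/20180317220927/http://example.com/>; rel="last memento"; datetime="Sat, 17 Mar 2018 22:09:27 GMT",
<http://web.archive.org/web/19970101000000/http://example.com/>; rel="first memento"; datetime="Wed, 01 Jan 1997 00:00:00 GMT"
"""

if __name__ == "__main__":
    for uri, dt in parse_timemap(SAMPLE):
        print(dt.isoformat(), uri)
```

Entries whose `rel` is `original`, `self`, or `timegate` are skipped, so only captures remain; sorting by datetime gives the earliest capture first, which is the kind of "priority" question the later slides ask of web archives.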
  • 6.
  • 7. what was here? we’ll likely never know… (ok, xkcd gives us an idea…)
  • 8. Sure, go ahead and archive www.odu.edu -- but what about archiving all your Facebook posts, tweets, instagrams, check-ins, etc.?
  • 9. “Why are they putting all that online?” “And it’s easy to deride this sort of thing as self-absorbed publishing – why would anyone put such drivel out in public? It’s simple. They’re not talking to you. We misread these seemingly inane posts because we’re so unused to seeing written material in public that isn’t intended for us.” Clay Shirky, 2008, p. 85
  • 10. We have semi-private discussions in public spaces all the time… https://www.nytimes.com/2017/09/19/us/politics/isnt-that-the-trump-lawyer-a-reporters-accidental-scoop.html https://well.blogs.nytimes.com/2013/06/21/how-the-hum-of-a-coffee-shop-can-boost-creativity/ Even though we know others can eavesdrop – maybe we even want that – if they whipped out their iPhone and started recording us, it might change our behavior. See also: gevulot, agoras, and exomemory in “The Quantum Thief” https://en.wikipedia.org/wiki/The_Quantum_Thief
  • 11. Given enough time, our private/public performances become art https://archive.org/details/prelingerhomemovies https://genius.com/Dj-shadow-letter-from-home-lyrics https://www.youtube.com/watch?v=MIR62rreRKY https://www.youtube.com/watch?v=fKjg1HfZfPM#t=2m46s https://www.sinecurebooks.com/shop/enjoy-the-experience-bundle/ personally identifiable information!
  • 12. as the web archiving community, we are constantly asking ourselves: “Are we creating tools that aid the surveillance state?” Spoiler alert: Yes.
  • 13. Our attitude about the surveillance state is contextual. https://www.cbsnews.com/pictures/boston-marathon-bombing-iconic-images/ http://www.boston.com/metrodesk/2013/04/20/boston-police-commissioner-edward-davis-says-releasing-photos-was-turning-point-boston-marathon-bomb-probe/sojcZNcTCGah8UYBnRuk9O/story.html Boston Marathon Bombing, 2013 https://twitter.com/charliespiering/status/976430395964215296
  • 14. The surveillance state also surveils the (other) state. https://www.theguardian.com/uk-news/2018/sep/05/planes-trains-and-fake-names-the-trail-left-by-skripal-suspects “Planes, trains and fake names: the trail left by Skripal suspects” https://www.cnn.com/2018/10/22/middleeast/saudi-operative-jamal-khashoggi-clothes/index.html “Surveillance footage shows Saudi 'body double' in Khashoggi's clothes after he was killed, Turkish source says”
  • 15. Meanwhile, we happily pay monthly service fees to be surveilled! https://www.citiusminds.com/blog/home-automation-with-smart-speakers-amazon-echo-vs-google-home-vs-apple-homepod/ https://twitter.com/mtdukes/status/974281625348558848
  • 16. “Quis custodiet ipsos custodes?” A: Social media. https://twitter.com/WiredUK/status/958084308924760065 https://twitter.com/vicenews/status/670059493581959168
  • 17. We don’t feel too bad when we archive accounts that later prove to be trolls / sockpuppets / sybils https://twitter.com/safety_refinery/status/934982022078042112 https://ws-dl.blogspot.com/2018/01/2018-01-02-link-to-web-archives-not.html https://twitter.com/documentnow/status/964882665982722048 https://news.docnow.io/blacktivists-in-the-archive-71c807aa247e https://github.com/fivethirtyeight/russian-troll-tweets/ https://blog.twitter.com/official/en_us/topics/company/2018/enabling-further-research-of-information-operations-on-twitter.html
  • 18. Nor do we feel bad for holding public figures / organizations accountable https://twitter.com/landlibrarian/status/975910915135754240 https://twitter.com/IEEEhistory/status/960358528987942912 http://archive.is/xh58B
  • 19. But our attitude is different when those organizations explicitly monitor us https://twitter.com/pierce/status/980860438119301120 https://twitter.com/LSJNews/status/979017806116245504
  • 20. We can & should discuss our role in surveillance, but realize Facebook et al. are operating as designed (and before you say “I’m too cool for Facebook”, remember that Facebook owns Instagram) https://twitter.com/zeynep/status/975076957485457408 https://twitter.com/Pinboard/status/975013825010458624 see also: data as toxic waste http://idlewords.com/talks/haunted_by_data.htm
  • 21. as the web archiving community, we should be asking ourselves: “Can we authenticate web content?” Spoiler alert: Yes. A bit.
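One small building block behind "Yes. A bit." is fixity checking: record a cryptographic digest of a memento's bytes when it is captured, then recompute the digest at replay time and compare. The sketch below assumes the raw bytes are already in hand and uses SHA-256; the function names `fixity_digest` and `verify_fixity` are hypothetical, and this is a generic illustration, not the method used in this talk.

```python
import hashlib
import hmac


def fixity_digest(content: bytes) -> str:
    """SHA-256 hex digest of a memento's raw bytes, recorded at capture time."""
    return hashlib.sha256(content).hexdigest()


def verify_fixity(content: bytes, recorded_digest: str) -> bool:
    """Recompute the digest at replay time and compare it to the recorded one."""
    # compare_digest avoids timing differences; harmless for public digests.
    return hmac.compare_digest(fixity_digest(content), recorded_digest.lower())


if __name__ == "__main__":
    captured = b"<html><body>original tweet text</body></html>"
    recorded = fixity_digest(captured)  # stored alongside the capture
    print(verify_fixity(captured, recorded))  # True
    print(verify_fixity(captured.replace(b"original", b"altered"), recorded))  # False
```

A matching digest only shows that the bytes have not changed since the digest was recorded; it says nothing about whether the page was truthful when captured, and a digest stored inside the same archive can be altered along with the content. In practice, archives also rewrite links and insert banners at replay, so a naive whole-page hash can change even when nothing malicious happened.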
  • 22. Stewart Brand, The Media Lab: Inventing the Future at MIT, 1987, p. 201
  • 23. “We cannot accept this photograph in evidence” http://www.politifact.com/florida/statements/2018/mar/27/blog-posting/david-hogg-not-school-during-shooting-s-fake-news/ https://twitter.com/acnwala/status/977982456296034304
  • 24. Granted, we’ve had obvious, cut-n-paste / mashup “evidence” for a long time… Victorian Photo Collage https://www.metmuseum.org/exhibitions/listings/2010/victorian-photocollage “The Flying Saucer” (1956) https://en.wikipedia.org/wiki/The_Flying_Saucer_(song) https://www.youtube.com/watch?v=XCrn6QXvHLg Brian Williams Raps ‘Gin & Juice’ https://www.youtube.com/watch?v=XlGLhYFrv6w
  • 25. Crude techniques = humor; sophisticated techniques = deception. Brand’s prediction of “any day now” is now. Synthesizing Obama: Learning Lip Sync from Audio, SIGGRAPH 2017 https://grail.cs.washington.edu/projects/AudioToObama/ Face2Face: Real-time Face Capture and Reenactment of RGB Videos, CVPR 2016 http://niessnerlab.org/projects/thies2016face.html See also: https://www.youtube.com/watch?v=pkkph4JhrCg
  • 26. Clumsy “collage / flying saucer / gin & juice” techniques are already effective on social media. We are completely unprepared for advanced SIGGRAPH/CVPR techniques.
  • 27. “Surely web archives can be used to establish priority and authenticity?” Let’s look at some examples. cf. https://gizmodo.com/how-archivists-could-stop-deepfakes-from-rewriting-hist-1829666009
  • 28. Neo-Nazis and “Black Panther” Relationship Status: It’s Complicated http://knowyourmeme.com/photos/1338390-black-panther https://twitter.com/TamikaDMallory/status/964701120194019328 See also: https://www.snopes.com/fact-check/mexican-police-caravan-photos/
  • 29. nydailynews.com provides screenshots, but not links to the tweets… http://www.nydailynews.com/entertainment/movies/trolls-lying-assaults-black-panther-showings-article-1.3824901
  • 30. @AsianWifeHaver and @DSA_Boi_Pucci are not on the live web…

    $ curl -I https://twitter.com/AsianWifeHaver
    HTTP/1.1 302 Found
    cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
    content-length: 103
    content-type: text/html;charset=utf-8
    date: Sat, 17 Mar 2018 22:09:27 GMT
    expires: Tue, 31 Mar 1981 05:00:00 GMT
    last-modified: Sat, 17 Mar 2018 22:09:27 GMT
    location: https://twitter.com/account/suspended

    $ curl -I https://twitter.com/DSA_Boi_Pucci
    HTTP/1.1 404 Not Found
    cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
    content-length: 6329
    content-security-policy: [deletia]
    content-type: text/html;charset=utf-8
    date: Sat, 17 Mar 2018 22:14:22 GMT
    expires: Tue, 31 Mar 1981 05:00:00 GMT
    last-modified: Sat, 17 Mar 2018 22:14:22 GMT
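The `curl -I` checks above can be scripted. The Python sketch below is hypothetical (the names `head_no_redirect` and `classify_account` are ours, not from the talk) and assumes Twitter behaves as it did in March 2018: a suspended profile answers with a 302 redirect to `/account/suspended`, and an unknown profile answers 404. Like `curl -I`, it sends a HEAD request and does not follow redirects.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def head_no_redirect(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send a HEAD request and, like `curl -I`, do not follow redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def classify_account(status: int, location: Optional[str]) -> str:
    """Interpret a profile URL's HEAD response (assumes 2018-era Twitter behavior)."""
    if (status in REDIRECT_CODES and location
            and location.rstrip("/").endswith("/account/suspended")):
        return "suspended"
    if status == 404:
        return "not found"
    if status == 200:
        return "live"
    return "unknown"
```

Calling `classify_account(*head_no_redirect(url))` yields one label per account. Keeping the classification separate from the network call makes it testable offline and easy to adjust if the site's responses change.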
  • 31. …nor are they in the Internet Archive. Note: this exists only because of the redirection to the “suspended” page. http://web.archive.org/web/*/https://twitter.com/AsianWifeHaver http://web.archive.org/web/*/https://twitter.com/DSA_Boi_Pucci
  • 33. Can’t find @DSA_Boi_Pucci in any archive. Typical archive URI construction: archive.example.org/SomeString/CNN.com/travel
    web.archive.org/web/*/twitter.com/DSA_Boi_Pucci
    wayback.archive-it.org/all/*/twitter.com/DSA_Boi_Pucci
    perma-archives.org/warc/twitter.com/DSA_Boi_Pucci
    archive.is/twitter.com/DSA_Boi_Pucci
    www.webarchive.org.uk/wayback/archive/twitter.com/DSA_Boi_Pucci
    wayback.vefsafn.is/wayback/twitter.com/DSA_Boi_Pucci
    arquivo.pt/wayback/twitter.com/DSA_Boi_Pucci
    For a full list of public web archives, see: http://labs.mementoweb.org/aggregator_config/archivelist.xml
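The “typical archive URI construction” above amounts to appending the scheme-less URI-R to an archive-specific prefix. A minimal sketch, assuming the prefixes exactly as listed on the slide; each real archive has its own conventions, and querying a Memento aggregator built from the archive list is the more robust route. The function name `candidate_lookups` is hypothetical.

```python
# Shorthand lookup prefixes copied from the slide; real archives differ in details.
ARCHIVE_PREFIXES = (
    "web.archive.org/web/*/",
    "wayback.archive-it.org/all/*/",
    "perma-archives.org/warc/",
    "archive.is/",
    "www.webarchive.org.uk/wayback/archive/",
    "wayback.vefsafn.is/wayback/",
    "arquivo.pt/wayback/",
)


def candidate_lookups(uri_r: str, prefixes: tuple[str, ...] = ARCHIVE_PREFIXES) -> list[str]:
    """Build one lookup URI per archive by appending the scheme-less URI-R to each prefix."""
    bare = uri_r.split("://", 1)[-1]  # "https://twitter.com/x" -> "twitter.com/x"
    return [prefix + bare for prefix in prefixes]


for lookup in candidate_lookups("https://twitter.com/DSA_Boi_Pucci"):
    print(lookup)
```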
  • 34. At this point, the absence of evidence means I cannot prove that @DSA_Boi_Pucci: 1) ever existed or 2) was not created by nydailynews.com
  • 35. What if we checked these archives?
    breitbart.com/wayback/*/twitter.com/DSA_Boi_Pucci
    infowars.com/web/*/twitter.com/DSA_Boi_Pucci
    iluv.aynrand.org/*/twitter.com/DSA_Boi_Pucci
    InternetResearchAgency.ru/twitter.com/DSA_Boi_Pucci
    What if they all said “nydailynews.com test account”? Would you trust the results?
  • 36. Our entire national web preservation strategy is predicated on Brewster Kahle (IA) “not being evil”™. If he is leading a 20+ year sleeper cell, we’re doomed.
  • 37. Web archives with international implications: Malaysia Airlines Flight 17 (MH17) http://web.archive.org/web/20140717152222/http://vk.com/strelkov_info http://www.csmonitor.com/World/Europe/2014/0717/Web-evidence-points-to-pro-Russia-rebels-in-downing-of-MH17-video http://www.newyorker.com/magazine/2015/01/26/cobweb
  • 39. (not really archived as well as we’d like)
  • 40. Ed and I Discuss Who Has What… https://twitter.com/phonedude_mln/status/490171976389238784
  • 41. Remember MH17? https://twitter.com/phonedude_mln/status/490171976389238784
  • 42. Alex is now 404. Would multiple archives have convinced him? https://twitter.com/quicknquiet
  • 43. Do we really have “a perfect tool to produce ‘evidence’ of any kind”?
  • 44. Segal’s Law, restated for web archives: The person with an archive knows what the page looked like. The person with two archives is never sure.
  • 45. “Mo Archives, Mo Problems” (apologies to Notorious B.I.G.). Why? Because they’ll rarely agree. Even a single archive is an unreliable witness: zombies, temporal violations, and attacks.
  • 46. Zombies: the live web “leaking” into an archived page http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html This page is from 2008; this ad is from 2012 (when this screenshot was taken).
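A rough, static check for zombies: given the embedded-resource URLs of a replayed page, flag any that point at a host other than the archive. This is a hypothetical sketch (`find_zombies` and the example URLs are made up); it misses URLs that JavaScript constructs at replay time, which is often where zombies come from.

```python
from urllib.parse import urlsplit


def find_zombies(archive_host: str, resource_urls: list[str]) -> list[str]:
    """Return embedded-resource URLs that a replayed page would fetch from outside the archive."""
    zombies = []
    for url in resource_urls:
        host = urlsplit(url).hostname  # None for relative URLs such as "/web/..."
        # Relative URLs resolve against the archive itself, so only absolute
        # (or protocol-relative "//host/...") URLs on another host can leak.
        if host is not None and host != archive_host.lower():
            zombies.append(url)
    return zombies


page_resources = [
    "/web/20080101000000im_/http://example.com/logo.png",
    "https://web.archive.org/web/20080101000000js_/http://example.com/app.js",
    "http://ads.example.net/banner.js",
    "//cdn.example.org/widget.js",
]
print(find_zombies("web.archive.org", page_resources))
```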
  • 47. Temporal violations: reconstructing legitimately archived resources into a page that never existed http://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html https://ws-dl.blogspot.com/2018/04/2018-04-24-why-we-need-multiple-web.html The text (2004-12) says rain; the image (2005-09) is clear.
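One simple heuristic for spotting possible temporal violations compares the 14-digit capture timestamp of the root page with those of its embedded resources. The sketch below (function names, resource names, and exact days are hypothetical, loosely modeled on the 2004-12 text and 2005-09 image above) flags embedded mementos captured after the root page. That alone does not prove a violation, since the resource may not have changed in between.

```python
from datetime import datetime, timedelta


def parse_memento_ts(ts: str) -> datetime:
    """Parse a 14-digit Wayback-style timestamp, e.g. '20041215000000'."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def temporal_suspects(root_ts: str, embedded: dict[str, str],
                      tolerance: timedelta = timedelta(0)) -> dict[str, timedelta]:
    """Flag embedded mementos captured more than `tolerance` after the root page.

    A later capture is only a *potential* violation: deciding it for sure needs
    extra evidence, e.g. the resource's Last-Modified date.
    """
    root = parse_memento_ts(root_ts)
    suspects = {}
    for uri, ts in embedded.items():
        delta = parse_memento_ts(ts) - root
        if delta > tolerance:
            suspects[uri] = delta
    return suspects


print(temporal_suspects("20041215000000",
                        {"radar.jpg": "20050915000000", "style.css": "20041201000000"}))
```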
  • 48. Directly attacking the archive (in this case, via orphaned live web resources; a “zombie attack”): Lerner, Kohno, Roesner, 2017 https://doi.org/10.1145/3133956.3134042 See also: Cushman & Kreymer http://labs.rhizome.org/presentations/security.html The page is from 2011; the iframe content is from 2017 (when the screenshot was taken).
  • 49. Based on feedback from Lerner et al., IA has changed its playback (specifically, with a Content-Security-Policy HTTP response header). But playback remains problematic… “In order to save the page, we had to completely change it” (apologies to Peter Arnett). Let’s look at four common scenarios.
  • 50. 1) JavaScript does not run correctly from the archive http://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html https://ws-dl.blogspot.com/2018/04/2018-05-01-high-fidelity-ms-thesis-to.html This is cnn.com not replaying; it hasn’t replayed correctly since November 1, 2016
  • 51. 2) Archived page doesn’t match the live web experience https://web.archive.org/web/20180302184025/https:/twitter.com/Emma4Change http://ws-dl.blogspot.com/2018/03/2018-03-21-cookies-are-why-your.html “only a ‘crisis actor’ would tweet in Slovak!” Now imagine she gets fed up, deletes her account, and then someone applies the “abandoned account / archive” attack Justin Littman described: https://gwu-libraries.github.io/sfm-ui/posts/2017-11-06-vulnerabilities
  • 52. 3) You cannot replay the same archived page twice (apologies to Heraclitus). Mohamed Aturban, unpublished; memento: http://web.archive.org/web/20130724144801/http://www.cnn.com/ Animated GIF: https://blog.dshr.org/2017/11/keynote-at-pacific-neighborhood.html
  • 53. 4) Archives are not magic web sites; they have the same problems as regular web sites.

    Archive                  URI-Ms   URI-Ms with at least two hashes
    ----------------------------------------------------------------
    webharvest.gov              712     712 (100%)
    archive.is                 1396    1364 (97.70%)
    vefsafn.is                 1589     739 (46.50%)
    archive-it.org             1383     815 (58.92%)
    stanford.edu               1222     831 (68.00%)
    internetmemory.org          979     979 (100%)
    nationalarchives.gov.uk     994     972 (97.78%)
    archive.bibalex.org         199     177 (88.94%)
    bac-lac.gc.ca               351     351 (100%)
    proni.gov.uk                469     129 (27.50%)
    www.webarchive.org.uk       349     329 (94.26%)
    www.webcitation.org        1585     828 (52.23%)
    veebiarhiiv.digar.ee        488     308 (63.11%)
    webarchive.loc.gov         1594     526 (32.99%)
    arquivo.pt                 1569    1563 (99.61%)
    web.archive.org            1566    1334 (85.18%)
    perma-archives.org          182     180 (98.90%)
    ----------------------------------------------------------------
    Total                     16627   12137 (72.99%)

    Data from 35 downloads over an 11-month period (2017-11 to 2018-10), Mohamed Aturban (in preparation)
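The table does not say how the hashes were computed. As an assumption, the sketch below hashes each replayed body with SHA-256 and counts, per archive, the URI-Ms that produced two or more distinct hashes across downloads. It also reflects that the table's percentages appear to be truncated rather than rounded: 12137/16627 is about 72.996%, reported as 72.99%. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Iterable, Tuple


def summarize_replays(observations: Iterable[Tuple[str, str, bytes]]) -> dict[str, list[int]]:
    """Map archive -> [URI-Ms observed, URI-Ms with at least two distinct hashes].

    `observations` holds one (archive, uri_m, replayed_body) tuple per download.
    """
    digests = defaultdict(set)
    for archive, uri_m, body in observations:
        digests[(archive, uri_m)].add(hashlib.sha256(body).hexdigest())
    summary = defaultdict(lambda: [0, 0])
    for (archive, _uri_m), seen in digests.items():
        summary[archive][0] += 1
        if len(seen) >= 2:  # the replay changed between at least two downloads
            summary[archive][1] += 1
    return dict(summary)


def pct_truncated(part: int, whole: int) -> float:
    """Percentage truncated (not rounded) to two decimals, using integer math."""
    return (part * 10000 // whole) / 100


print(pct_truncated(12137, 16627))
```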
  • 54. How can we differentiate between “normal” archive playback modification and deception? If the tweets or accounts are deleted, we don’t know. If I embed fake tweets in another page and then archive that page, only an expert can tell that the fake tweets don’t come from twitter.com (& fake archives will lie!). And it is not in Twitter’s (perceived) self-interest to help, cf.: https://techcrunch.com/2018/01/03/why-twitter-wont-remove-trumps-nuclear-war-tweet/ https://www.vox.com/2018/10/29/18037880/twitter-may-remove-like-button https://www.bloomberg.com/news/articles/2018-10-27/twitter-apologizes-for-ignoring-apparent-threat-in-tweet These might have been swapped, but how can you tell for sure?
  • 55. Inserting fakes into real archives: here’s an actual page in the IA “proving” Brian Williams released “Gin and Juice” in 1992, a full year before Snoop Dogg. John Berlin, MS Thesis, 2018 https://www.youtube.com/watch?v=k3QTcJZdFfs (the actual URI-R & URI-M have also been obscured in the video to hide the technique). The content is clearly fake, but imagine replacing: 1) “1992” with a more believable “2016”, 2) the fake domain with “bbc.com”, and 3) Brian Williams rapping with a synthesized Trump or Obama speech.
  • 56. “That will never happen! …right?”
  • 57. This isn’t just hypothetical… The opening salvo in what is and isn’t a “deepfake”: https://twitter.com/AaronBlake/status/1035124642456002565 https://twitter.com/realDonaldTrump/status/1035120511259500544 https://news.vice.com/en_us/article/ne5x3d/trump-lester-holt-james-comey-nbc
  • 58. The May 2017 NBC interview is not archived until August 2018 (and even then, the video itself is not archived) https://www.nbcnews.com/nightly-news/video/pres-trump-s-extended-exclusive-interview-with-lester-holt-at-the-white-house-941854787582?v=raila https://web.archive.org/web/*/https://www.nbcnews.com/nightly-news/video/pres-trump-s-extended-exclusive-interview-with-lester-holt-at-the-white-house-941854787582?v=raila https://web.archive.org/web/20180825094239/https://www.nbcnews.com/nightly-news/video/pres-trump-s-extended-exclusive-interview-with-lester-holt-at-the-white-house-941854787582?v=raila Clicking through to the video reveals a loop of a postal carrier slipping on ice, not the Lester Holt interview.
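Establishing when a page was first archived, as with the NBC interview above, means reading a Memento TimeMap (RFC 7089), which lists each memento with its `datetime`. Below is a minimal parser for the link-format serialization, assuming well-formed input. The sample TimeMap is made up; we believe the Internet Archive serves this format under web.archive.org/web/timemap/link/<URI-R>, but treat that endpoint as an assumption. The function name `mementos` is hypothetical.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime

# Each link starts with "<URI>" followed by its parameters; datetime values contain
# commas ("Sat, 25 Aug 2018 ..."), so the TimeMap cannot simply be split on ",".
LINK_RE = re.compile(r"<([^>]*)>([^<]*)")
REL_RE = re.compile(r'\brel="([^"]*)"')
DATETIME_RE = re.compile(r'\bdatetime="([^"]*)"')


def mementos(timemap: str) -> list[tuple[datetime, str]]:
    """Return (memento datetime, URI-M) pairs from a link-format TimeMap, oldest first."""
    found = []
    for uri, params in LINK_RE.findall(timemap):
        rel = REL_RE.search(params)
        when = DATETIME_RE.search(params)
        # rel may hold several values, e.g. "first memento" or "last memento".
        if rel and "memento" in rel.group(1).split() and when:
            found.append((parsedate_to_datetime(when.group(1)), uri))
    return sorted(found)


SAMPLE_TIMEMAP = """<http://example.com/>; rel="original",
<http://web.archive.org/web/timemap/link/http://example.com/>; rel="self"; type="application/link-format",
<http://web.archive.org/web/20180901000000/http://example.com/>; rel="memento"; datetime="Sat, 01 Sep 2018 00:00:00 GMT",
<http://web.archive.org/web/20180825094239/http://example.com/>; rel="first memento"; datetime="Sat, 25 Aug 2018 09:42:39 GMT"
"""

first_dt, first_uri = mementos(SAMPLE_TIMEMAP)[0]
print(first_dt.date(), first_uri)
```

Sorting by datetime, rather than trusting the `first memento` relation, guards against TimeMaps that omit or misplace that label.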
  • 59. Now convince “Alex” that: 1) the live web nbc.com video has not been modified, 2) the IA archive failure is not suspicious / convenient, and 3) the archived “copy” at infowars.com is not authentic
  • 60. Blockchain to the rescue!!! <lasers> <sirens> <disco-thumping-soundtrack> nope. https://www.multichain.com/blog/2015/11/avoiding-pointless-blockchain-project/ https://eprint.iacr.org/2017/375.pdf
  • 61. Instead, let’s use web archives to monitor web archives.
  • 62. Step 1: Push eaw.rhizome.org to multiple archives
    web.archive.org/web/20180321/eaw.rhizome.org
    wayback.archive-it.org/all/20180321/eaw.rhizome.org
    arquivo.pt/wayback/20180321/eaw.rhizome.org
    archive.is/20180321/eaw.rhizome.org
  • 63. Step 2: Compute fixity, and publish a fixity “manifest” at a well-known location
    manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
    manifest.org/20180322/wayback.archive-it.org/all/20180321/eaw.rhizome.org
    manifest.org/20180322/arquivo.pt/wayback/20180321/eaw.rhizome.org
    manifest.org/20180322/archive.is/20180321/eaw.rhizome.org
    It’s understood that archived HTML is continuously rewritten, so only compute fixity on things that should not change, like JPEGs and certain original HTTP response headers. This example assumes the existence of a well-known server, manifest.org. Actual URIs can be a bit more complex using “Trusty URIs”: http://ws-dl.blogspot.com/2017/01/2017-01-15-summary-of-trusty-uris.html
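Step 2 can be sketched as follows, under several assumptions: the archive replays original response headers with an `X-Archive-Orig-` prefix (Wayback-style; other archives may differ or omit them), the choice of “stable” headers is ours, and SHA-256 is the hash. The rewritten HTML itself is deliberately not hashed. The manifest layout and the function name `fixity_manifest` are hypothetical.

```python
import hashlib
import json

# Assumption: the archive replays the original HTTP headers under this prefix.
ORIG_PREFIX = "x-archive-orig-"
# Hypothetical choice of original headers treated as stable enough to hash.
STABLE_HEADERS = ("content-type", "last-modified")


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fixity_manifest(uri_m: str, observed: str, headers: dict[str, str],
                    stable_resources: dict[str, bytes]) -> dict:
    """Build a fixity record for one memento, skipping the (rewritten) HTML itself."""
    lower = {name.lower(): value for name, value in headers.items()}
    orig = {h: lower[ORIG_PREFIX + h] for h in STABLE_HEADERS if ORIG_PREFIX + h in lower}
    return {
        "uri-m": uri_m,
        "observed": observed,
        # Canonical JSON (sorted keys) so the same headers always hash the same way.
        "orig-headers-sha256": sha256_hex(json.dumps(orig, sort_keys=True).encode()),
        "resources": {uri: sha256_hex(body) for uri, body in sorted(stable_resources.items())},
    }


manifest = fixity_manifest(
    "web.archive.org/web/20180321/eaw.rhizome.org", "20180322",
    {"X-Archive-Orig-Content-Type": "text/html", "Content-Type": "text/html; charset=utf-8"},
    {"banner.jpg": b"\xff\xd8 example jpeg bytes"},
)
print(json.dumps(manifest, indent=2))
```

The JSON printed here is what would be published at the manifest.org location for that memento.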
  • 64. Step 3: Wondering about the veracity of an archived page? Check manifest.org and recompute fixity.
    manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
    web.archive.org/web/20180321/eaw.rhizome.org
    What if manifest.org is down? Or possibly hacked? We can’t know that archive.org did not alter the contents on ingest (20180321), but we can verify that they have not changed since our observation (20180322).
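Step 3, recomputing fixity, is then a comparison against the stored digests. This sketch assumes a hypothetical manifest layout: a dict whose `"resources"` entry maps each resource URI to its SHA-256 hex digest. As the slide notes, a clean result only shows that nothing changed since the manifest was made, not that the archive captured the page faithfully.

```python
import hashlib


def verify_against_manifest(manifest: dict, current: dict[str, bytes]) -> list[str]:
    """Recompute fixity for each resource listed in `manifest`; return any mismatches."""
    problems = []
    for uri, expected in manifest["resources"].items():
        body = current.get(uri)
        if body is None:
            problems.append(f"missing: {uri}")
        elif hashlib.sha256(body).hexdigest() != expected:
            problems.append(f"changed: {uri}")
    return problems
```

An empty list means every listed resource still matches; anything else names the resources to investigate.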
  • 65. Step 4: Push the manifest manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org to multiple archives
    web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
    wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
    arquivo.pt/wayback/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
    archive.is/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
    Now the 20180322 version of the manifest of archive.org’s memento of rhizome.org is in four different archives. The URIs are ugly, but the bottom line is that an attacker would have to hack a majority of 5 domains (manifest.org + 4 archives). We can repeat this for manifests of the mementos of rhizome.org in archive-it.org, arquivo.pt, archive.is, etc.
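The archived-manifest URIs in Step 4 are mechanical to generate. A sketch using the shorthand patterns from the slides (real archives use full 14-digit timestamps, and `manifest.org` is the talk's hypothetical server); the helper names are hypothetical. The same helper also reproduces the Step 1 URI-Ms, e.g. `archived_copies("eaw.rhizome.org", "20180321")`.

```python
# Shorthand URI-M patterns from the slides; real archives have their own quirks.
ARCHIVE_PATTERNS = (
    "web.archive.org/web/{ts}/{uri}",
    "wayback.archive-it.org/all/{ts}/{uri}",
    "arquivo.pt/wayback/{ts}/{uri}",
    "archive.is/{ts}/{uri}",
)


def manifest_uri(manifest_host: str, observed: str, uri_m: str) -> str:
    """URI of the manifest describing `uri_m` as observed on `observed`."""
    return f"{manifest_host}/{observed}/{uri_m}"


def archived_copies(uri: str, pushed: str) -> list[str]:
    """Where `uri` should appear after being pushed to each archive on `pushed`."""
    return [pattern.format(ts=pushed, uri=uri) for pattern in ARCHIVE_PATTERNS]


m = manifest_uri("manifest.org", "20180322", "web.archive.org/web/20180321/eaw.rhizome.org")
for location in archived_copies(m, "20180323"):
    print(location)
```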
  • 66. Wondering about the veracity of an archived page? Check all copies of the manifest and take a majority vote.
    manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
    web.archive.org/web/20180321/eaw.rhizome.org
    web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
    wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
    arquivo.pt/wayback/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
    archive.is/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
    Caveat 1: If I can hack the rhizome.org page at archive.org, I can probably hack the fixity info there too, so we really have 4 copies, not 5.
    Caveat 2: archive.org and archive-it.org are not independent, so we really have 3 copies, not 5. (Yes, this is very similar to textual criticism.)
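The majority vote, with both caveats applied, can be sketched as below. Assumptions: each copy reports a single digest, the operator mapping is supplied by hand (here, web.archive.org and wayback.archive-it.org are both run by the Internet Archive), and copies held by the operator of the memento under question are discarded. The function name and example digests are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


def majority_digest(copies: dict[str, str], operator_of: dict[str, str],
                    suspect_operator: str) -> Optional[str]:
    """Majority vote over manifest copies, counting each independent operator once.

    copies:           source host -> fixity digest it reports
    operator_of:      source host -> organization running it (defaults to the host)
    suspect_operator: organization holding the memento under question; its copies
                      are dropped, since whoever can alter the memento can likely
                      alter that organization's fixity records too (Caveats 1 and 2).
    """
    by_operator = defaultdict(set)
    for source, digest in copies.items():
        operator = operator_of.get(source, source)
        if operator != suspect_operator:
            by_operator[operator].add(digest)
    # An operator casts one ballot, and only if all of its copies agree.
    ballots = [next(iter(d)) for d in by_operator.values() if len(d) == 1]
    if not ballots:
        return None
    winner, count = Counter(ballots).most_common(1)[0]
    return winner if count * 2 > len(by_operator) else None


copies = {"manifest.org": "abc", "web.archive.org": "forged",
          "wayback.archive-it.org": "forged", "arquivo.pt": "abc", "archive.is": "abc"}
operators = {"web.archive.org": "Internet Archive", "wayback.archive-it.org": "Internet Archive"}
print(majority_digest(copies, operators, "Internet Archive"))
```

Returning `None` rather than a plurality winner reflects the slide's point: without a clear majority of independent witnesses, the honest answer is “we can’t tell”.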
  • 67. No fixity information? Maybe it’s OK, maybe it’s not.
    infowars.com/web/20180321/eaw.rhizome.org
    Manifest lookups: 404 404 404 404 404
    Or perhaps fixity was computed and stored at infowars.com; you have to decide whether you trust that site. See also: https://www.youtube.com/watch?v=EY15lj-7_lc http://ws-dl.blogspot.com/2017/12/2017-12-11-difficulties-in-timestamping.html
  • 68. Conclusions
    • See Melanie Ehrenkranz’s article for good news: https://gizmodo.com/how-archivists-could-stop-deepfakes-from-rewriting-hist-1829666009
    • I, however, bring mostly bad news:
      – The web will be the primary vector for increasingly sophisticated disinformation
      – Web archives can be used to forge or obscure the provenance of this information
      – Vagaries of archive playback mean that:
        – naïve fixity approaches will not work
        – an archive is not always a reliable witness
        – archives are vulnerable to attack from the pages they crawl
      – “Fake” archives are easy to set up & proliferate
      – Brian Williams (1992) is the OG, not Snoop Dogg (1993)