ODU CS Colloquium, 2018-04-06, @phonedude_mln
Weaponized Web Archives:
Provenance Laundering of Short Order Evidence
Michael L. Nelson
Old Dominion University
Web Science & Digital Libraries Research Group
@WebSciDL, @phonedude_mln
With:
ODU: Michele C. Weigle, Mohamed Aturban, John Berlin, Sawood Alam, Plinio Vargas
Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein
Supported in part by The Andrew W. Mellon Foundation.
Opinions expressed are those of the presenter.
based on a 2018-03-23 presentation at the National Forum on Ethics and Archiving the Web
TL;DR
• We are on the cusp of a “Photoshop” moment for synthesizing convincing audio/video
• Web archives will be weaponized to:
– alter trustworthy content
– obfuscate provenance of untrustworthy content
web archives
https://imgur.com/gallery/akeVeiq
background:
what’s a web archive?
http://web.archive.org/web/*/http://www.odu.edu/
also: http://whatdiditlooklike.mementoweb.org/tagged/odu.edu
what was here?
we’ll likely never know…
(ok, xkcd gives us an idea…)
Sure, go ahead and archive www.odu.edu -- but what about archiving all your
Facebook posts, tweets, Instagrams, check-ins, etc.?
“Why are they putting all that online?”
“And it’s easy to deride this sort of
thing as self-absorbed publishing –
why would anyone put such drivel
out in public?
It’s simple. They’re not talking to you.
We misread these seemingly inane
posts because we’re so unused to
seeing written material in public that
isn’t intended for us.”
Clay Shirky, 2008, p. 85
We have semi-private discussions in
public spaces all the time…
https://www.nytimes.com/2017/09/19/us/politics/isnt-that-the-trump-lawyer-a-reporters-accidental-scoop.html
https://well.blogs.nytimes.com/2013/06/21/how-the-hum-of-a-coffee-shop-can-boost-creativity/
Even though we know others can eavesdrop – maybe we even want that –
if they whipped out their iPhone and started recording us, it might change our behavior.
as the web archiving community, we constantly are asking ourselves:
“Are we creating tools that aid the
surveillance state?”
Spoiler alert: Yes.
Our attitude about the
surveillance state is contextual.
https://www.cbsnews.com/pictures/boston-marathon-bombing-iconic-images/
http://www.boston.com/metrodesk/2013/04/20/boston-police-commissioner-edward-davis-says-releasing-photos-was-turning-point-boston-marathon-bomb-probe/sojcZNcTCGah8UYBnRuk9O/story.html
Boston Marathon Bombing, 2013
https://twitter.com/charliespiering/status/976430395964215296
Given enough time, it becomes art
https://archive.org/details/prelingerhomemovies
https://genius.com/Dj-shadow-letter-from-home-lyrics
https://www.youtube.com/watch?v=MIR62rreRKY
https://www.youtube.com/watch?v=fKjg1HfZfPM#t=2m46s
https://www.sinecurebooks.com/shop/enjoy-the-experience-bundle/
personally
identifiable
information!
Meanwhile, we happily pay monthly
service fees to be surveilled!
https://www.citiusminds.com/blog/home-automation-with-smart-speakers-amazon-echo-vs-google-home-vs-apple-homepod/
https://twitter.com/mtdukes/status/974281625348558848
“Quis custodiet ipsos custodes?”
A: Social media.
https://twitter.com/WIRED/status/958350367468683267
https://twitter.com/vicenews/status/670059493581959168
We don’t feel too bad when we archive accounts that
later prove to be trolls / sockpuppets / sybils
https://twitter.com/safety_refinery/status/934982022078042112
https://ws-dl.blogspot.com/2018/01/2018-01-02-link-to-web-archives-not.html
https://twitter.com/documentnow/status/964882665982722048
https://news.docnow.io/blacktivists-in-the-archive-71c807aa247e
Nor do we feel bad for holding
public figures / organizations accountable
https://twitter.com/landlibrarian/status/975910915135754240
https://twitter.com/IEEEhistory/status/960358528987942912
http://archive.is/xh58B
But our attitude is different when those
organizations explicitly monitor us
https://twitter.com/pierce/status/980860438119301120
https://twitter.com/LSJNews/status/979017806116245504
We can & should discuss our role in surveillance,
but realize Facebook et al. are operating as designed
(and before you say “I’m too cool for Facebook”, remember that Facebook owns Instagram)
https://twitter.com/zeynep/status/975076957485457408
https://twitter.com/Pinboard/status/975013825010458624
see also: data as toxic waste http://idlewords.com/talks/haunted_by_data.htm
as the web archiving community, we should be asking ourselves:
“Can we authenticate web content?”
Spoiler alert: Yes. A bit.
Stewart Brand, The Media Lab: Inventing the Future at MIT, 1987, p. 201
“We cannot accept this photograph in evidence”
http://www.politifact.com/florida/statements/2018/mar/27/blog-posting/david-hogg-not-school-during-shooting-s-fake-news/
https://twitter.com/acnwala/status/977982456296034304
Granted, we’ve had obvious, cut-n-paste /
mashup “evidence” for a long time…
Victorian Photo Collage
https://www.metmuseum.org/exhibitions/listings/2010/victorian-photocollage
“The Flying Saucer” (1956)
https://en.wikipedia.org/wiki/The_Flying_Saucer_(song)
https://www.youtube.com/watch?v=XCrn6QXvHLg
Brian Williams Raps ‘Gin & Juice’
https://www.youtube.com/watch?v=XlGLhYFrv6w
Crude techniques = humor,
sophisticated techniques = deception;
Brand’s prediction of “any day now” is now
Synthesizing Obama: Learning Lip Sync from Audio
SIGGRAPH 2017
https://grail.cs.washington.edu/projects/AudioToObama/
Face2Face: Real-time Face Capture and Reenactment
of RGB Videos, CVPR 2016
http://niessnerlab.org/projects/thies2016face.html
see also: https://www.youtube.com/watch?v=pkkph4JhrCg
What does this have to do with the web?
Clumsy, “collage/flying saucer/gin & juice” techniques
are already effective on social media
We are completely unprepared for
advanced, SIGGRAPH/CVPR techniques
Neo-Nazis and “Black Panther”
Relationship Status: It’s Complicated
http://knowyourmeme.com/photos/1338390-black-panther
https://twitter.com/TamikaDMallory/status/964701120194019328
nydailynews.com provides screenshots,
but not links to the tweets…
http://www.nydailynews.com/entertainment/movies/trolls-lying-assaults-black-panther-showings-article-1.3824901
@AsianWifeHaver and @DSA_Boi_Pucci
are not on the live web…
$ curl -I https://twitter.com/AsianWifeHaver
HTTP/1.1 302 Found
cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
content-length: 103
content-type: text/html;charset=utf-8
date: Sat, 17 Mar 2018 22:09:27 GMT
expires: Tue, 31 Mar 1981 05:00:00 GMT
last-modified: Sat, 17 Mar 2018 22:09:27 GMT
location: https://twitter.com/account/suspended
$ curl -I https://twitter.com/DSA_Boi_Pucci
HTTP/1.1 404 Not Found
cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
content-length: 6329
content-security-policy: [deletia]
content-type: text/html;charset=utf-8
date: Sat, 17 Mar 2018 22:14:22 GMT
expires: Tue, 31 Mar 1981 05:00:00 GMT
last-modified: Sat, 17 Mar 2018 22:14:22 GMT
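The two responses above can be told apart mechanically. A minimal Python sketch, assuming only the status code and `location` header shown in the curl output; `classify_account` and its return labels are hypothetical names for illustration, not part of any Twitter API:

```python
def classify_account(status, headers):
    """Classify a profile URL from the status code and headers of a HEAD response."""
    # Header names are case-insensitive, so normalize them before the lookup.
    location = {k.lower(): v for k, v in headers.items()}.get("location", "")
    if status in (301, 302) and location.endswith("/account/suspended"):
        return "suspended"
    if status == 404:
        return "not found"
    if status == 200:
        return "live"
    return "unknown"


# Values taken from the two curl responses on this slide.
print(classify_account(302, {"location": "https://twitter.com/account/suspended"}))
print(classify_account(404, {}))
```

Keeping the classifier separate from the HTTP request means it can also be run against headers recovered from an archive, not just the live web.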
…nor are they in the Internet Archive
note: this exists only
because of the redirection
to the “suspended” page
http://web.archive.org/web/*/https://twitter.com/AsianWifeHaver
http://web.archive.org/web/*/https://twitter.com/DSA_Boi_Pucci
Can’t find @DSA_Boi_Pucci in any archive
Typical archive URI construction:
archive.example.org/SomeString/CNN.com/travel
web.archive.org/web/*/twitter.com/DSA_Boi_Pucci
wayback.archive-it.org/all/*/twitter.com/DSA_Boi_Pucci
perma-archives.org/warc/twitter.com/DSA_Boi_Pucci
archive.is/twitter.com/DSA_Boi_Pucci
www.webarchive.org.uk/wayback/archive/twitter.com/DSA_Boi_Pucci
wayback.vefsafn.is/wayback/twitter.com/DSA_Boi_Pucci
arquivo.pt/wayback/twitter.com/DSA_Boi_Pucci
for a full list of public web archives, see: http://labs.mementoweb.org/aggregator_config/archivelist.xml
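The URI pattern above is regular enough to generate. A short Python sketch, assuming the prefixes listed on this slide are still valid; `lookup_uris` is a hypothetical helper, and a real client would more likely query a Memento aggregator than hard-code prefixes:

```python
# Prefixes copied from the slide; each archive uses its own path convention.
ARCHIVE_PREFIXES = [
    "web.archive.org/web/*/",
    "wayback.archive-it.org/all/*/",
    "perma-archives.org/warc/",
    "archive.is/",
    "www.webarchive.org.uk/wayback/archive/",
    "wayback.vefsafn.is/wayback/",
    "arquivo.pt/wayback/",
]


def lookup_uris(original_uri, prefixes=ARCHIVE_PREFIXES):
    """Return one candidate lookup URI per archive for original_uri."""
    # Drop the scheme so the result matches the scheme-less style used on the slide.
    bare = original_uri.split("://", 1)[-1]
    return [prefix + bare for prefix in prefixes]


for uri in lookup_uris("https://twitter.com/DSA_Boi_Pucci"):
    print(uri)
```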
What if we checked these archives?
What if they all agreed?
breitbart.com/wayback/*/twitter.com/DSA_Boi_Pucci
infowars.com/web/*/twitter.com/DSA_Boi_Pucci
iluv.aynrand.org/*/twitter.com/DSA_Boi_Pucci
InternetResearchAgency.ru/twitter.com/DSA_Boi_Pucci
Would you trust the results?
Our entire national digital preservation
strategy is predicated on
Brewster Kahle (IA) “not being evil”™
If he is leading a 20+ year sleeper cell, we’re doomed.
Malaysia Airlines Flight 17 (MH17)
http://web.archive.org/web/20140717152222/http://vk.com/strelkov_info
http://www.csmonitor.com/World/Europe/2014/0717/Web-evidence-points-to-pro-Russia-rebels-in-downing-of-MH17-video
http://www.newyorker.com/magazine/2015/01/26/cobweb
(not really archived as well as we’d like)
Ed and I Discuss Who Has What…
https://twitter.com/phonedude_mln/status/490171976389238784
Remember MH17?
https://twitter.com/phonedude_mln/status/490171976389238784
Alex is now 404.
Would multiple archives have convinced him?
https://twitter.com/quicknquiet
Do we really have
“a perfect tool to produce ‘evidence’ of any kind”?
Segal’s Law, restated for web archives:
The person with an archive knows what the page looked like.
The person with two archives is never sure.
(apologies to Notorious B.I.G.)
“Mo Archives, Mo Problems”
Why? Because they’ll rarely agree.
Even a single archive is an unreliable witness:
zombies, temporal violations, and attacks
Zombies: live web “leaking” into an archived page
http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
this page is from 2008
this ad is from 2012 (when this screenshot was taken)
Temporal violations: reconstructing legitimately
archived resources into a page that never existed
http://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
text (2004-12) says rain, image (2005-09) is clear
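A rough way to surface this kind of mismatch is to compare the capture time of the page with the capture times of its embedded resources. The Python sketch below is a simplification, not the evaluation method in the linked post; the resource name `radar.gif`, the one-day tolerance, and the day-of-month values are assumptions (the slide gives only year and month):

```python
from datetime import datetime, timedelta


def temporal_violations(root_dt, embedded, tolerance=timedelta(days=1)):
    """Return (uri, days_apart) for resources captured too far from the root page."""
    flagged = []
    for uri, captured in embedded.items():
        delta = abs(captured - root_dt)
        if delta > tolerance:
            flagged.append((uri, delta.days))
    return flagged


root = datetime(2004, 12, 1)                     # capture time of the page text
resources = {"radar.gif": datetime(2005, 9, 1)}  # capture time of the embedded image
print(temporal_violations(root, resources))
```

A large gap alone does not prove the composite page never existed, since the embedded resource may simply not have changed; it only marks the page as needing closer inspection.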
Directly attacking the archive
(in this case, via orphaned live web resources; “zombie attack”)
Lerner, Kohno, Roesner, 2017
https://doi.org/10.1145/3133956.3134042
see also: Cushman & Kreymer http://labs.rhizome.org/presentations/security.html
page is from 2011,
iframe content is from 2017
(when screenshot was taken)
Based on feedback from Lerner et al.,
IA has changed their playback
(specifically, with a Content-Security-Policy HTTP response header)
But playback remains problematic…
(apologies to Peter Arnett)
“In order to save the page, we had to completely change it”
let’s look at four common scenarios
1) JavaScript does not run correctly from the archive
http://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html
This is cnn.com not replaying;
it hasn’t replayed correctly since
November 1, 2016
2) Archived page renders differently each time
Mohamed Aturban, unpublished, memento:
http://web.archive.org/web/20130724144801/http://www.cnn.com/
3) Archive modifies pages that should stay the same –
goodbye conventional fixity checks!
Mohamed Aturban, unpublished, embedding memento:
http://perma-archives.org/warc/20170101182813/http://umich.edu/
http://perma-archives.org/warc/20170101182814id_/http://umich.edu/includes/image/type/gallery/id/113/name/ResearchDIL-19Aug14_DM%28136%29.jpg/width/152/height/152/mode/minfit/
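The second URI above carries the `id_` modifier after its 14-digit timestamp, a Wayback-style convention for requesting the capture without playback rewriting. A small Python sketch of that rewrite follows; `raw_memento_uri` is a hypothetical helper, and whether a particular archive honors `id_` is an assumption to check per archive:

```python
import re

# A Wayback-style URI-M embeds a 14-digit timestamp, optionally followed by
# a two-letter modifier such as "id_" or "im_".
_TIMESTAMP = re.compile(r"/(\d{14})(?:[a-z]{2}_)?/")


def raw_memento_uri(uri_m):
    """Rewrite a Wayback-style URI-M so it requests the unrewritten capture."""
    return _TIMESTAMP.sub(lambda m: "/" + m.group(1) + "id_/", uri_m, count=1)


print(raw_memento_uri("http://perma-archives.org/warc/20170101182813/http://umich.edu/"))
```

Fixity checks are only meaningful on the raw form, because the default playback form is rewritten on every request.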
4) Archived page doesn’t match live web experience
https://web.archive.org/web/20180302184025/https:/twitter.com/Emma4Change
http://ws-dl.blogspot.com/2018/03/2018-03-21-cookies-are-why-your.html
“only a ‘crisis actor’ would tweet in Slovak!”
Now imagine she gets fed up, deletes her account, and then someone applies the
“abandoned acct / archive” attack Justin Littman described:
https://gwu-libraries.github.io/sfm-ui/posts/2017-11-06-vulnerabilities
How can we differentiate between “normal”
archive modification for playback vs. deception?
These might have been swapped -- but how can you tell for sure?
If the tweets or accts are deleted, we don’t know.
If I embed fake tweets in another page, it’s even more confusing.
And it is not in Twitter’s (perceived) self-interest to help, cf.:
https://techcrunch.com/2018/01/03/why-twitter-wont-remove-trumps-nuclear-war-tweet/
You cannot trust the URL in your browser!
Here’s an actual page in the IA “proving”
Brian Williams released “Gin and Juice” in 1992, a full year before Snoop Dogg.
John Berlin, MS Thesis, 2018
https://www.youtube.com/watch?v=k3QTcJZdFfs
(actual URI-R & URI-M have also been faked in video)
The content is clearly fake, but imagine replacing:
1) “1992” with a more believable “2016”,
2) the fake domain with “bbc.com”, and
3) Brian Williams rapping with a synthesized Trump or Obama speech.
Blockchain to the rescue!!!
<lasers>
<sirens>
<disco-thumping-soundtrack>
nope.
https://www.multichain.com/blog/2015/11/avoiding-pointless-blockchain-project/
https://eprint.iacr.org/2017/375.pdf
Instead, let’s use web archives
to monitor web archives.
Step 1: Push to multiple archives
web.archive.org/web/20180321/eaw.rhizome.org
wayback.archive-it.org/all/20180321/eaw.rhizome.org
arquivo.pt/wayback/20180321/eaw.rhizome.org
archive.is/20180321/eaw.rhizome.org
Step 2: Compute fixity,
publish fixity “manifest” at a well-known location
manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
manifest.org/20180322/wayback.archive-it.org/all/20180321/eaw.rhizome.org
manifest.org/20180322/arquivo.pt/wayback/20180321/eaw.rhizome.org
manifest.org/20180322/archive.is/20180321/eaw.rhizome.org
It’s understood that archived HTML is continuously rewritten, so only compute fixity on things that
should not change, like JPEGs and certain original HTTP response headers.
This example assumes the existence of a well-known server manifest.org.
Actual URIs can be a bit more complex using “Trusty URIs”:
http://ws-dl.blogspot.com/2017/01/2017-01-15-summary-of-trusty-uris.html
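Step 2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a defined manifest format: SHA-256 as the hash, the three header names in `STABLE_HEADERS`, and the function names are all choices made here; only the `manifest.org/<date>/<URI-M>` layout comes from the slide.

```python
import hashlib

# Headers assumed to be stable across replays; this choice is illustrative.
STABLE_HEADERS = ("content-type", "last-modified", "etag")


def compute_fixity(body, headers):
    """Return a SHA-256 hex digest over selected original headers plus the raw body."""
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    digest = hashlib.sha256()
    for name in STABLE_HEADERS:
        if name in normalized:
            digest.update(f"{name}:{normalized[name]}\n".encode("utf-8"))
    digest.update(body)
    return digest.hexdigest()


def manifest_uri(observed_on, uri_m, server="manifest.org"):
    """Build the manifest location in the slide's <server>/<date>/<URI-M> form."""
    return f"{server}/{observed_on}/{uri_m}"


print(manifest_uri("20180322", "web.archive.org/web/20180321/eaw.rhizome.org"))
```

Hashing only the raw body and a fixed header subset sidesteps the rewritten HTML, at the cost of saying nothing about resources that legitimately vary between replays.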
Wondering about veracity of an archived page?
Check manifest.org and recompute fixity.
manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
web.archive.org/web/20180321/eaw.rhizome.org
what if manifest.org is down?
or possibly hacked?
We can’t know archive.org did not alter contents on ingest (20180321),
but we can verify that it has not changed since our observation (20180322)
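The check itself is a recompute-and-compare. A minimal Python sketch, assuming the manifest records a SHA-256 hex digest of the raw (unrewritten) bytes; `verify_memento` is a hypothetical name and the byte strings are placeholders:

```python
import hashlib
import hmac


def verify_memento(body, recorded_digest):
    """Return True if the SHA-256 of body matches the digest recorded in the manifest."""
    current = hashlib.sha256(body).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(current, recorded_digest.lower())


recorded = hashlib.sha256(b"bytes observed on 20180322").hexdigest()
print(verify_memento(b"bytes observed on 20180322", recorded))
print(verify_memento(b"bytes served later", recorded))
```

A match shows only that the bytes are unchanged since the observation date; it says nothing about what happened between capture and observation.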
Step 4: Push manifest to multiple archives
web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
arquivo.pt/wayback/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
archive.is/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
Now the 20180322 version of the manifest of archive.org’s
memento of rhizome.org is in four different archives.
The URIs are ugly, but the bottom line is an attacker would have to hack a
majority of 5 domains (manifest.org + 4 archives)
Can repeat for manifests of mementos of rhizome.org in archive-it.org, arquivo.pt, archive.is, etc.
Wondering about veracity of an archived page?
Check all copies of manifest.org and take a majority vote
manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
web.archive.org/web/20180321/eaw.rhizome.org
Caveat 1: If I can hack rhizome.org page at archive.org, I can probably hack the
fixity info there too, so we really have 4 copies not 5.
web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
arquivo.pt/wayback/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
archive.is/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
Caveat 2: archive.org and archive-it.org are not independent,
so we really have 3 copies not 5. (yes, this is very similar to textual criticism)
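The vote and both caveats fit in one function. In this Python sketch the `OPERATOR` table, the function name, and the digest strings are assumptions for illustration; the only facts taken from the slide are that the archive under test cannot vouch for itself and that archive.org and archive-it.org count as one witness.

```python
from collections import Counter

# Hosts run by the same organization count as a single witness (Caveat 2).
OPERATOR = {
    "web.archive.org": "internet-archive",
    "wayback.archive-it.org": "internet-archive",
}


def majority_digest(copies, archive_under_test):
    """Vote on the recorded digest with one ballot per independent operator."""
    excluded = OPERATOR.get(archive_under_test, archive_under_test)
    ballots = {}
    for host, digest in copies.items():
        operator = OPERATOR.get(host, host)
        if operator == excluded:
            continue  # Caveat 1: the archive being checked cannot vouch for itself
        ballots.setdefault(operator, digest)  # first copy seen per operator
    if not ballots:
        return None
    digest, votes = Counter(ballots.values()).most_common(1)[0]
    # Require a strict majority of the independent witnesses.
    return digest if votes > len(ballots) / 2 else None


copies = {
    "manifest.org": "abc",
    "web.archive.org": "tampered",
    "wayback.archive-it.org": "tampered",
    "arquivo.pt": "abc",
    "archive.is": "abc",
}
print(majority_digest(copies, "web.archive.org"))
```

With the five copies from the slide, only three ballots are counted, which matches the "3 copies not 5" arithmetic above.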
No fixity information?
Maybe it’s ok, maybe it’s not.
infowars.com/web/20180321/eaw.rhizome.org
(all five manifest lookups: 404)
or perhaps fixity was computed and stored at freedomfries.org;
you have to decide if you trust that site.
see also: https://www.youtube.com/watch?v=EY15lj-7_lc
http://ws-dl.blogspot.com/2017/12/2017-12-11-difficulties-in-timestamping.html
Conclusions
• Bad news:
– The web will be the primary vector for increasingly
sophisticated disinformation
– Web archives can be used to forge or obscure the
provenance of this information
– Brian Williams predates Snoop Dogg
• Good news:
– Web archives have a role in authenticating who said what,
and when
– Contact Dr. Weigle and me if you are interested in privacy,
authenticity, social media, web archiving, etc.

Contenu connexe

Tendances

On Fake News, gatekeepers and LIS professionals (Bobcatsss2020)
On Fake News, gatekeepers and LIS professionals (Bobcatsss2020)On Fake News, gatekeepers and LIS professionals (Bobcatsss2020)
On Fake News, gatekeepers and LIS professionals (Bobcatsss2020)
Matilde Fontanin
 
Fact Checking on Social Media: Tales and Hacks
Fact Checking on Social Media: Tales and HacksFact Checking on Social Media: Tales and Hacks
Fact Checking on Social Media: Tales and Hacks
Julian Ausserhofer
 
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Shawn Jones
 
Illuminating Learning Communities Through School Libraries and Makerspaces C...
Illuminating  Learning Communities Through School Libraries and MakerspacesC...Illuminating  Learning Communities Through School Libraries and MakerspacesC...
Illuminating Learning Communities Through School Libraries and Makerspaces C...
Buffy Hamilton
 

Tendances (20)

On Fake News, gatekeepers and LIS professionals (Bobcatsss2020)
On Fake News, gatekeepers and LIS professionals (Bobcatsss2020)On Fake News, gatekeepers and LIS professionals (Bobcatsss2020)
On Fake News, gatekeepers and LIS professionals (Bobcatsss2020)
 
Data Science with Human in the Loop @Faculty of Science #Leiden University
Data Science with Human in the Loop @Faculty of Science #Leiden UniversityData Science with Human in the Loop @Faculty of Science #Leiden University
Data Science with Human in the Loop @Faculty of Science #Leiden University
 
Fact Checking on Social Media: Tales and Hacks
Fact Checking on Social Media: Tales and HacksFact Checking on Social Media: Tales and Hacks
Fact Checking on Social Media: Tales and Hacks
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed Originals
 
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
 
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
 
Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web Archives
 
Using Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich  the Live Web Experience Through StorytellingUsing Web Archives to Enrich  the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through Storytelling
 
Wizard of Apps Revised
Wizard of Apps RevisedWizard of Apps Revised
Wizard of Apps Revised
 
"Video Killed the Radio Star": From MTV to Snapchat
"Video Killed the Radio Star": From MTV to Snapchat"Video Killed the Radio Star": From MTV to Snapchat
"Video Killed the Radio Star": From MTV to Snapchat
 
Tableau Public Overview
Tableau Public OverviewTableau Public Overview
Tableau Public Overview
 
Detecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web Archives
 
Illuminating Learning Communities Through School Libraries and Makerspaces C...
Illuminating  Learning Communities Through School Libraries and MakerspacesC...Illuminating  Learning Communities Through School Libraries and MakerspacesC...
Illuminating Learning Communities Through School Libraries and Makerspaces C...
 
Future of highered pub
Future of highered pubFuture of highered pub
Future of highered pub
 
Online 1207
Online 1207Online 1207
Online 1207
 
I know how to search the internet,
I know how to search the internet,I know how to search the internet,
I know how to search the internet,
 
Data Science with Humans in the Loop
Data Science with Humans in the LoopData Science with Humans in the Loop
Data Science with Humans in the Loop
 
Bridging the divide - a social media workshop
Bridging the divide - a social media workshopBridging the divide - a social media workshop
Bridging the divide - a social media workshop
 
Finding the Phoenix
Finding the PhoenixFinding the Phoenix
Finding the Phoenix
 

Similaire à Weaponized Web Archives: Provenance Laundering of Short Order Evidence

Semantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer AppsSemantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer Apps
Jie Bao
 

Similaire à Weaponized Web Archives: Provenance Laundering of Short Order Evidence (20)

Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
 
Cloud Creativity
Cloud CreativityCloud Creativity
Cloud Creativity
 
NMC Horizon Report: 2013 Museum Edition Presentation
NMC Horizon Report: 2013 Museum Edition PresentationNMC Horizon Report: 2013 Museum Edition Presentation
NMC Horizon Report: 2013 Museum Edition Presentation
 
QR Codes and Augmented Reality Help Libraries Extend Services
QR Codes and Augmented Reality Help LibrariesExtend Services QR Codes and Augmented Reality Help LibrariesExtend Services
QR Codes and Augmented Reality Help Libraries Extend Services
 
Liveblogging, mobile journalism and verification
Liveblogging, mobile journalism and verificationLiveblogging, mobile journalism and verification
Liveblogging, mobile journalism and verification
 
Cutting Edge Search Technology SAScon May 2012
Cutting Edge Search Technology SAScon May 2012Cutting Edge Search Technology SAScon May 2012
Cutting Edge Search Technology SAScon May 2012
 
Computational Verification Challenges in Social Media
Computational Verification Challenges in Social MediaComputational Verification Challenges in Social Media
Computational Verification Challenges in Social Media
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
 
Enabling Personal Use of Web Archives
Enabling Personal Use of Web ArchivesEnabling Personal Use of Web Archives
Enabling Personal Use of Web Archives
 
Why Mobile Matters - ASAE Technology Conference
Why Mobile Matters - ASAE Technology ConferenceWhy Mobile Matters - ASAE Technology Conference
Why Mobile Matters - ASAE Technology Conference
 
SXSW2017 @NewDutchMedia Talk: Exploration is the New Search
SXSW2017 @NewDutchMedia Talk: Exploration is the New SearchSXSW2017 @NewDutchMedia Talk: Exploration is the New Search
SXSW2017 @NewDutchMedia Talk: Exploration is the New Search
 
Semantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer AppsSemantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer Apps
 
Preparing for the Impact of Web 3.0
Preparing for the Impact of Web 3.0 Preparing for the Impact of Web 3.0
Preparing for the Impact of Web 3.0
 
Bridging the Gap Between Print and Digital Environment
Bridging the Gap Between Print and Digital EnvironmentBridging the Gap Between Print and Digital Environment
Bridging the Gap Between Print and Digital Environment
 
Know Your Wikis From Your Blogs - CIPR West of England
Know Your Wikis From Your Blogs  - CIPR West of EnglandKnow Your Wikis From Your Blogs  - CIPR West of England
Know Your Wikis From Your Blogs - CIPR West of England
 
20 famous quotes that should help you to think about cyber attacks!
20 famous quotes that should help you to think about cyber attacks!20 famous quotes that should help you to think about cyber attacks!
20 famous quotes that should help you to think about cyber attacks!
 
Skynet vs Mad Max: Battle For The Future #sxsw 2012 #sxbattle
Skynet vs Mad Max: Battle For The Future #sxsw 2012 #sxbattleSkynet vs Mad Max: Battle For The Future #sxsw 2012 #sxbattle
Skynet vs Mad Max: Battle For The Future #sxsw 2012 #sxbattle
 
Evaluating Social Media Reach via Mainstream Media Discourse
Evaluating Social Media Reach via Mainstream Media DiscourseEvaluating Social Media Reach via Mainstream Media Discourse
Evaluating Social Media Reach via Mainstream Media Discourse
 
Diversity (in Media)
Diversity (in Media)Diversity (in Media)
Diversity (in Media)
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web Archiving
 

Plus de Michael Nelson

Plus de Michael Nelson (17)

Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
 
Summarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesSummarizing archival collections using storytelling techniques


Weaponized Web Archives: Provenance Laundering of Short Order Evidence

  • 1. ODU CS Colloquium, 2018-04-06, @phonedude_mln Weaponized Web Archives: Provenance Laundering of Short Order Evidence Michael L. Nelson Old Dominion University Web Science & Digital Libraries Research Group @WebSciDL, @phonedude_mln With: ODU: Michele C. Weigle, Mohamed Aturban, John Berlin, Sawood Alam, Plinio Vargas Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein Supported in part by The Andrew Mellon Foundation. Opinions expressed are those of the presenter. based on a 2018-03-23 presentation at the National Forum on Ethics and Archiving the Web
  • 2. TL;DR
      • We are on the cusp of a “Photoshop” moment for synthesizing convincing audio/video
      • Web archives will be weaponized to:
        – alter trustworthy content
        – obfuscate provenance of untrustworthy content
      (image label: web archives; https://imgur.com/gallery/akeVeiq)
  • 3. background: what’s a web archive?
  • 4. ODU CS Colloquium, 2018-04-06, @phonedude_mln
  • 5. http://web.archive.org/web/*/http://www.odu.edu/ also: http://whatdiditlooklike.mementoweb.org/tagged/odu.edu
  • 6. ODU CS Colloquium, 2018-04-06, @phonedude_mln
  • 7. what was here? we’ll likely never know… (ok, xkcd gives us an idea…)
  • 8. Sure, go ahead and archive www.odu.edu -- but what about archiving all your Facebook posts, tweets, instagrams, check-ins, etc.?
  • 9. “Why are they putting all that online?” “And it’s easy to deride this sort of thing as self-absorbed publishing – why would anyone put such drivel out in public? It’s simple. They’re not talking to you. We misread these seemingly inane posts because we’re so unused to seeing written material in public that isn’t intended for us.” Clay Shirky, 2008, p. 85
  • 10. We have semi-private discussions in public spaces all the time… https://www.nytimes.com/2017/09/19/us/politics/isnt-that-the-trump-lawyer-a-reporters-accidental-scoop.html https://well.blogs.nytimes.com/2013/06/21/how-the-hum-of-a-coffee-shop-can-boost-creativity/ Even though we know others can eavesdrop – maybe we even want that – if they whipped out their iPhone and started recording us, it might change our behavior.
  • 11. As the web archiving community, we are constantly asking ourselves: “Are we creating tools that aid the surveillance state?” Spoiler alert: Yes.
  • 12. Our attitude about the surveillance state is contextual. https://www.cbsnews.com/pictures/boston-marathon-bombing-iconic-images/ http://www.boston.com/metrodesk/2013/04/20/boston-police-commissioner-edward-davis-says-releasing-photos-was-turning-point-boston-marathon-bomb-probe/sojcZNcTCGah8UYBnRuk9O/story.html Boston Marathon Bombing, 2013 https://twitter.com/charliespiering/status/976430395964215296
  • 13. Given enough time, it becomes art https://archive.org/details/prelingerhomemovies https://genius.com/Dj-shadow-letter-from-home-lyrics https://www.youtube.com/watch?v=MIR62rreRKY https://www.youtube.com/watch?v=fKjg1HfZfPM#t=2m46s https://www.sinecurebooks.com/shop/enjoy-the-experience-bundle/ personally identifiable information!
  • 14. Meanwhile, we happily pay monthly service fees to be surveilled! https://www.citiusminds.com/blog/home-automation-with-smart-speakers-amazon-echo-vs-google-home-vs-apple-homepod/ https://twitter.com/mtdukes/status/974281625348558848
  • 15. “Quis custodiet ipsos custodes?” A: Social media. https://twitter.com/WIRED/status/958350367468683267 https://twitter.com/vicenews/status/670059493581959168
  • 16. We don’t feel too bad when we archive accounts that later prove to be trolls / sockpuppets / sybils https://twitter.com/safety_refinery/status/934982022078042112 https://ws-dl.blogspot.com/2018/01/2018-01-02-link-to-web-archives-not.html https://twitter.com/documentnow/status/964882665982722048 https://news.docnow.io/blacktivists-in-the-archive-71c807aa247e
  • 17. Nor do we feel bad for holding public figures / organizations accountable https://twitter.com/landlibrarian/status/975910915135754240 https://twitter.com/IEEEhistory/status/960358528987942912 http://archive.is/xh58B
  • 18. But our attitude is different when those organizations explicitly monitor us https://twitter.com/pierce/status/980860438119301120 https://twitter.com/LSJNews/status/979017806116245504
  • 19. We can & should discuss our role in surveillance, but realize Facebook et al. are operating as designed (and before you say “I’m too cool for Facebook”, remember that Facebook owns Instagram) https://twitter.com/zeynep/status/975076957485457408 https://twitter.com/Pinboard/status/975013825010458624 see also: data as toxic waste http://idlewords.com/talks/haunted_by_data.htm
  • 20. As the web archiving community, we should be asking ourselves: “Can we authenticate web content?” Spoiler alert: Yes. A bit.
  • 21. Stewart Brand, The Media Lab: Inventing the Future at MIT, 1987, p. 201
  • 22. “We cannot accept this photograph in evidence” http://www.politifact.com/florida/statements/2018/mar/27/blog-posting/david-hogg-not-school-during-shooting-s-fake-news/ https://twitter.com/acnwala/status/977982456296034304
  • 23. Granted, we’ve had obvious, cut-n-paste / mashup “evidence” for a long time… Victorian Photo Collage https://www.metmuseum.org/exhibitions/listings/2010/victorian-photocollage “The Flying Saucer” (1956) https://en.wikipedia.org/wiki/The_Flying_Saucer_(song) https://www.youtube.com/watch?v=XCrn6QXvHLg Brian Williams Raps ‘Gin & Juice’ https://www.youtube.com/watch?v=XlGLhYFrv6w
  • 24. Crude techniques = humor, sophisticated techniques = deception; Brand’s prediction of “any day now” is now. Synthesizing Obama: Learning Lip Sync from Audio, SIGGRAPH 2017 https://grail.cs.washington.edu/projects/AudioToObama/ Face2Face: Real-time Face Capture and Reenactment of RGB Videos, CVPR 2016 http://niessnerlab.org/projects/thies2016face.html see also: https://www.youtube.com/watch?v=pkkph4JhrCg
  • 25. What does this have to do with the web? Clumsy, “collage/flying saucer/gin & juice” techniques are already effective on social media. We are completely unprepared for advanced, SIGGRAPH/CVPR techniques.
  • 26. Neo-Nazis and “Black Panther” Relationship Status: It’s Complicated http://knowyourmeme.com/photos/1338390-black-panther https://twitter.com/TamikaDMallory/status/964701120194019328
  • 27. nydailynews.com provides screenshots, but not links to the tweets… http://www.nydailynews.com/entertainment/movies/trolls-lying-assaults-black-panther-showings-article-1.3824901
  • 28. @AsianWifeHaver and @DSA_Boi_Pucci are not on the live web…
      $ curl -I https://twitter.com/AsianWifeHaver
      HTTP/1.1 302 Found
      cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
      content-length: 103
      content-type: text/html;charset=utf-8
      date: Sat, 17 Mar 2018 22:09:27 GMT
      expires: Tue, 31 Mar 1981 05:00:00 GMT
      last-modified: Sat, 17 Mar 2018 22:09:27 GMT
      location: https://twitter.com/account/suspended

      $ curl -I https://twitter.com/DSA_Boi_Pucci
      HTTP/1.1 404 Not Found
      cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
      content-length: 6329
      content-security-policy: [deletia]
      content-type: text/html;charset=utf-8
      date: Sat, 17 Mar 2018 22:14:22 GMT
      expires: Tue, 31 Mar 1981 05:00:00 GMT
      last-modified: Sat, 17 Mar 2018 22:14:22 GMT
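
The two responses on this slide differ in a way a script can tell apart: one account redirects to a "suspended" page, the other is simply gone. A minimal sketch of that distinction follows. The function name and the returned labels are made up for illustration, and the `/account/suspended` path is only what the 2018 output above shows; current behavior of the live site may differ.

```python
def classify_live_status(status_code, location=None):
    """Label an account URL from the status code and Location header of a HEAD response.

    Assumption: a redirect whose target ends in /account/suspended means the
    account was suspended, as in the 2018 curl output on the slide.
    """
    if status_code in (301, 302, 303, 307, 308) and location:
        if location.rstrip("/").endswith("/account/suspended"):
            return "suspended"
        return "redirected"
    if status_code == 404:
        return "not found"
    if 200 <= status_code < 300:
        return "live"
    return "unknown"


# The two cases from the slide
print(classify_live_status(302, "https://twitter.com/account/suspended"))
print(classify_live_status(404))
```

Either way, the live web no longer holds the tweets, so any claim about what these accounts said has to rest on some other copy.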
  • 29. …nor are they in the Internet Archive. Note: this exists only because of the redirection to the “suspended” page. http://web.archive.org/web/*/https://twitter.com/AsianWifeHaver http://web.archive.org/web/*/https://twitter.com/DSA_Boi_Pucci
  • 30. ODU CS Colloquium, 2018-04-06, @phonedude_mln
  • 31. Can’t find @DSA_Boi_Pucci in any archive
      Typical archive URI construction: archive.example.org/SomeString/CNN.com/travel
        web.archive.org/web/*/twitter.com/DSA_Boi_Pucci
        wayback.archive-it.org/all/*/twitter.com/DSA_Boi_Pucci
        perma-archives.org/warc/twitter.com/DSA_Boi_Pucci
        archive.is/twitter.com/DSA_Boi_Pucci
        www.webarchive.org.uk/wayback/archive/twitter.com/DSA_Boi_Pucci
        wayback.vefsafn.is/wayback/twitter.com/DSA_Boi_Pucci
        arquivo.pt/wayback/twitter.com/DSA_Boi_Pucci
      For a full list of public web archives, see: http://labs.mementoweb.org/aggregator_config/archivelist.xml
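
The "archive prefix + original URI" pattern on this slide is simple enough to generate mechanically. The sketch below only builds the candidate lookup URIs listed above; it does not fetch them. The prefix list is copied from the slide, while the function name is an assumption for illustration.

```python
# Prefixes copied from the slide; "*" asks a Wayback-style archive for all captures.
ARCHIVE_PREFIXES = [
    "web.archive.org/web/*/",
    "wayback.archive-it.org/all/*/",
    "perma-archives.org/warc/",
    "archive.is/",
    "www.webarchive.org.uk/wayback/archive/",
    "wayback.vefsafn.is/wayback/",
    "arquivo.pt/wayback/",
]


def candidate_archive_uris(original_uri):
    """Return one lookup URI per archive for the given original URI."""
    # Drop the scheme, if any, to match the scheme-less form used on the slide.
    target = original_uri.split("://", 1)[-1]
    return [prefix + target for prefix in ARCHIVE_PREFIXES]


for uri in candidate_archive_uris("https://twitter.com/DSA_Boi_Pucci"):
    print(uri)
```

The same few lines work for any host that claims to be an archive, which is the problem the next slide raises.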
  • 32. What if we checked these archives? What if they all agreed?
        breitbart.com/wayback/*/twitter.com/DSA_Boi_Pucci
        infowars.com/web/*/twitter.com/DSA_Boi_Pucci
        iluv.aynrand.org/*/twitter.com/DSA_Boi_Pucci
        InternetResearchAgency.ru/twitter.com/DSA_Boi_Pucci
      Would you trust the results?
  • 33. Our entire national digital preservation strategy is predicated on Brewster Kahle (IA) “not being evil”™. If he is leading a 20+ year sleeper cell, we’re doomed.
  • 34. Malaysia Airlines Flight 17 (MH17) http://web.archive.org/web/20140717152222/http://vk.com/strelkov_info http://www.csmonitor.com/World/Europe/2014/0717/Web-evidence-points-to-pro-Russia-rebels-in-downing-of-MH17-video http://www.newyorker.com/magazine/2015/01/26/cobweb
  • 35. ODU CS Colloquium, 2018-04-06, @phonedude_mln
  • 36. (not really archived as well as we’d like)
  • 37. Ed and I Discuss Who Has What… https://twitter.com/phonedude_mln/status/490171976389238784
  • 38. Remember MH17? https://twitter.com/phonedude_mln/status/490171976389238784
  • 39. Alex is now 404. Would multiple archives have convinced him? https://twitter.com/quicknquiet
  • 40. Do we really have “a perfect tool to produce `evidence’ of any kind”?
  • 41. Segal’s Law, restated for web archives: The person with an archive knows what the page looked like. The person with two archives is never sure.
  • 42. (apologies to Notorious B.I.G.) “Mo Archives, Mo Problems” Why? Because they’ll rarely agree. Even a single archive is an unreliable witness: zombies, temporal violations, and attacks.
  • 43. Zombies: live web “leaking” into an archived page http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html This page is from 2008; this ad is from 2012 (when this screenshot was taken).
  • 44. Temporal violations: reconstructing legitimately archived resources into a page that never existed http://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html The text (2004-12) says rain; the image (2005-09) is clear.
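
One rough way to spot a composite like this is to compare the capture time of the root page with the capture times of the resources embedded in it. The sketch below assumes Wayback-style URI-Ms that carry a 14-digit timestamp, optionally followed by a two-letter modifier such as `im_` or `id_`. It is a simplification, not the method evaluated in the linked post, and the example.org URI-Ms are hypothetical.

```python
import re
from datetime import datetime, timedelta

# A 14-digit capture timestamp, optionally followed by a modifier like "im_" or "id_".
_TIMESTAMP = re.compile(r"/(\d{14})(?:[a-z]{2}_)?/")


def memento_datetime(uri_m):
    """Parse the capture time embedded in a Wayback-style URI-M."""
    match = _TIMESTAMP.search(uri_m)
    if match is None:
        raise ValueError(f"no 14-digit timestamp in {uri_m!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def temporal_spread(root_uri_m, embedded_uri_ms):
    """Largest gap between the root capture and any embedded capture."""
    root = memento_datetime(root_uri_m)
    gaps = [abs(memento_datetime(uri) - root) for uri in embedded_uri_ms]
    return max(gaps, default=timedelta(0))


# Hypothetical URI-Ms: text captured in 2004-12, image captured in 2005-09
root = "http://web.archive.org/web/20041209190926/http://example.org/weather"
image = "http://web.archive.org/web/20050914000000im_/http://example.org/sky.jpg"
print(temporal_spread(root, [image]))
```

A large spread does not prove the replayed page never existed, since an unchanged image may simply have been captured later, but it marks the composition as something to check before treating it as evidence.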
  • 45. Directly attacking the archive (in this case, via orphaned live web resources; “zombie attack”) Lerner, Kohno, Roesner, 2017 https://doi.org/10.1145/3133956.3134042 see also: Cushman & Kreymer http://labs.rhizome.org/presentations/security.html The page is from 2011; the iframe content is from 2017 (when the screenshot was taken).
  • 46. Based on feedback from Lerner et al., IA has changed their playback (specifically, with a Content-Security-Policy HTTP response header). But playback remains problematic… (apologies to Peter Arnett) “In order to save the page, we had to completely change it.” Let’s look at four common scenarios.
  • 47. 1) JavaScript does not run correctly from the archive http://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html This is cnn.com not replaying; it hasn’t replayed correctly since November 1, 2016.
  • 48. 2) Archived page renders differently each time Mohamed Aturban, unpublished, memento: http://web.archive.org/web/20130724144801/http://www.cnn.com/
  • 49. 3) Archive modifies pages that should stay the same – goodbye conventional fixity checks! Mohamed Aturban, unpublished, embedding memento: http://perma-archives.org/warc/20170101182813/http://umich.edu/ http://perma-archives.org/warc/20170101182814id_/http://umich.edu/includes/image/type/gallery/id/113/name/ResearchDIL-19Aug14_DM%28136%29.jpg/width/152/height/152/mode/minfit/
  • 50. 4) Archived page doesn’t match live web experience https://web.archive.org/web/20180302184025/https:/twitter.com/Emma4Change http://ws-dl.blogspot.com/2018/03/2018-03-21-cookies-are-why-your.html “only a ‘crisis actor’ would tweet in Slovak!” Now imagine she gets fed up, deletes her account, and then someone applies the “abandoned acct / archive” attack Justin Littman described: https://gwu-libraries.github.io/sfm-ui/posts/2017-11-06-vulnerabilities
  • 51. How can we differentiate between “normal” archive modification for playback vs. deception? These might have been swapped -- but how can you tell for sure? If the tweets or accts are deleted, we don’t know. If I embed fake tweets in another page, it’s even more confusing. And it is not in Twitter’s (perceived) self-interest to help, cf.: https://techcrunch.com/2018/01/03/why-twitter-wont-remove-trumps-nuclear-war-tweet/
  • 52. You cannot trust the URL in your browser! Here’s an actual page in the IA “proving” Brian Williams released “Gin and Juice” in 1992, a full year before Snoop Dogg. John Berlin, MS Thesis, 2018 https://www.youtube.com/watch?v=k3QTcJZdFfs (actual URI-R & URI-M have also been faked in video) The content is clearly fake, but imagine replacing: 1) “1992” with a more believable “2016”, 2) the fake domain with “bbc.com”, and 3) Brian Williams rapping with a synthesized Trump or Obama speech.
  • 53. Blockchain to the rescue!!! <lasers> <sirens> <disco-thumping-soundtrack> nope. https://www.multichain.com/blog/2015/11/avoiding-pointless-blockchain-project/ https://eprint.iacr.org/2017/375.pdf
  • 54. Instead, let’s use web archives to monitor web archives.
  • 55. Step 1: Push to multiple archives
        web.archive.org/web/20180321/eaw.rhizome.org
        wayback.archive-it.org/all/20180321/eaw.rhizome.org
        arquivo.pt/wayback/20180321/eaw.rhizome.org
        archive.is/20180321/eaw.rhizome.org
      (original resource: eaw.rhizome.org)
  • 56. Step 2: Compute fixity, publish fixity “manifest” at a well-known location
        manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
        manifest.org/20180322/wayback.archive-it.org/all/20180321/eaw.rhizome.org
        manifest.org/20180322/arquivo.pt/wayback/20180321/eaw.rhizome.org
        manifest.org/20180322/archive.is/20180321/eaw.rhizome.org
      It’s understood that archived HTML is continuously rewritten, so only compute fixity on things that should not change, like JPEGs and certain original HTTP response headers.
      This example assumes the existence of a well-known server manifest.org. Actual URIs can be a bit more complex using “Trusty URIs”: http://ws-dl.blogspot.com/2017/01/2017-01-15-summary-of-trusty-uris.html
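
A minimal sketch of what computing and later rechecking such a manifest could look like, using SHA-256 over the parts the slide says should not change. Everything here is an assumption for illustration: the manifest field names, the choice of "stable" headers, and the function names are invented, and manifest.org is the hypothetical server from the slide, not a real service.

```python
import hashlib
import json

# Assumption: which original response headers count as stable is a policy choice.
STABLE_HEADERS = ("content-type", "last-modified")


def build_manifest(uri_m, body, headers, stable_headers=STABLE_HEADERS):
    """Hash the raw body (e.g. a JPEG) and a chosen subset of headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    kept = {name: lowered[name] for name in stable_headers if name in lowered}
    header_bytes = json.dumps(kept, sort_keys=True).encode("utf-8")
    return {
        "uri-m": uri_m,
        "sha256-body": hashlib.sha256(body).hexdigest(),
        "sha256-headers": hashlib.sha256(header_bytes).hexdigest(),
    }


def manifest_uri(uri_m, observed, manifest_host="manifest.org"):
    """Well-known location for the manifest, following the pattern on the slide."""
    return f"{manifest_host}/{observed}/{uri_m}"


def still_matches(manifest, body, headers):
    """Recompute fixity later and compare it with the published manifest."""
    return build_manifest(manifest["uri-m"], body, headers) == manifest


uri_m = "web.archive.org/web/20180321/eaw.rhizome.org"
manifest = build_manifest(uri_m, b"abc", {"Content-Type": "image/jpeg"})
print(manifest_uri(uri_m, "20180322"))
print(still_matches(manifest, b"abc", {"Content-Type": "image/jpeg"}))
print(still_matches(manifest, b"abd", {"Content-Type": "image/jpeg"}))
```

Hashing rewritten HTML would fail on every replay, which is why the sketch takes raw bytes and a header subset rather than the rendered page.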
  • 57. Wondering about veracity of an archived page? Check manifest.org and recompute fixity.
        manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
        web.archive.org/web/20180321/eaw.rhizome.org
      What if manifest.org is down? Or possibly hacked?
      We can’t know archive.org did not alter contents on ingest (20180321), but we can verify that it has not changed since our observation (20180322).
  • 58. Step 4: Push manifest to multiple archives
        web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
        wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
        arquivo.pt/wayback/20180323/eaw.rhizome.org/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
        archive.is/20180323/eaw.rhizome.org/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
        manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      Now the 20180322 version of the manifest of archive.org’s memento of rhizome.org is in four different archives.
      The URIs are ugly, but the bottom line is an attacker would have to hack a majority of 5 domains (manifest.org + 4 archives).
      Can repeat for manifests of mementos of rhizome.org in archive-it.org, arquivo.pt, archive.is, etc.
  • 59. Wondering about veracity of an archived page? Check all copies of manifest.org and take a majority vote
        manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
        web.archive.org/web/20180321/eaw.rhizome.org
      Caveat 1: If I can hack the rhizome.org page at archive.org, I can probably hack the fixity info there too, so we really have 4 copies not 5.
        web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
        wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
        arquivo.pt/wayback/20180323/eaw.rhizome.org/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
        archive.is/20180323/eaw.rhizome.org/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      Caveat 2: archive.org and archive-it.org are not independent, so we really have 3 copies not 5.
      (yes, this is very similar to textual criticism)
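
The vote and both caveats can be sketched in a few lines: discard every manifest copy held by the same operator as the archive under suspicion, then require a strict majority among the rest. The function name, the operator labels, and the digest values below are placeholders for illustration; how the copies are fetched is out of scope.

```python
from collections import Counter


def majority_digest(copies, operators, suspect_host):
    """Return the digest a strict majority of independent copies agree on, else None.

    copies: host -> digest read from that host's copy of the manifest (None if missing)
    operators: host -> operator label, for hosts that are not independent
    suspect_host: the archive whose memento is being checked
    """
    suspect_operator = operators.get(suspect_host, suspect_host)
    usable = [
        digest
        for host, digest in copies.items()
        if digest is not None and operators.get(host, host) != suspect_operator
    ]
    if not usable:
        return None
    digest, count = Counter(usable).most_common(1)[0]
    return digest if count > len(usable) / 2 else None


# Placeholder digests: the two Internet Archive hosts hold a tampered value.
copies = {
    "manifest.org": "aaa",
    "web.archive.org": "bbb",
    "wayback.archive-it.org": "bbb",
    "arquivo.pt": "aaa",
    "archive.is": "aaa",
}
operators = {"web.archive.org": "IA", "wayback.archive-it.org": "IA"}
print(majority_digest(copies, operators, "web.archive.org"))
```

With both Internet Archive hosts set aside, three copies remain, matching the count in Caveat 2, and an attacker would need to change two of them to win the vote.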
  • 60. No fixity information? Maybe it’s ok, maybe it’s not.
        infowars.com/web/20180321/eaw.rhizome.org
        404 404 404 404 404
      Or perhaps fixity was computed and stored at freedomfries.org; you have to decide if you trust that site.
      see also: https://www.youtube.com/watch?v=EY15lj-7_lc http://ws-dl.blogspot.com/2017/12/2017-12-11-difficulties-in-timestamping.html
  • 61. Conclusions
      • Bad news:
        – The web will be the primary vector for increasingly sophisticated disinformation
        – Web archives can be used to forge or obscure the provenance of this information
        – Brian Williams predates Snoop Dogg
      • Good news:
        – Web archives have a role in authenticating who said what, and when
        – Contact Dr. Weigle and me if you are interested in privacy, authenticity, social media, web archiving, etc.