Sn@tch CNI Fall 2014

Sn@tch: An Archiving and Analysis Service for Global News
Todd Grappone @liber8er
Sharon Farb @farbthink
Martin Klein @mart1nkle1n
Peter Broadwell @peterbroadwell

Digital ephemera collections
• Collected by researchers
• Donated by activists
• Include images, audio, video, scanned documents, social media, server logs

International Collecting
• 829 digitally recorded Iranian dissident news programs
• 9,166 other videos from the Iranian Green Movement
• 29,441 digital photographs from the Green Movement
• 543 documents from Tahrir Square

News and Perspectives
The UCLA NewsScape:
• >228,000 hours of TV news
• Recorded 2005-present
• 13 countries, 9 languages
• 38 networks
• Searchable by captions, on-screen text, named entities
• How to incorporate social media into this variety of perspectives?

A Brief History of Timeliness
• Twitter archive at the Library of Congress [1]
• Last public update from January 4th 2013
• ~170 billion tweets, > 130 TB compressed (late 2012)
• Single search against 2006-2010 data may take up to 24 hours
• Twitter data access at Massachusetts Institute of Technology, Laboratory for Social Machines [2]
• Public announcement from October 1st 2014
[1] http://blogs.loc.gov/loc/2013/01/update-on-the-twitter-archive-at-the-library-of-congress/
[2] https://blog.twitter.com/2014/investing-in-mit-s-new-laboratory-for-social-machines

In case you missed it:
• Twitter makes full archive of tweets available, indexed
• Great, problem solved?
• How about deleted tweets?
• Real-time capture of embedded resources?
https://blog.twitter.com/2014/building-a-complete-tweet-index

• Many initiatives to capture Twitter data
• Live, after an event, both
• Mostly ad-hoc efforts, rarely institutionalized
• Operation often requires programming or sys admin skills
• Deen Freelon’s (American University) incomplete list of tools:
https://docs.google.com/document/d/1UaERzROI986HqcwrBDLaqGG8X_lYwctj6ek6ryqDOiQ/

Social Feed Manager (Dan Chudnov, GWU); as presented at #cni13f
http://social-feed-manager.readthedocs.org/

twarc (Ed Summers, MITH); used for Ferguson data
http://inkdroid.org/journal/2014/08/30/a-ferguson-twitter-archive/
http://files.archivists.org/conference/nola2013/twitter/twarc-saa13.htm

We Can Remember It for You Wholesale
I. Real-time capture of tweets plus pro-active archiving of embedded resources
II. Rapid analysis, real-time opportunities
III. Collection-agnostic linking

Remembrance of Tweets/Links Past
• Utilize GWU’s Social Feed Manager
• Filter by keywords, user handles, location, time, etc.
• Store raw tweets
• Extract and archive embedded URIs
• Utilize pro-active archiving solutions: Internet Archive, archive.today
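
The capture steps above can be sketched roughly as follows. This is an illustration only, not Social Feed Manager's own code. It assumes raw tweets are stored as Twitter REST API v1.1 JSON, where embedded links sit under `entities.urls` with an `expanded_url` field, and it assumes the Internet Archive's Save Page Now endpoint at `https://web.archive.org/save/`. The function names and the sample tweet are hypothetical.

```python
import json

# Assumption: the Internet Archive's "Save Page Now" endpoint.
SAVE_PAGE_NOW = "https://web.archive.org/save/"


def extract_uris(raw_tweet):
    """Return the embedded URIs of one raw tweet (a v1.1 JSON string)."""
    tweet = json.loads(raw_tweet)
    urls = tweet.get("entities", {}).get("urls", [])
    # Prefer the expanded form; fall back to the shortened link.
    return [u.get("expanded_url") or u["url"] for u in urls]


def archive_requests(raw_tweets):
    """Build one Save Page Now request URL per unique embedded URI."""
    seen = []
    for raw in raw_tweets:
        for uri in extract_uris(raw):
            if uri not in seen:
                seen.append(uri)
    return [SAVE_PAGE_NOW + uri for uri in seen]


# Hypothetical sample tweet, trimmed to the fields used here.
sample = json.dumps({
    "id_str": "1",
    "text": "Photos from the square http://t.co/abc",
    "entities": {"urls": [{"url": "http://t.co/abc",
                           "expanded_url": "http://yfrog.com/h02gvclj"}]},
})

print(archive_requests([sample, sample]))
```

Storing the raw tweet first and deriving the archive requests afterwards keeps the two concerns separate: the tweet survives even if the archiving request fails and has to be retried.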

• UCLA’s dataset about the Egyptian revolution
• More than 400k tweets
• Approx. 50k unique users
• Tweets originated from within 200 miles of Cairo

• 25% of tweets contain references to external resources (web pages, images, videos, etc.)

http://bit.ly/dTjCUd
HTTP 200 OK

• 20% of references are dead, after less than 4 years (!!!)

http://yfrog.com/h02gvclj
HTTP GET: 200 OK
HTTP HEAD: 204 No Content
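
The example above shows the same URI answering differently to GET and HEAD, so a link checker that relies on HEAD alone could misjudge it. A minimal sketch of a checker that tries both methods follows. It is not the tooling used for the study, and treating "200 from either method" as alive is an assumption made for illustration. Whether a 200 response still carries the original content is a separate question this sketch does not address.

```python
import urllib.error
import urllib.request


def status_of(uri, method, timeout=10):
    """Return the HTTP status code for one request, or None on network failure."""
    request = urllib.request.Request(uri, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, timeout, refused connection


def is_alive(head_status, get_status):
    """Assumption: a URI counts as alive if either method yields 200 OK."""
    return 200 in (head_status, get_status)


def check(uri):
    """Try the cheap HEAD first and fall back to GET when HEAD is not 200."""
    head = status_of(uri, "HEAD")
    get = None if head == 200 else status_of(uri, "GET")
    return is_alive(head, get)
```

With the status codes from the slide, `is_alive(204, 200)` is true, whereas a HEAD-only check would have counted the URI as missing.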

• 20% of references are dead AND
• 60% of these are not archived
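
Read together, the two figures give the share of all references that are both dead and unarchived. The short calculation below assumes that "of these" refers to the dead references, as the wording suggests.

```python
from fractions import Fraction

dead = Fraction(20, 100)                   # share of references that are dead
unarchived_given_dead = Fraction(60, 100)  # share of dead references with no archived copy

# Share of all references that are dead and have no archived copy.
lost = dead * unarchived_given_dead

print(f"{float(lost):.0%} of all references are dead and unarchived")
```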

http://wayback.archive-it.org/all/20110203083908/http://yfrog.com/h02gvclj
This one is! Discovered via #memento
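
The archived copy above follows the Wayback-style URI pattern, with a 14-digit capture timestamp followed by the original URI. The sketch below takes such a URI apart and formats the capture time as an `Accept-Datetime` header value, which is what a Memento client (RFC 7089) sends to a TimeGate to ask for a copy near a given moment. The function names are hypothetical, and interpreting the timestamp as UTC is an assumption.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

# Wayback-style memento URI: <archive prefix>/<14-digit timestamp>/<original URI>
MEMENTO_URI = re.compile(r"^(?P<prefix>.+?)/(?P<stamp>\d{14})/(?P<original>.+)$")


def parse_memento(uri):
    """Split a Wayback-style memento URI into (original URI, capture time)."""
    match = MEMENTO_URI.match(uri)
    if match is None:
        raise ValueError("not a Wayback-style memento URI: " + uri)
    captured = datetime.strptime(match["stamp"], "%Y%m%d%H%M%S")
    # Assumption: the timestamp is expressed in UTC.
    return match["original"], captured.replace(tzinfo=timezone.utc)


def accept_datetime(moment):
    """Format a UTC datetime as an Accept-Datetime header value."""
    return format_datetime(moment, usegmt=True)


original, captured = parse_memento(
    "http://wayback.archive-it.org/all/20110203083908/http://yfrog.com/h02gvclj"
)
print(original)
print(accept_datetime(captured))
```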

URIs from Ed Summers’ Ferguson dataset
https://edsu.github.io/ferguson-urls/
pink == not archived (Internet Archive)
28%

http://babylon.library.ucla.edu/mklein/archived.html

Part 2: Rapid, Adaptive Analysis
https://srogers.cartodb.com/viz/64f6c0f4-745d-11e4-b4e1-0e4fddd5de28/public_map

Part 3: Collection-Agnostic Linking
On TV news: Egypt, Tahrir, Cairo
On Twitter: #jan25, #tahrir, #egypt
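
The two vocabularies above overlap only partly, which a naive matcher makes visible. The sketch below is not the authors' linking method. It simply normalizes hashtags and caption keywords and reports which terms line up, as an illustration of why exact matching alone is not enough. The function names are hypothetical.

```python
def normalize(term):
    """Lower-case a term and drop a leading '#' so hashtags and keywords compare."""
    return term.lstrip("#").lower()


def link_terms(tv_terms, hashtags):
    """Pair TV-news keywords with hashtags that normalize to the same string."""
    tv = {normalize(t): t for t in tv_terms}
    tags = {normalize(h): h for h in hashtags}
    shared = sorted(tv.keys() & tags.keys())
    return {
        "linked": [(tv[k], tags[k]) for k in shared],
        "tv_only": sorted(t for k, t in tv.items() if k not in tags),
        "twitter_only": sorted(h for k, h in tags.items() if k not in tv),
    }


result = link_terms(["Egypt", "Tahrir", "Cairo"], ["#jan25", "#tahrir", "#egypt"])
print(result)
```

Here "Egypt" and "Tahrir" pair up with their hashtags, while "Cairo" and "#jan25" are left unmatched even though they refer to the same events, so linking across collections needs more than string equality.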

Raiders of the Lost Links
Challenges and opportunities:
• Legal frameworks for sharing and preserving tweets and linked resources
• Collaborations and partnerships to ensure momentum, sustainability
• Expansion to other forms of (social) media

Lazy Digital Archivists: Your Time is Up
Todd Grappone grappone@library.ucla.edu
Sharon Farb farb@library.ucla.edu
Martin Klein martinklein@library.ucla.edu
Peter Broadwell broadwell@library.ucla.edu
