Sn@tch:
An Archiving and Analysis
Service for Global News
Todd Grappone @liber8er
Sharon Farb @farbthink
Martin Klein @mart1nkle1n
Peter Broadwell @peterbroadwell
Digital ephemera
collections
• Collected by researchers
• Donated by activists
• Include images, audio,
video, scanned
documents, social media,
server logs
International Collecting
• 829 digitally recorded Iranian dissident news programs
• 9,166 other videos from the Iranian Green Movement
• 29,441 digital photographs from the Green Movement
• 543 documents from Tahrir Square
News and Perspectives
The UCLA NewsScape:
• >228,000 hours of TV news
• Recorded 2005-present
• 13 countries, 9 languages
• 38 networks
• Searchable by captions, on-screen text, named entities
• How to incorporate social media
into this variety of perspectives?
Social Local Global
A Brief History of Timeliness
• Twitter archive at the Library of Congress [1]
• Last public update from January 4th 2013
• ~170 billion tweets, > 130 TB compressed (late 2012)
• Single search against 2006-2010 data may take up to 24 hours
• Twitter data access at Massachusetts Institute of Technology,
Laboratory for Social Machines [2]
• Public announcement from October 1st 2014
[1] http://blogs.loc.gov/loc/2013/01/update-on-the-twitter-archive-at-the-library-of-congress/
[2] https://blog.twitter.com/2014/investing-in-mit-s-new-laboratory-for-social-machines
A Brief History of Timeliness
In case you missed it:
• Twitter makes full archive
of tweets available,
indexed
• Great, problem solved?
• How about deleted
tweets?
• Real-time capture of
embedded resources?
https://blog.twitter.com/2014/building-a-complete-tweet-index
A Brief History of Timeliness
• Many initiatives to capture Twitter data
• Live, after an event, or both
• Mostly ad-hoc efforts, rarely institutionalized
• Operation often requires programming or sys admin skills
• Deen Freelon’s (American University) incomplete list of tools:
https://docs.google.com/document/d/1UaERzROI986HqcwrBDLaqGG8X_lYwctj6ek6ryqDOiQ/
A Brief History of Timeliness
Social Feed Manager (Dan Chudnov, GWU); as presented at
#cni13f
http://social-feed-manager.readthedocs.org/
A Brief History of Timeliness
twarc (Ed Summers, MITH); used for Ferguson
data
http://inkdroid.org/journal/2014/08/30/a-ferguson-twitter-archive/
http://files.archivists.org/conference/nola2013/twitter/twarc-saa13.htm
We Can
Remember It for
You Wholesale
I. Real-time capture of
tweets plus pro-active
archiving of embedded
resources
II. Rapid analysis, real-time opportunities
III. Collection-agnostic
linking
Remembrance of Tweets/Links Past
• Utilize GWU’s Social Feed Manager
• Filter by keywords, user handles, location, time, etc.
• Store raw tweets
• Extract and archive embedded URIs
• Utilize pro-active archiving solutions: Internet Archive,
archive.today
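The "extract and archive embedded URIs" step could look roughly like the sketch below. It assumes raw tweets are stored as JSON objects in the Twitter REST API v1.1 shape, where `entities.urls[].expanded_url` carries the unshortened link. The helper names `extract_uris` and `save_request_url` are hypothetical and not part of Social Feed Manager. The second helper only builds an Internet Archive "Save Page Now" request URL; it does not send the request.

```python
import re

# Fallback pattern for tweets stored without structured entities.
URL_PATTERN = re.compile(r"https?://[^\s]+")


def extract_uris(tweet):
    """Return the de-duplicated URIs embedded in one raw tweet (a dict)."""
    uris = []
    # Preferred source: structured entities, which carry the unshortened link.
    for entity in tweet.get("entities", {}).get("urls", []):
        uri = entity.get("expanded_url") or entity.get("url")
        if uri and uri not in uris:
            uris.append(uri)
    # Fallback: scan the tweet text and trim trailing punctuation.
    if not uris:
        for uri in URL_PATTERN.findall(tweet.get("text", "")):
            cleaned = uri.rstrip(".,;:!?)")
            if cleaned not in uris:
                uris.append(cleaned)
    return uris


def save_request_url(uri):
    """Build (but do not send) a Save Page Now request URL for a URI."""
    return "https://web.archive.org/save/" + uri
```

Requesting the returned URL at capture time is what would make the archiving pro-active; rate limits and error handling are left out of this sketch.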
Remembrance of Tweets/Links Past
• UCLA’s dataset about the Egyptian revolution
• More than 400k tweets
• Approx. 50k unique users
• Tweets originated from within 200 miles around Cairo
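A location filter such as "within 200 miles around Cairo" can be approximated with a great-circle distance check. This is a minimal sketch, not the method used to build the dataset: the Cairo coordinates and Earth radius are approximate values chosen for illustration, and `within_radius` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_MILES = 3958.8       # mean Earth radius (approximation)
CAIRO = (30.0444, 31.2357)        # latitude, longitude in degrees (assumed center)


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(lat, lon, center=CAIRO, radius_miles=200):
    """True if the point lies within radius_miles of the center."""
    return distance_miles(lat, lon, center[0], center[1]) <= radius_miles
```

With these values, a point in Alexandria falls inside the radius and a point in Luxor falls outside it. Only geotagged tweets can be filtered this way.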
Remembrance of Tweets/Links Past
• UCLA’s dataset about the Egyptian revolution
• 25% of tweets contain references to external resources
(web pages, images, videos, etc.)
Remembrance of Tweets/Links Past
http://bit.ly/dTjCUd
HTTP 200 OK
Remembrance of Tweets/Links Past
• UCLA’s dataset about the Egyptian revolution
• 20% of references are dead, after less than 4 years (!!!)
Remembrance of Tweets/Links Past
http://yfrog.com/h02gvclj
HTTP GET: 200 OK
HTTP HEAD: 204 No Content
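Because the same URI can answer GET and HEAD differently, a link checker probably should not trust a single status code. The sketch below issues both requests with the standard library and flags disagreement for manual or content-based inspection. The slides do not say how "dead" was decided, so the labels returned by `classify` are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def http_status(url, method, timeout=10):
    """Return the final status code for one request, or None on network failure."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code          # 4xx/5xx responses still carry a status code
    except (urllib.error.URLError, OSError):
        return None                # DNS failure, refused connection, timeout


def classify(head_status, get_status):
    """Label a URI from its HEAD and GET status codes (labels are illustrative)."""
    if head_status is None and get_status is None:
        return "unreachable"
    if head_status != get_status:
        return "inconsistent"      # e.g. HEAD 204 vs GET 200: inspect the content
    return "responding" if 200 <= get_status < 300 else "failing"


def check(url):
    """Run both requests against a live URL and classify the result."""
    return classify(http_status(url, "HEAD"), http_status(url, "GET"))
```

For the status pair shown on this slide, `classify(204, 200)` returns "inconsistent".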
Remembrance of Tweets/Links Past
• UCLA’s dataset about the Egyptian revolution
• 20% of references are dead AND
• 60% of these are not archived
http://wayback.archive-it.org/all/20110203083908/http://yfrog.com/h02gvclj
This one
is!
discovered via
#memento
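The archived copy's URI above follows the common Wayback-style layout of a 14-digit capture timestamp followed by the original URI. A small sketch for pulling both parts out is shown below; `parse_snapshot_uri` is a hypothetical helper, and it assumes the timestamp is in UTC, as is conventional for Wayback-style archives.

```python
import re
from datetime import datetime, timezone

# 14-digit timestamp (YYYYMMDDhhmmss) followed by the original URI.
SNAPSHOT_PATTERN = re.compile(r"/(\d{14})/(https?://.+)$")


def parse_snapshot_uri(snapshot_uri):
    """Return (capture datetime, original URI), or None if the layout differs."""
    match = SNAPSHOT_PATTERN.search(snapshot_uri)
    if match is None:
        return None
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured.replace(tzinfo=timezone.utc), match.group(2)
```

Applied to the URI on this slide, it yields a capture time of 3 February 2011, 08:39:08, for http://yfrog.com/h02gvclj.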
Remembrance of Tweets/Links Past
URIs from Ed Summers’s Ferguson
dataset
https://edsu.github.io/ferguson-urls/
pink == not archived
(Internet Archive)
28%
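A "not archived (Internet Archive)" share like the one above could be estimated by querying the Wayback Machine availability endpoint for each URI. The sketch below only builds the query and parses a response; it assumes the JSON shape that endpoint documents (`archived_snapshots.closest`), makes no network calls itself, and uses hypothetical helper names. It is not a description of how the figure on this slide was produced.

```python
import json
from urllib.parse import urlencode


def availability_query(uri, timestamp=None):
    """Build a Wayback Machine availability query for a URI."""
    params = {"url": uri}
    if timestamp:
        params["timestamp"] = timestamp   # YYYYMMDD[hhmmss], optional
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the closest snapshot URL from a response body, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def share_not_archived(snapshots_by_uri):
    """Fraction of URIs whose lookup returned no snapshot (None)."""
    if not snapshots_by_uri:
        return 0.0
    missing = sum(1 for snapshot in snapshots_by_uri.values() if snapshot is None)
    return missing / len(snapshots_by_uri)
```

A Memento aggregator would cover more archives than this single-archive lookup, at the cost of slower queries.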
Remembrance of Tweets/Links Past
http://babylon.library.ucla.edu/mklein/archived.html
Part 2: Rapid, Adaptive
Analysis
https://srogers.cartodb.com/viz/64f6c0f4-745d-11e4-b4e1-0e4fddd5de28/public_map
Part 2: Rapid, Adaptive
Analysis
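Rapid analysis of this kind could start from simple counts of hashtags and user handles, which also feed term clouds and histograms. The following is a minimal sketch over plain tweet texts; the regular expressions are deliberately simple assumptions, so the handle pattern would also match the tail of an e-mail address.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")
HANDLE = re.compile(r"@\w+")


def count_tokens(texts, pattern):
    """Count case-insensitive matches of pattern across an iterable of texts."""
    counts = Counter()
    for text in texts:
        counts.update(token.lower() for token in pattern.findall(text))
    return counts
```

Calling `count_tokens(texts, HASHTAG).most_common(20)` on a growing capture would surface newly emerging hashtags that could be added to the filter.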
Part 3: Collection-Agnostic Linking
Part 3: Collection-Agnostic Linking
On TV news: Egypt, Tahrir, Cairo
On Twitter: #jan25, #tahrir, #egypt
Part 3: Collection-Agnostic Linking
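One simple form of cross-collection linking is to align the two collections on time, for example daily tweet volume against daily TV news mentions of the same event. The sketch below assumes each collection has already been reduced to a list of `datetime` values (tweets matching #jan25 or #tahrir, broadcasts whose captions mention Tahrir or Cairo). Both function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def daily_counts(timestamps):
    """Count items per calendar day."""
    return Counter(ts.date() for ts in timestamps)


def align(tweet_times, broadcast_times):
    """Return (ISO day, tweet count, broadcast count) rows, sorted by day."""
    tweets = daily_counts(tweet_times)
    broadcasts = daily_counts(broadcast_times)
    days = sorted(set(tweets) | set(broadcasts))
    # Counter returns 0 for days that are missing from one collection.
    return [(day.isoformat(), tweets[day], broadcasts[day]) for day in days]
```

The resulting rows can be plotted as two series on a shared time axis. Timestamps should be normalized to one time zone before counting.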
Raiders of the Lost Links
Challenges and opportunities:
• Legal frameworks for sharing and preserving tweets and linked
resources
• Collaborations and partnerships to ensure momentum, sustainability
• Expansion to other forms of (social) media
Lazy Digital Archivists: Your Time is Up
Todd Grappone grappone@library.ucla.edu
Sharon Farb farb@library.ucla.edu
Martin Klein martinklein@library.ucla.edu
Peter Broadwell broadwell@library.ucla.edu

Sn@tch CNI Fall 2014


Editor's Notes

  1. Todd: Intro, motivation. In recent years, the Library has become the steward of digital ephemera materials in an ever-growing variety of formats, especially when social media is included. Many materials at UCLA are from the “Arab Spring” and related movements (2009-?). This represents both a challenge and an opportunity.
  2. Todd: Scholars, students, and the public request that the library host, preserve, and make available these materials AND ALSO provide a suite of services for live capture, analysis, tagging, summarization, and linking of materials. So far, we’ve focused on Twitter as the main form of social media of interest to researchers, with the understanding that Twitter collections are most useful when analyzed in bulk and linked to other materials.
  3. Todd: Providing a Rashomon-like, multiperspective history service. A vital opportunity, and responsibility, of collecting digital news ephemera, especially about recent events, is to collect and present multiple perspectives on the events: “official” state and corporate TV media in various countries; newspapers (interesting too, but diminishing in influence); independent media, alternative news sources, and online sources (incl. blogs), which are increasingly influential; social media from different sources, which are also vital and may provide sharply contrasting viewpoints if you can filter the signal from the noise; and personal media (incl. those linked from social media), another important piece of the puzzle.
  4. Martin: State of the art. Twitter "backups" at LoC: collects *everything*, not suitable for us. Dec 2012: approx. 170 billion tweets, >130 TB compressed; grows by half a billion tweets per day. A single search against 2006-2010 data may take up to 24 hrs. Several hundred researcher requests for access, none granted. State of uncertainty. MIT has access to the full stream plus archive; access uncertain, collaboration in infant stages.
  5. Martin: State of the art. Twitter’s announcement that an indexed archive of all tweets is available: a game changer, but it does not solve the problem.
  6. Martin: State of the art - realization of multiple ad-hoc initiatives to capture tweets, live, after the event, both - timing issue, may be too late to capture stuff *check twitter api, how far back can we go?* - all building silos, no connection, no collaboration
  7. Martin
  8. Martin
  9. Martin: Decision at UCLA - abstract to higher level and institutionalize as service - 3 pillars: 1) real-time capture including preservation of embedded resources at capture time (!!!) 2) (real-time) rapid analysis 3) collection-agnostic linking “Get your ass to Mars!”
  10. Martin: Implementation level #1: - SFM - filtering by hashtag, user handle, keyword search, location, time, etc - extraction of URIs, pro-active archiving of resources, remote for now
  11. Martin: Implementation level #1: Concrete example of Egypt dataset
  12. Martin: Implementation level #1: Concrete example of Egypt dataset
  13. Martin: Implementation level #1: Concrete example of Egypt dataset
  14. Martin: Implementation level #1: Concrete example of Egypt dataset
  15. Martin: Implementation level #1: Concrete example of Egypt dataset
  16. Martin: Implementation level #1: Concrete example of Egypt dataset
  17. Martin: Implementation level #1: Concrete example of Egypt dataset
  18. Martin: Implementation level #1: If someone needs another (non-UCLA) example
  19. Martin: pro-actively archived URIs from CNI tweets
  20. Pete: Implementation level #2: more conceptual at this stage. Emphasizes usability (your average faculty member) and flexibility of the service. Focus first on "low-hanging fruit" such as word clouds, histograms, geospatial visualization. Example: Twitter data scientist Simon Rogers’s mapping of reactions on Twitter to the Ferguson grand jury’s decision.
  21. Pete: Implementation level #2: Capture and adapt to the evolution of the corpus context: occurrence of new hashtags, keywords, etc. Again, the goal is not to reinvent the wheel; rather, encourage collaboration and use of common frameworks. Example: use D3.js and available text mining tools. Live demo (also pre-recorded) using a lightly modified version of GWU’s Social Feed Manager: feeds live capture of tweets via Twitter’s API into Node.js and D3 real-time visualizations. Term cloud on #cnif14. Term cloud from reactions to the Senate’s “no” vote re: Keystone XL pipeline. Analysis based on counts of terms, user handles, hashtags; sentiment analysis.
  22. Pete: Implementation level #3: more conceptual at this stage. Background: involves desires we’ve had about both the DEP collection and NewsScape. Linking is the logical next (desired) step, to enhance the collections and highlight the interconnectedness of multi-perspective news accounts. Linking to: (potentially missing) embedded resources; related content in other collections, including news media. Example here: mutually distrustful symbiosis of 24-hour “breaking” TV news outlets and Twitter (Boston Marathon bombing).
  23. Pete: Implementation level #3: Better linking technologies and practices would facilitate cross-collection news analyses like this one: comparing the volume of tweets from Cairo about the Tahrir Square protests to US TV news coverage about them (from NewsScape). We have similar comparisons for the early days of the Libyan civil war and the March 11, 2011 earthquake, tsunami, and nuclear crisis in Japan. This enables researchers to ask new and more sophisticated questions and get a better sampling of the variety of recorded reactions to these events. Events to point out: 28 Jan “day of rage”; Internet blackout in Egypt until 3 February; Mubarak’s defiant statement, then resignation; weekends in TV news.
  24. Pete: Implementation level #3: Another example from the Egyptian revolution, involving potentially missing embedded resources: a tweet linking to a TV news resource that is no longer available and wasn’t formally archived, BUT using enhanced search and linking tools, we can find news coverage of this event and actually many more perspectives on it: a half-dozen different news networks, other Twitter users, other social media? Newspapers?
  25. Sharon: Challenges - legal issues for sharing collected data, preserving tweets and embedded resources - building and maintaining momentum for such efforts; we have seen in the past that ad-hoc doesn't scale, yet interest is growing, and there is no agreed-on model for approaching this - collaborations and partners: GWU, Stanford, UNT, interested web archives - expand to other forms of (social) media