This presentation was delivered in session 306 at the annual meeting of the Society of American Archivists (#saa15). These slides provide information about and lessons learned from the web archiving incentive awards program. Links provided are to facilitate further learning about the tools mentioned but are not a definitive set of resources about these tools.
Collaboration and Cash: Web Archiving Incentive Awards
1. Collaboration and Cash:
Web Archiving Incentive Awards
Anna Perricci
Columbia University Libraries
Society of American Archivists, Session 306
August 21, 2015
3. Today’s session
Taking an expansive view of outreach for web archives,
the speakers discuss methods for encouraging use and
engagement with collections of web content via various
approaches, including collaborative collecting,
cooperative collection development, promoting new
research uses, fostering research and tool development,
advocacy, and working directly with content creators.
Attendees have the opportunity to discuss novel
approaches to promoting the utility and value of web
archives.
4. For more on collaborative collection development
• Overview of grant funded collaborations for RESAW:
http://www.slideshare.net/annaperricci/building-web-archiving-collaborations-to-save-more-of-the-web &
http://dx.doi.org/10.7916/D87943X5
• Focus on Ivy Plus / Borrow Direct for #CUWARC:
http://www.slideshare.net/annaperricci/cuwar-cpres-perrricci2015corrected
• Progress on the Contemporary Composers Web Archive for IAML:
http://www.slideshare.net/annaperricci/ccw-apresentation-iaml2015final
• Process for building CAUSEWAY for ARLIS/NA:
http://www.slideshare.net/annaperricci/establishing-and-growing-a-multiinstitutional-web-archiving-collaboration-for-the-collaborative-architecture-urbanism-and-sustainability-web-archive-causeway
6. • We’re a little over four months out from the finish line for this
grant (ends December 31, 2015)
• Work on the incentive awards program is wrapping up and is
ready to be discussed
• Distributed efforts have not resulted in a lower workload or cost
savings so far, but the outcomes of the collaborative
projects are enriched by the shared expertise and insights
8. Source of funds
A 2012 summit on web archiving held at
Columbia “showed broad agreement on
the need for action in several areas as
web archiving continues to grow. We
need to find ways to share expertise and
infrastructure, to better understand how
researchers will use web archives, and
work with website owners to make their
content easier to collect.”
-Bob Wolven, Associate University
Librarian for Bibliographic Services and
Collection Development
Overview of the collaborative web archiving grant:
http://library.columbia.edu/news/libraries/2013/2013-2-5_CUL_Mellon_Web_Archiving_Grant.html
9. Visualizing Digital Collections of Web Archives
Primary Investigators: Michele Weigle and Michael Nelson
Institution: Old Dominion University
Project purpose:
Develop a tool for showing how a single web page changes over time
For more information see:
https://github.com/machawk1/ArchiveThumbnails,
http://thumbnails.cs.odu.edu:15421,
https://github.com/machawk1/ArchiveThumbnails/blob/master/CalendarResults.jsp &
https://www.youtube.com/watch?v=yeuk_vIOXcw&list=PLf1Dab4lwQhBpFRB1dpUnKLglmM2iScjl&index=6
10. Tools for Managing Seed URIs
Primary Investigators: Michael Nelson and Michele Weigle
Institution: Old Dominion University
Project purpose:
Develop a tool that enables curators to evaluate and detect when their
web archives have gone off topic, or to discover new seed sites to include in
collections
For more information see: https://github.com/yasmina85/offtopic-Detection &
https://www.youtube.com/watch?v=yeuk_vIOXcw&list=PLf1Dab4lwQhBpFRB1dpUnKLglmM2iScjl&index=6
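To make the idea of off-topic detection concrete, here is a minimal sketch. It assumes a simple approach: compare the text of each later capture of a seed against the first capture using bag-of-words cosine similarity, and flag captures that fall below a threshold. The function names and the threshold value are hypothetical choices for illustration, and this is not necessarily how the tool linked above works.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def flag_off_topic(captures, threshold=0.2):
    """Return indexes of captures that look off topic.

    `captures` is a list of page texts in capture order; the first one is
    treated as the on-topic baseline (an assumption of this sketch).
    """
    baseline = captures[0]
    return [
        i
        for i, text in enumerate(captures[1:], start=1)
        if cosine_similarity(baseline, text) < threshold
    ]
```

A curator could run `flag_off_topic` over the extracted text of each capture of a seed and review only the flagged indexes; the threshold would need tuning per collection.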
11. Archiving Transactions Towards
Uninterruptible Web Service
Primary Investigators: Zhiwu Xie and Ed Fox
Institution: Virginia Tech
Project purpose:
Create or leverage existing tools so that, when a web resource is
unavailable due to an interruption of a key service, an archived
copy is provided to the end user
• Web archives can serve as a value-added collection to motivate
web archiving as a tool for day-to-day IT operation
For more information see:
https://www.youtube.com/watch?v=6h8MohBSEtI&index=5&list=PLf1Dab4lwQhBpFRB1dpUnKLglmM2iScjl &
https://www.cs.vt.edu/node/7650
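The fallback behavior described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's implementation: `serve`, `fetch_live`, and `fetch_archived` are hypothetical names, and the two fetchers are passed in as plain functions so that the sketch stays independent of any particular web server or archive.

```python
def serve(url, fetch_live, fetch_archived):
    """Return (source, body) for a requested URL.

    Try the live service first; if it fails with a network-level error,
    fall back to the archived copy so the end user still gets a response.
    """
    try:
        return "live", fetch_live(url)
    except OSError:
        # ConnectionError and TimeoutError are subclasses of OSError,
        # so an interrupted or unreachable service lands here.
        return "archive", fetch_archived(url)
```

Returning the source alongside the body lets the caller tell the user that they are seeing an archived copy rather than the live page.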
12. Perma.cc: Mitigating the Pervasive Problem of Link Rot
in Scholarly Works and Preserving Online Content
Primary Investigator: Jonathan Zittrain
Institution: Harvard Library Innovation Lab
Project Purpose:
Create APIs to extend the use of the technology supporting Perma.cc, a
tool for authors and editors to make a copy of a cited resource
for preservation and future access (focus on legal resources)
For more information see: https://perma.cc/ &
https://www.youtube.com/watch?v=t_qZ4hNtmyw&index=2&list=PLf1Dab4lwQhBpFRB1dpUnKLglmM2iScjl
13. Free Law Project
Primary Investigators: Brian Carver and Michael Lissner
Organization: Free Law Project
Purpose:
Expand capacity to obtain opinions from appellate court websites &
involve the wider community in scraping work (using Juriscraper);
capture recordings of oral arguments
For more information see: https://github.com/freelawproject/juriscraper,
https://www.courtlistener.com &
https://www.youtube.com/watch?v=t_qZ4hNtmyw&index=2&list=PLf1Dab4lwQhBpFRB1dpUnKLglmM2iScjl
14. Warcbase: A Web Archives Browser Built on
Modern “Big Data” Infrastructure
Primary Investigator: Jimmy Lin
Institution: University of Maryland
Project purpose:
Warcbase is an open-source platform for storing, managing, and
analyzing web archives using current “big data” infrastructure and
tools (e.g., HBase for storage, Hadoop for data analytics)
• Further applications on “wimpy hardware” (Raspberry Pi) were also demonstrated for
personal digital archiving: http://www.www2015.it/documents/proceedings/companion/p1351.pdf
For more information see: https://github.com/lintool/warcbase &
https://www.youtube.com/watch?v=6h8MohBSEtI&index=5&list=PLf1Dab4lwQhBpFRB1dpUnKLglmM2iScjl
16. Oversight panel
• An oversight panel reviewed and chose proposals to fund;
project outcomes will be evaluated by oversight panelists
Oversight panel for project selection:
• Kris Carpenter (while at the Internet Archive)
• Mark Phillips (University of North Texas)
• Rob Sanderson (while at Los Alamos National Laboratory)
• Perry Willett (California Digital Library)
Oversight panel for project evaluation:
• Mark Phillips (University of North Texas)
• Martin Klein (UCLA)
• Jefferson Bailey (Internet Archive)
17. Bringing order to what could have been a
terrible mess of emails and attachments
20. We organized a conference
Web Archiving Collaboration: New Tools and Models
Image source: https://www.flickr.com/photos/98463672@N00/19358562581/in/album-72157655295804376/
21. Information sharing
• Slides and video links:
https://library.columbia.edu/bts/web_resources_collection/Conferences/program.html
• Video playlist:
https://www.youtube.com/playlist?list=PLf1Dab4lwQhBpFRB1dpUnKLglmM2iScjl
• Pictures:
https://flic.kr/s/aHskfjd54s
22. Fair warning
• Can’t say there are plans to
repeat this program
• Many steps needed to work
out requirements for the
sponsored projects office,
invoicing and intellectual
property agreements, etc.
– Many, many, many steps…
Image source: http://www.clipartbest.com/clipart-KcjgXMGKi
23. Hopes for the future
• Ideal: seeing more
development of
generalizable and extensible
tools
• Interoperability with
Archive-It is helpful
whenever possible though
other services / approaches
are being explored as well
24. Parts of a whole / parting thoughts
• Identifying a need and trying to
meet it can lead to novel
approaches and associated
challenges
• Hopefully what we learned can
serve as a reference as others
develop digital tools to
improve processes for
preserving and providing
access to digital archives of any
kind