Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resources Consortium (NYARC)
1. Pebbles around a hole, Kinagashima-Cho, Japan (1987). Photo by Andy Goldsworthy
Making the Black Hole Gray:
Implementing the NYARC Web Archiving Program of Specialist Art Resources
Deborah Kempe
April 11, 2014
2. “Going forward, one of the biggest challenges
scholars and curators of contemporary art and
architecture face currently, and will increasingly
face, is how to store, retrieve, and investigate
born-digital materials.”
James Cuno, President and CEO of the J. Paul Getty Trust, How Art History
is Failing at the Internet (November 19, 2012)
5. What we mean when we say
Digital Black Hole
Content being produced vs. content being archived
7. NOT
Why Archive the Web?
BUT
How to Archive the Web?
Who Should Archive the Web?
Who Pays for Archiving the Web?
How do People Navigate Web Archives?
8. ARCHIVING BORN DIGITAL CULTURAL HERITAGE OBJECTS-
NYARC’s EVOLUTION
2010-2013: Archive-It partnership for pilot studies and intern
projects
9. Capturing born-digital content from auction
house websites
2010 Pilot Project with the Met’s Watson Library
Sean Leahy, Pratt MLIS Student
Intern, Principal Investigator
13. ARCHIVING BORN DIGITAL CULTURAL HERITAGE OBJECTS-
NYARC’s EVOLUTION
2010-2013: Archive-It partnership for pilot studies and intern
projects
Grant support from The Andrew W. Mellon Foundation:
2012-2013 planning grant ($50,000)
‘Reframing Collections for a Digital Age’
14. THE TIPPING POINT
Reframing Collections
Findings 2012/2013
Digital Shift 3> years
Born-Digital “ephemeral”
literature rapidly proliferating
Art historians and museum
staff do not have a clear
understanding of web
obsolescence
But their focus drove our
collecting targets…
15. 2013 PLANNING GRANT RECOMMENDATIONS
• Use Archive-It as the web archiving tool
• Plan incremental growth of collection
• Develop an open nominations tool
• Establish a permissions framework
• Join the National Digital Stewardship Alliance (NDSA)
• Look for ways to further automate metadata creation
• Enlist students into the program, especially for quality assurance
• Collaborate, Collaborate, Collaborate
16. ARCHIVING BORN DIGITAL CULTURAL HERITAGE OBJECTS-
NYARC’s EVOLUTION
2010-2013: Archive-It partnership for pilot studies and intern
projects
Grant support from The Andrew W. Mellon Foundation:
2012-2013 planning grant ($50,000):
‘Reframing Collections for a Digital Age’
2013-2015 grant-funded program ($340,000):
‘Making the Black Hole Gray: Implementing the Web Archiving
of Specialist Art Resources’
17. PROJECT OBJECTIVES
• During the 2-year grant period NYARC seeks to implement a
program for capturing, making accessible, and preserving
websites for art research
• Harvest and catalog approximately 2 TB of WARC (Web
ARChive file format) files from websites
• Document workflows
• Develop and share best practices with the community
18. STAFFING & PARTNER COLLABORATION
• 1 FT Program Coordinator for 2 years
• 3 PT web archiving paid interns per semester for 2 years
(each based at a NYARC library)
• 2 IMLS-funded M-LEAD paid interns based at the Frick
Art Reference Library
• Columbia University Library web archiving staff
• Existing NYARC staff support
• Hardware/software service providers
25. ARCHIVE-IT
• Annual Subscription Service
• Crawls, harvests and hosts web
content, using Open Source tools
and standards developed and
maintained by the Internet Archive:
• Heritrix web crawler (+ Umbra)
• NutchWAX search engine
• Wayback Machine browser
• WARC files, an ISO standard (ISO 28500)
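To make the WARC format named above more concrete, the following is a minimal sketch of how a single WARC record is laid out: a version line, named header fields, a blank line, the payload, and two trailing line breaks. The function name build_warc_record is hypothetical and used only for illustration. Real Archive-It crawls are written by Heritrix and normally store full HTTP responses, usually compressed, so this simplified "resource" record is an assumption, not a description of NYARC's files.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, payload: bytes,
                      content_type: str = "text/html") -> bytes:
    """Return one simplified WARC/1.0 'resource' record as bytes.

    Hypothetical helper for illustration only; production crawlers
    write richer records (request, response, metadata) than this.
    """
    # WARC-Date is a UTC timestamp in ISO 8601 form.
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {warc_date}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines are CRLF-separated and end with an empty line.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Each record is followed by two CRLF pairs.
    return head + payload + b"\r\n\r\n"


if __name__ == "__main__":
    record = build_warc_record("http://www.intenttodeceive.org/",
                               b"<html></html>")
    print(record.decode("utf-8"))
```

Because every record carries its own target URI, capture date, and length, many captures can be concatenated into one file and still be read back one at a time.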
26. More than 275 partners in 16 countries use Archive-It
30. What is a seed?
A seed is any URL that you want to capture:
an entire website:
http://www.intenttodeceive.org/
a specific part of a website:
http://www.intenttodeceive.org/videos/
a specific URL:
http://www.intenttodeceive.org/forger-profiles/han-van-meegeren/
We will use each level
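The three seed levels above can be sketched as a simple prefix test: a page falls under a seed when it is on the same host and its path starts with the seed's path. The function name in_scope is hypothetical, and plain prefix matching is an assumption made for illustration; the crawler's real scoping rules are more detailed than this.

```python
from urllib.parse import urlsplit


def in_scope(seed: str, url: str) -> bool:
    """Return True if url sits at or below the seed (simplified).

    Hypothetical helper: same host, and the URL path begins with
    the seed path. Not the crawler's actual scoping logic.
    """
    s, u = urlsplit(seed), urlsplit(url)
    if s.hostname != u.hostname:
        return False
    seed_path = s.path or "/"
    url_path = u.path or "/"
    return url_path.startswith(seed_path)


if __name__ == "__main__":
    site = "http://www.intenttodeceive.org/"
    videos = "http://www.intenttodeceive.org/videos/"
    page = "http://www.intenttodeceive.org/forger-profiles/han-van-meegeren/"

    print(in_scope(site, videos))   # True: the whole-site seed covers /videos/
    print(in_scope(site, page))     # True: the whole-site seed covers the page
    print(in_scope(videos, page))   # False: the page is outside /videos/
    print(in_scope(page, page))     # True: a single-URL seed covers itself
```

Under this simplification, a broader seed captures more content per crawl, while a narrower seed keeps the crawl focused on one section or one page.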
Oy, Metadata… Dublin Core, MARC 21, RDA, BIBFRAME, Collection vs. Item
But can we have it all??
Rebecca Goldman, Derangement and Description, July 13, 2009, http://derangementanddescription.wordpress.com/page/14/
38. Links to resources cited, and other useful information on born digital content
http://www.dailydot.com/opinion/art-history-failing-internet/
http://www.exlibrisgroup.com/category/PrimoOverview
http://www.deepwebtech.com/products/explorit-everywhere-for-libraries/
http://www.serialssolutions.com/en/services/aquabrowser/
NDSA Web Archiving Survey Report June 2012 (2013 report expected soon!)
http://www.digitalpreservation.gov/ndsa/working_groups/documents/ndsa_web_archiving_survey_report_2012.pdf
http://blog.emilyreynolds.com/2014/04/09/ndsr14-symposium-in-seven-tweets/
Columbia University Web Archiving Summit 2012 https://webarch.cul.columbia.edu/
Archiving the Web for Scholars, by Steve Kolowich
http://www.insidehighered.com/news/2011/05/06/libraries_try_to_preserve_and_archive_websites_for_academic_study
Overview of Web Archiving, by Jinfang Niu http://dlib.org/dlib/march12/niu/03niu1.html
Rebecca Goldman, Derangement and Description, July 13, 2009, http://derangementanddescription.wordpress.com/page/14/
Web Archives for Researchers: Representations, Expectations and Potential Uses, by Peter Stirling, Philippe Chevallier and Gildas Illien.
D-Lib Magazine March/April 2012, Volume 18, Number 3-4, http://www.dlib.org/dlib/march12/stirling/03stirling.print.html
A Memory of Webs Past, Ariel Bleicher, 28 February, 2011
http://spectrum.ieee.org/telecom/internet/a-memory-of-webs-past/0
Digital Scholarship’s Digital Curation Resource Guide http://digital-scholarship.org/dcrg/dcrg.htm
http://nyarc.org/content/reframing-collections-digital-age, blog posting by Stephen Bury, June 18, 2012
Ricky Erway, OCLC Program Officer. Swatting the Long Tail of Digital Media: A Call for Collaboration, September 2012
http://www.oclc.org/content/dam/research/publications/library/2012/2012-08.pdf
39. Further Resources
Library of Congress, The Signal: Digital Preservation blog:
http://blogs.loc.gov/digitalpreservation/
http://www.loc.gov/webarchiving/ Library of Congress Web Archiving
http://netpreserve.org/ website of the International Internet Preservation Consortium (IIPC)
SAA Web Archiving Roundtable http://webarchivingrt.wordpress.com/
Archive-It Knowledge Center https://webarchive.jira.com/wiki/display/ARIH/Welcome
Guidelines for Preservable Websites / Columbia University Libraries
https://library.columbia.edu/bts/web_resources_collection/guidelines_for_preservable_websites.html
kempe@frick.org