2. Main Themes
York
Challenges in developing a large-scale digitization effort:
looking at HathiTrust
HathiTrust Website
Current governance and cost of digital preservation efforts
Walter & Skinner
Creates incentive by looking at the costs of NOT preserving
MetaArchive Cooperative as community-operated model
Blue Ribbon Task Force Report - Chapters 4 & 5
Describes the challenges behind a digital preservation project, discusses
incentives, and gives simple recommendations and pragmatic
small steps for getting a project running
3. Important Terms
Barriers to Entry - tangible costs and challenges to start a project
Benign Neglect - choosing not to focus on preservation today
Contributing Partner - contributes content and pays infrastructure
costs for deposited content
Committee on Institutional Cooperation (CIC) - includes the Big
Ten schools and the University of Chicago
Economy of Scale - cost advantage to lower both the average
and marginal costs of preservation in a large repository
MetaArchive Cooperative - community-owned and operated
distributed digital preservation network
Misaligned Incentives - each participant in a transaction has its
own incentives to act. The parties' incentives are not the
same, and sometimes they conflict.
4. Important Terms cont.
Negative Benefits - looking at the cost of NOT preserving as
incentive to do digital preservation
Non-Exclusive License - rights of authors to deposit their
publications into third-party repositories
Sustaining Partner - participates in curation and management,
but does not necessarily contribute content
Trusted Digital Repository - certified by TRAC or DRAMBORA,
whose criteria are based on metadata and formatting
standards and best practices
Uncertain Future Value - intangible long-term benefits and costs;
the benefits to future users are difficult to gauge
Zero-Sum Activity - time and money invested into preservation is
taken directly from other activities
5. Blue Ribbon Task Force
There is a disruption of roles and
responsibilities among players, resulting
from the non-rivalrous nature of digital
information
Chapter 4 outlines four types of
digital information, along with the challenges
and proposed recommendations for
each
6. Scholarly Discourse
Misaligned incentives
• Publishers
o have high incentives to preserve these
materials
o have shown little resistance to participating in
dark archive models
• Authors/ Creators
o should stipulate a perpetual, non-exclusive
license to their works
o collective bargaining to secure these rights
o individual use of these licenses could lower
barriers to preservation of emerging literature
• Libraries
o mediation
7. Question:
Do you think collective bargaining would be effective
in helping lower barriers to entry for emerging authors?
Do you think this tactic is feasible?
8. Free rider problem
large startup costs create barriers to entry
no one wants to be the first mover
Uncertain future value
secondary and tertiary uses of the information
Funding
the current models exclude a significant portion of the
scholarly community:
smaller publishers
under-resourced fields
independent scholars and the commercial sector
9. Action Agenda for
Scholarly Discourse
1. Libraries, scholars, and professional societies should develop
selection criteria for emerging genres in scholarly discourse, and
prototype preservation and access strategies to support them.
2. Publishers reserving the right to preserve should partner with third-party
archives or libraries to ensure long-term preservation.
3. Scholars should consider granting nonexclusive rights to publish
and preserve, to enable decentralized and distributed
preservation of emerging scholarly discourse.
4. Libraries should create a mechanism to organize and clarify their
governance issues and responsibilities to preserve monographs
and emerging scholarly discourse along lines similar to those for
e-journals.
5. All open-access strategies that assume the persistence of
information over time must consider provisions for the funding of
preservation.
10. Research Data
Research data vary enormously
Often a need to preserve ancillary materials, such
as lab notebooks
Secondary uses of public research data suggest
new users who may be willing to support long-term access to the
data
Preservation societies and other proxy organizations
can play crucial roles in selection for preservation
11. In grant-funded research, preservation is
framed as a zero-sum game
Imposition of mandates will strengthen
incentives
• clear allocation of funds (via a portion of the grant)
• clear selection criteria
Funders should be seeding capacity
Subscription models help mitigate the
free-rider problem
There should be agreements in place
between the data community and
third-party archives
12. Action Agenda for Research Data
1. Each domain, through professional societies or other
consensus making bodies, should set priorities for data
selection, level of curation, and length of retention.
2. Funders should impose preservation mandates, when
appropriate. When mandates are imposed, funders should
also specify selection criteria, funds to be used, and
responsible organizations to provide archiving.
3. Funding agencies should explicitly recognize “data under
stewardship” as a core indicator of scientific effort and
include this information in standard reporting mechanisms.
4. Preservation services should reduce curation and archiving
costs by leveraging economies of scale when possible.
5. Agreements with third-party archives should stipulate
processes, outcomes, retention periods, and handoff triggers.
13. Commercially Owned
Cultural Content
Who owns it? Misalignment between owners
and controllers of digital content arises in
almost every case
Widespread disruption of the business
models that provide the primary incentives for
commercial owners to preserve
14. Must strengthen the rights of preserving institutions by revising copyright
law
o mandate deposit of copyrighted electronic
content with authorized public institutions to
secure its long-term preservation
o provide incentives directly to private owners of
cultural assets to preserve on the public's behalf
o commercial sponsorship of preservation
activities and public-private partnerships
o stewardship organizations should begin
selecting privately held materials of significant
cultural value
16. Action Agenda for Commercially
Owned Cultural Content
1. Leading cultural organizations should convene expert
communities to address the selection and preservation needs
of commercially owned cultural content and digital orphans.
2. Regulatory authorities should bring current requirements for
mandatory copyright deposit into harmony with the
demands of digital preservation and access.
3. Regulatory authorities should provide financial and other
incentives to preserve privately held cultural content in the
public interest.
4. Leading stewardship organizations should model and test
mechanisms to ensure flexible long-term public-private
partnerships that foster cooperative preservation of privately
held materials in the public interest.
17. Collectively Produced Web Content
No clarity about what specific content should be collected
Institutions that are already crawling the web should provide
leadership to others
Collective content may be a composite of linked products with
compound rights embedded in them
o Bloggers may use some sort of license to clarify whether
they want their material archived
o Provide incentives for the hosting sites to preserve
o develop partnerships between hosting sites and
stewardship institutions
o grant stewardship institutions the legal authority to crawl
the web for preservation purposes
18. Create public policies and/or
partnerships to enable grassroots efforts
at preservation
Collective action will be needed to
secure these assets
• public funding
• public mandates
19. Action Agenda for Collectively
Produced Web Content
1. Leading stewardship organizations should convene stakeholders
and experts to address the selection and preservation needs of
collectively produced Web content.
2. Creators, contributors, and host sites could lower barriers to third-party
archiving by using a default license to grant nonexclusive
rights for archiving.
3. Regulatory authorities should create incentives, such as preservation
subsidies, for host sites to preserve their own content or seek
third-party archives as preservation partners.
4. Regulatory authorities should take expeditious action to reform
legislation to grant authority to stewardship institutions to preserve
at-risk Web content.
5. Leading stewardship organizations should develop partnerships with
one or more major content providers to explore the technical, legal,
and financial dimensions of long-term preservation.
20. Blue Ribbon Chapter 5
Which digital content to preserve, for how long, and for what use?
Who should be in charge?
How to secure funding and resources?
How to determine the return on investment?
Necessary conditions for sustainable digital
preservation:
1. recognition of the benefits of preservation
2. choosing the materials that have long-term value
3. incentives to act in the public interest
4. appropriate governance to oversee the activities
5. ongoing effort to preserve
6. timely actions to ensure access
21. Principle of actions
1. Create contingency plans for preservation actions in advance
prevent the risk of losing digital assets, and entrust the materials to
a responsible party
set up mechanisms (e.g., MOUs) to prompt regular review of
preservation priorities
2. Argue for the need to invest in preservation
emphasize the gains from possible uses of digital assets,
especially in the short term
also argue the cost of not preserving the assets, e.g., losing
clinical trial data
argue for potential benefits that will trickle down to multiple
stakeholders
3. Strengthen weak incentives, align the incentives when facing a
diverse stakeholder community, and generate incentives when none
exist
22. Principle of actions
4. Prioritize the digital collections based on projected future use
careful selection of which digital assets to save, especially
materials of greatest use to present & future stakeholders
the decision to preserve now need not be a permanent or
open-ended commitment of resources over time
5. Stakeholders' roles & responsibilities should be transparent &
accountable
organizations should have clear policies that specify their roles,
responsibilities, and procedures
collective interest must be aggregated, and the effort & the cost
must be appropriately apportioned
6. Funding models must fit the community norms
digital assets need not always be a public good
funding models should be flexible to adjust to disruptions over
time; create an economy of scale whenever possible (especially
scientific data & cultural assets)
23. Near-term priorities
Organizational action
form public-private partnerships
ensure organizations have required expertise
achieve economies of scale & scope
address the free-rider problem
Technical action
build capacity to support stewardship
reduce preservation cost
operationalize an option strategy for all types of digital material
Public Policy action
ease copyright laws to facilitate digital preservation
generate incentives for private entities to preserve on behalf of the public
sponsor public-private partnerships
empower stewardship organizations to avert loss of digital orphans
Public Outreach action
provide training for curatorial skills
educate the public about the urgency of preserving digital assets
24. York and HathiTrust Website
Looks at the development of the HathiTrust
Challenges of a large-scale digital
preservation initiative
Establishment and Purpose
• Google
• Members
• Preservation
Goals
25. Question:
Looking at these areas, what are some of the major
challenges you think might come about in developing
a digital preservation initiative?
The Challenges:
Governance
Finance
Repository
Services
26. Challenges in Governance
Types of BAD collaboration
o "Goal Drift"
o No buy-in from administrative bodies
Tension
o Perception that collaboration will limit independence of
participants
o Fear of slow decision-making process
Solution: HathiTrust Governance
o Executive Committee
o Strategic Advisory Board
o Constitutional Convention
o Voluntary Membership
27. Challenges in Finance
Funding Downfalls
o Voluntary Membership - Potential dissolution of
the partnership
o Minimal Funding Sources
o No long-term plan beyond 5 years
Solution:
o Formal evaluation at the 3-year mark
o Will develop a succession and multi-year
funding plan
o Different Levels of partnership
28. Challenges with the Repository
Trusted Environment
o Trusted Digital Repository Certification did not
exist
o Time and Cost of certification
Collaborative Development
o Discovering redundancy
o What version is the right version?
Solution:
o Certification, Standards, and Best Practice
o Implication of having a unified digital repository
29. Challenges with Services
Basic Access
o Print-disabled users
o Compliance with accessibility standards
Search
o No interface for searching
o No comparable models for searching across
institutions of this magnitude
Extended Capabilities
o Integration with software/primary source
collections
o Print on-demand
o Inter-institutional authentication and security
30. Critical Observations
Governance
Duties of new governing bodies not explained in
much detail
Finance
Failed to look at funding sources outside the
partnership
Repository
Cost of long-term preservation sustainability
31. Walter – Cost of Not Preserving
Cost of digital preservation vs. 'benign neglect': cultural institutions either choose to preserve
today or defer preservation to tomorrow
benign neglect misses the fact that digital assets are vulnerable & storage
media are unstable
1. Cultural cost:
intangible cost of narrow understanding of our cultures & histories by current &
future generations
2. Political cost:
loss of resources & documentation essential for understanding local, state,
national, & international developments
3. Scientific cost:
loss of data for all areas of research needed for academic advancement
Libraries that begin digitization and content creation efforts early will
benefit from better acquisitions, more & higher-quality users, and more financial
resources, which increase their prestige & bottom line
32. The MetaArchive Cooperative Model
Founded in 2003 as a community-owned & operated digital
preservation network
Cooperative model: all members contribute money, staff,
technology & space, which reduces cost for all cooperating parties &
increases the sense of joint ownership
expanding membership fees and cooperative-oriented staffing
replace initial public funding from the Library of Congress
Adopts LOCKSS software: all members host servers within their own institutions,
connected in a peer-to-peer network that avoids a central
cache
34. MetaArchive Cost
1. Establish the 1st private LOCKSS network (with NDIIPP funding)
2. Transform into a sustainable 501(c)(3) charitable organization (with NHPRC & NDIIPP
funding)
3. Provide ongoing preservation training & services to the cultural community (with
membership & consulting fees)
Cost components are mainly expert personnel
1. Collaborative relationship-building
2. planning & policy making
3. staff training
4. selecting & implementing network systems
5. developing & maintaining software
6. selecting digital assets for preservation
7. documenting the digital assets
8. preparing the assets for the preservation network
9. assessing and monitoring the assets in the network
10. infrastructure
35. Current cost
Basic costs:
Equipment = $4600 for a server
Staffing = 2% of a systems administrator’s time, software engineer
Storage = $1/GB/year for network storage
Membership fees
Sustaining members = $5500/year, typically lead institutions in the field
Preservation members = $3000/year, mainly participants &
beneficiaries
Sample costs:
For an institution that wants to preserve 2 TB of content:
Sustaining Member: [($5,500 (membership) + $2,000 (space)) x 3 years]
+ $4,600 (server) = $27,100/3 years, or $9,033/year
Preservation Member: [($3,000 (membership) + $2,000 (space)) x 3 years]
+ $4,600 (server) = $19,600/3 years, or $6,533/year
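The sample arithmetic above can be reproduced with a short sketch. It assumes decimal units (1 TB = 1,000 GB, so 2 TB at $1/GB/year gives the $2,000/year space charge), a single one-time server purchase covering all three years, and storage billed on the full volume every year. The function `total_cost` and the constant names are hypothetical, used only for illustration; they are not part of any MetaArchive tool.

```python
# Hypothetical sketch of the slide's sample-cost arithmetic; rates come from the slide.
SERVER_COST = 4_600            # one-time equipment cost for a server (USD)
STORAGE_RATE_PER_GB = 1        # network storage, USD per GB per year
MEMBERSHIP_FEES = {"sustaining": 5_500, "preservation": 3_000}  # USD per year


def total_cost(member_type: str, terabytes: int, years: int = 3) -> int:
    """(annual membership + annual storage) x years, plus one server purchase."""
    gigabytes = terabytes * 1_000  # assumption: decimal units, 1 TB = 1,000 GB
    annual = MEMBERSHIP_FEES[member_type] + STORAGE_RATE_PER_GB * gigabytes
    return annual * years + SERVER_COST


for member in ("sustaining", "preservation"):
    total = total_cost(member, terabytes=2)
    # Spread the multi-year total evenly to get the per-year figure on the slide.
    print(f"{member}: ${total:,} over 3 years, ${total / 3:,.0f}/year")
```

Keeping the one-time server cost outside the yearly multiplier matches the slide's grouping: only membership and storage recur each year.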