eScience and Digital Preservation, a presentation to the Association for Information Science and Technology (ASIST) conference in November 2004 in Rhode Island, USA, is the sixth of 12 presentations I have selected to mark 20 years in Digital Preservation.
It is closely related to the previous slideshare for May on the Jisc continuing access and digital preservation strategy but focuses just on the science component.
This is one I wasn’t able to present in person but it was kindly delivered by Gail Hodge.
My brief for the presentation was "thoughts or citations you have for the impact of e-science, particularly the GRID, on information management, particularly archiving, preservation and long-term access."
It is a short presentation of 15 slides covering collection-based science, the Grid, data publishing, and the background and rationale for the Digital Curation Centre (just launched two weeks before in the UK).
It is a snapshot in time and of key issues in 2004 – interesting to contrast with what one would write 10 years on and ponder on progress made.
20yrs: 2004 ASIST rhode_island
1. Supporting further and higher education
E-science and Digital Preservation
Neil Beagrie
BL/JISC Partnership Manager
ASIST Annual Conference Nov 2004
E-science Panel
2. Overview
• Apologies for absence (and thanks to Gail for presenting!)
• Trends and implications
– Data growth
– e-Research and collection-based science
– The Grid
– New publishing roles for datasets
– Digital preservation
• Digital Curation Centre
– What the funders are looking for
3. Growth of Scientific Data and Data Curation
• In next 5 years e-Science will produce more data than has been collected in the whole of human history
• Data growth – Protein Data Bank (1972-2003)
4. Implications
• Core Funding for institutions will not grow in line with information growth
• Need for more automation and tools
• Need for new shared services – lower the curation cost for disciplines – accelerate knowledge transfer
• Significant need for R&D and investment now to prepare for this
5. Collection-based Science (1)
• National Science Foundation Advisory Panel on Cyberinfrastructure
– “The importance of data in science and engineering continues on a path of exponential growth; some even assert that the major science driver of high end computing will soon be data… Collecting, organizing, storing, and providing access to vast quantities of data and other information (such as scholarly publications) is becoming as important as simulation has been and will likely grow faster over the next decade.”
6. Collection-based Science (2)
• NSF Advisory Panel on Cyberinfrastructure
• “To succeed NSF must… ensure that the exponentially growing amounts of data are collected, curated, managed, and stored for broad long-term access by scientists everywhere.”
• “Data Repositories... Providing access to observational and other data entails far more than attaching a lot of disks to a server that is on the Internet.”
• “R&D centers could be established for addressing common issues… there may be advantages to grouping applied research, development, and operations within a common organization and geographic location.”
7. The Grid
• ‘The Grid is a software infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources’ (Foster, Kesselman and Tuecke)
• Includes computational systems, data storage resources, digital libraries and specialized facilities
8. e-Government and the Grid
‘[The Grid] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.’
Tony Blair, 2002
Implications for digital preservation: Grids could enable better replication, preservation and access
9. Data Publishing
In some subjects databases are wholly or partly replacing journal publications as a medium of communication
– These databases are built and maintained with a great deal of human effort
– Scale of effort and supporting infrastructure varies
– may have discipline-wide scope and dedicated “curators”
– They may not contain primary data. Sometimes just value-added annotation/metadata
– They borrow/exchange extensively, and refer to other databases and journal articles
– May have evolved from supporting/internal facing role to publishing to external audiences
10. Ordnance Survey
• Publication in paper editions at different scales since 1791.
• Computerisation first designed to assist in workflow of paper publication.
• OS National Topographic Database (NTD)
• For large-scale mapping paper editions now discontinued. NTD is the map – continuously updated and printed remotely on demand.
11. Digital Preservation
“digital documents last forever – or five years, whichever comes first” (Jeff Rothenberg 1997)
BBC Domesday System
12. Organisational and technical challenges
“….I have data files from projects from years ago which are on disks I no longer have a drive for on computers I no longer have access to or are no longer made or the software/operating system changes would make it extremely difficult to access any more…. the nature of research work means a lot of short-term researchers over the years … Also as PIs move around and collaborate with many people in other organisations it is pretty difficult to go back more than a few years with confidence that data will be adequately archived.”
(Interview quote from UK-based Professor cited in JISC Audit of e-Science Curation report)
13. Digital Curation Centre
• Joint funding JISC and e-science core programme
• Three year initial funding - $6m
• Awarded to Consortium of Edinburgh, Glasgow, CCLRC, UKOLN
• Not a data centre – will provide generic support services and research
• DCC officially launched 5th November 2004
14. What the DCC funders are looking for
• Research into data curation and preservation issues
• Advisory services in best practice and a repository for tools, software and documentation
• DCC is not being funded to set up its own data repository
• DCC will need to work with key data centres, repositories and libraries to engage the relevant communities
15. Further information
• Digital Curation Centre
www.dcc.ac.uk
• The Continuing Access and Digital Preservation Strategy for the UK Joint Information Systems Committee (JISC)
http://www.dlib.org/dlib/july04/beag