Supporting Preservation of Research Data in the Chemical Sciences
1. Supporting Preservation of Research Data in the Chemical Sciences
Dr. Simon Coles
School of Chemistry, University of Southampton
2nd June 2009
2. Representation Information for Crystallography Data
• Representation Information (RI), from the OAIS Reference Model, is any information required to render, process, interpret, use and understand data.
• A registry/repository for RI (RRoRI) has been developed by the DCC and the CASPAR Project.
• The crystallography domain and the workflow of the NCS are examined to identify significant RI.
• RI networks relating to the CIF file format are formulated and ingested into the RRoRI.
• A use-case scenario describes how RI stored in the RRoRI can enable someone unfamiliar with the CIF format to access the information content of a CIF instance.
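To make the idea of an RI network more concrete, here is a minimal Python sketch that models one as a directed graph and walks it to find which RI a particular user still lacks. The node names and dependencies are hypothetical illustrations, not the network actually ingested into the RRoRI. The stopping rule (do not follow anything the user already understands) reflects the OAIS idea that the recursion of RI ends at the Knowledge Base of the Designated Community.

```python
from collections import deque

# Hypothetical RI network: each key is an information object and each value
# lists the Representation Information needed to understand it. Node names are
# illustrative only, not the network held in the RRoRI.
RI_NETWORK: dict[str, list[str]] = {
    "CIF instance": ["CIF syntax specification", "CIF core dictionary"],
    "CIF syntax specification": ["STAR File specification"],
    "CIF core dictionary": ["DDL specification", "crystallography concepts"],
    "STAR File specification": ["ASCII text encoding"],
    "DDL specification": ["STAR File specification"],
    "crystallography concepts": [],
    "ASCII text encoding": [],
}


def required_ri(target: str, knowledge_base: set[str]) -> list[str]:
    """Breadth-first walk of the RI network from `target`.

    Returns the RI the user must obtain, in the order it is discovered.
    Branches stop at items already in the user's knowledge base, and each
    item is reported once even if several objects depend on it.
    """
    needed: list[str] = []
    seen = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for ri in RI_NETWORK.get(node, []):
            if ri in seen or ri in knowledge_base:
                continue
            seen.add(ri)
            needed.append(ri)
            queue.append(ri)
    return needed
```

For a user who already understands ASCII text and crystallography, `required_ri("CIF instance", {"ASCII text encoding", "crystallography concepts"})` returns the CIF syntax specification, the CIF core dictionary, the STAR File specification and the DDL specification, in that order.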
3. Preservation Planning for Crystallography Data
• The original plan was to apply a DRAMBORA assessment to each of the repositories in the federation as a means of raising awareness of curation and preservation issues.
• The work now covers the notion of trust and trustworthiness, with a brief look at several preservation planning tools, including: the DCC Curation Lifecycle Model; the OAIS Reference Model; audit and certification instruments (TRAC, NESTOR, DRAMBORA, Data Seal of Approval); PLATO and PLATTER (from the PLANETS Project); and cost models (PrestoSpace, LIFE2 projects).
• It raises curation and preservation issues that are likely to be relevant to the crystallography community and the eCrystals federation.
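A risk-based self-assessment such as DRAMBORA asks a repository to list its risks and rank them. The Python sketch below shows one simple way to do that ranking, assuming each risk is scored for probability and impact and that severity is their product. The 1 to 6 score range, the field names and the example risks are hypothetical choices for illustration, not the DRAMBORA toolkit itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """One entry in a repository risk register (field names are hypothetical)."""
    identifier: str
    description: str
    probability: int  # assumed ordinal score: 1 (rare) to 6 (near certain)
    impact: int       # assumed ordinal score: 1 (negligible) to 6 (catastrophic)

    def __post_init__(self) -> None:
        # Reject scores outside the assumed 1..6 range.
        for name in ("probability", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 6:
                raise ValueError(f"{name} must be between 1 and 6, got {value}")

    @property
    def severity(self) -> int:
        # Assumed scoring rule: severity = probability x impact.
        return self.probability * self.impact


def prioritise(register: list[Risk]) -> list[Risk]:
    """Return the risks ordered from most to least severe."""
    return sorted(register, key=lambda risk: risk.severity, reverse=True)
```

Ranking by a single product keeps the comparison between repositories in a federation simple, at the cost of treating a likely minor risk and an unlikely serious one as equivalent when their products match.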
4. Preservation Metadata for Crystallography Data
• The original aim was to augment the eBank-UK application profile with preservation metadata specifically for crystallography data
• This aim was superseded by the development of the Crystallography Data Commons initiative
• Proposed the following…
5. Resources
• Data Set/Collection
• Raw Data
• Derived Data
• Result Data
• Transient Data
• Workflow?
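The resource types above can be read as a small data model. The Python sketch below encodes them as an enumeration and assumes, for illustration only, that a Data Set/Collection groups the other kinds of resource. That containment rule, the class names and the example identifiers are hypothetical, and Workflow is included only tentatively, as on the slide.

```python
from dataclasses import dataclass, field
from enum import Enum


class ResourceType(Enum):
    DATA_SET = "Data Set/Collection"
    RAW = "Raw Data"
    DERIVED = "Derived Data"
    RESULT = "Result Data"
    TRANSIENT = "Transient Data"
    WORKFLOW = "Workflow"  # marked "?" on the slide: inclusion still open


@dataclass
class Resource:
    """A described resource; `parts` is only used by data sets (assumption)."""
    identifier: str
    kind: ResourceType
    parts: list["Resource"] = field(default_factory=list)

    def add(self, part: "Resource") -> None:
        # Hypothetical rule: only a Data Set/Collection may contain other resources.
        if self.kind is not ResourceType.DATA_SET:
            raise ValueError("only a Data Set/Collection can contain other resources")
        self.parts.append(part)

    def by_kind(self, wanted: ResourceType) -> list["Resource"]:
        """Return the contained resources of one type, in insertion order."""
        return [part for part in self.parts if part.kind is wanted]
```

For example, a data set `dataset-001` might hold its raw diffraction frames as `RAW` and the final CIF as `RESULT`, so that preservation metadata can be attached at either level.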
6. Publication/Dissemination
• Persistent Identifier
• Preservation policy/strategy
• Rights management: binding intellectual property rights that may limit the ability to preserve and disseminate the digital object over time, e.g. use and reuse
• Technical environment: describing the technical requirements needed to render and use the digital object, e.g. file format, software, instrumentation
• Provenance: the custodial history of the object
• Context: contextual information indicating how the object was created and under what circumstances
• Authenticity: validating that the digital object is in fact what it purports to be, and has not been altered in an undocumented way, e.g. checksum
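A checksum supports the authenticity element by letting a repository show that a file has not changed since its digest was recorded. This Python sketch uses SHA-256 from the standard `hashlib` module; the choice of algorithm and the function names are assumptions for illustration, not a requirement of the proposed metadata.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks so that
    large raw-data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded: str) -> bool:
    """Return True if the file still matches the digest recorded at ingest."""
    return sha256_of(path) == recorded.strip().lower()
```

In use, the digest is stored alongside the object's metadata at ingest and `verify_fixity` is re-run periodically or after any transfer; a `False` result signals an undocumented alteration.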
7. Management
• Embargo, e.g. policy
• Representation Information: any information required to render, process, use, reuse, interpret and understand the object, e.g. specifications; file formats; software; hardware; semantics
• Preservation activity: actions taken to preserve the digital object, and any consequences of these actions that impact its look, feel or functionality
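Two of the management elements lend themselves to simple checks: whether an embargo has expired, and which recorded preservation activities changed the object's look, feel or functionality. The Python sketch below shows one way to hold both in a record. The class and field names are hypothetical, and it assumes that an activity with no recorded consequence left the object unchanged.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PreservationActivity:
    """One action taken to preserve an object (field names are hypothetical)."""
    performed_on: date
    action: str                        # e.g. "format migration"
    consequence: Optional[str] = None  # None: assumed no impact on look, feel or functionality


@dataclass
class ManagementRecord:
    object_id: str
    embargo_until: Optional[date] = None  # None means no embargo applies
    activities: list[PreservationActivity] = field(default_factory=list)

    def is_released(self, today: date) -> bool:
        """True once any embargo has expired (release on the embargo date itself)."""
        return self.embargo_until is None or today >= self.embargo_until

    def log(self, activity: PreservationActivity) -> None:
        self.activities.append(activity)

    def altered_functionality(self) -> list[PreservationActivity]:
        """Activities whose recorded consequences changed the object."""
        return [a for a in self.activities if a.consequence is not None]
```

Keeping consequences on each activity, rather than only the latest state, preserves the history a future user needs to judge whether a migrated file still behaves like the original.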