Developing institutional RDM services
Michael Day
Digital Curation Centre (DCC)
UKOLN, University of Bath
DCC Workshop, Cardiff University, 14 May 2013
Session outline
- Managing active data
- Storage options
- Long-term retention of data
- Selection criteria
- Data repositories
- Finding and citing data
- Data registries and metadata
Presentation based on: Sarah Jones, Graham Pryor and Angus Whyte, How to Develop Research Data Management Services – a guide for HEIs (DCC, 2013): http://www.dcc.ac.uk/resources/how-guides/how-develop-rdm-services
Some slides reused from RDMRose training materials: http://rdmrose.group.shef.ac.uk/
Managing active data: key tasks
Researchers:
- Have a duty to ensure that research data is stored securely and backed up on a regular basis
- Have choices (e.g. network drives, laptops, external storage devices, online/cloud-based storage)
- Need to take data security seriously; this should be considered as part of the data management planning process
Institutions:
- Need to review data holdings and RDM practices constantly, in order to evaluate whether current storage infrastructures are sufficient
- May need to make a case for investing in the provision of additional data storage capability
- Need procedures for the allocation and management of storage
- Need to be flexible, taking account of a diverse range of research contexts and data storage requirements
Research data storage
- Trend for some HEIs to enhance the capacity of research data storage facilities:
  - Extending the capacity of existing filestores (e.g. Bath)
  - Exploring secure cloud storage
  - Utilising High Performance Computing facilities
- Managing storage: at the University of Bristol (data.bris), registered researchers (data stewards) are allocated 5 TB of storage to manage, e.g. deciding how long data should be kept, who has access, etc. http://data.blogs.ilrt.org
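To make the idea of a per-steward allocation concrete, here is a minimal sketch of quota bookkeeping in Python. It is not how data.bris is implemented: the StewardAllocation class, its method names and the use of decimal terabytes are hypothetical; only the 5 TB figure comes from the slide.

```python
from dataclasses import dataclass, field

TB = 10**12  # decimal terabyte in bytes (assumption: decimal rather than binary units)


@dataclass
class StewardAllocation:
    """Hypothetical record of one data steward's storage allocation."""
    steward: str
    quota_bytes: int = 5 * TB  # the 5 TB per-steward figure cited for data.bris
    used_bytes: int = 0
    datasets: dict = field(default_factory=dict)  # dataset name -> size in bytes

    def remaining(self) -> int:
        """Bytes still available within the quota."""
        return self.quota_bytes - self.used_bytes

    def add_dataset(self, name: str, size_bytes: int) -> bool:
        """Register a dataset if it fits; return False when it would exceed the quota."""
        if size_bytes < 0:
            raise ValueError("size must be non-negative")
        if size_bytes > self.remaining():
            return False  # the steward must delete or move data before adding more
        self.datasets[name] = size_bytes
        self.used_bytes += size_bytes
        return True

    def remove_dataset(self, name: str) -> None:
        """Free space, e.g. once the steward decides the data need not be kept."""
        self.used_bytes -= self.datasets.pop(name)


alloc = StewardAllocation("example-steward")
print(alloc.add_dataset("survey-2012", 3 * TB))  # True
print(alloc.add_dataset("imaging-raw", 3 * TB))  # False: would exceed 5 TB
alloc.remove_dataset("survey-2012")
print(alloc.add_dataset("imaging-raw", 3 * TB))  # True
```

Keeping the decision to delete or retain data with the steward, rather than in the code, mirrors the slide's point that stewards decide how long data should be kept.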
Options for managing active data
- Cloud storage options
  - There may be benefits in terms of costs and expertise
  - There may also be risks (e.g. loss of control, jurisdictional issues)
  - Janet Brokerage: promoting the use of cloud and off-site data centre facilities
- Academic Dropbox-like services
  - Dropbox is often used for sharing and syncing data between machines, but institutions are keen to retain control
- Systems developed in-house
  - Typically developed with a disciplinary focus, e.g. BRISSkit (biomedicine)
Selecting data for retention
- RCUK, Common Principles on Data Policy (2011): “Data with acknowledged long-term value should be preserved and remain accessible and usable for future research” http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
- Institutions will need to establish clear criteria to guide decisions on what should be kept
  - It will not be possible to retain everything
  - Carefully considered selection processes are essential to help prioritise the data that has long-term value
- Institutional selection processes will need to take account of:
  - Data that institutions are legally obliged to retain (or destroy), e.g. for contractual or regulatory reasons
  - Different disciplinary practices (e.g. some disciplines will have mature data sharing infrastructures and will already deposit data with third-party services)
  - Researcher sensitivities about losing control of data (deposit agreements)
Developing guidance on selection
- Establishing guidelines, processes and good practice for data selection and deposit can be one of the more challenging aspects of an RDM service
- There is a need for buy-in from researchers
- There is a need for clarity on what kinds of data are within the remit of an institutional RDM service
- There may be a need to apply different levels of curation, e.g. depending on the perceived value of the data accepted
DCC selection categories
DCC How to Select and Appraise Research Data for Curation (Whyte and Wilson, 2010) proposes seven main criteria:
- Relevance to mission
- Scientific or historic value
- Uniqueness
- Potential for redistribution
- Non-replicability
- Economic case
- Full documentation
http://www.dcc.ac.uk/resources/how-guides/appraise-select-data
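The seven criteria can be recorded as a simple appraisal checklist. The sketch below is an illustrative assumption, not something the DCC guide prescribes: the hypothetical appraise function records an appraiser's yes/no judgement per criterion and lists the criteria that are unmet or not yet assessed, leaving the retention decision itself to people.

```python
# The seven DCC appraisal criteria (Whyte and Wilson, 2010), in slide order.
DCC_CRITERIA = (
    "Relevance to mission",
    "Scientific or historic value",
    "Uniqueness",
    "Potential for redistribution",
    "Non-replicability",
    "Economic case",
    "Full documentation",
)


def appraise(assessment: dict[str, bool]) -> list[str]:
    """Return the criteria a dataset does not (yet) satisfy, in DCC order.

    `assessment` maps criterion names to an appraiser's yes/no judgement.
    Criteria left out are treated as unassessed and reported as unmet.
    """
    unknown = set(assessment) - set(DCC_CRITERIA)
    if unknown:
        raise KeyError(f"not a DCC criterion: {sorted(unknown)}")
    return [c for c in DCC_CRITERIA if not assessment.get(c, False)]


gaps = appraise({
    "Relevance to mission": True,
    "Uniqueness": True,
    "Full documentation": False,
})
print(gaps)
```

How many gaps are acceptable, and whether they call for a lower level of curation rather than rejection, is a local policy choice that the sketch deliberately does not encode.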
Data repositories
Focusing on how data will be preserved and made available for others. Main options:
- Developing an institutional data repository
  - Building, where possible, on existing systems, e.g. IR, CRIS, etc.
  - Essex Research Data demo: http://researchdata.essex.ac.uk/
- Liaising with external research data repositories (or data centres)
  - Often subject-based; some UK data centres are supported by funding bodies
  - Providing researchers with information on external services
RCUK Common Principles
RCUK, Common Principles on Data Policy (2011):
“To enable research data to be discoverable and effectively re-used by others, sufficient metadata should be recorded and made openly available to enable other researchers to understand the research and re-use potential of the data. Published results should always include information on how to access the supporting data”
Also EPSRC Principle 6
http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
EPSRC Expectation V
“Research organisations will ensure that appropriately structured metadata describing the research data they hold is published (normally within 12 months of the data being generated) and made freely accessible on the internet; in each case the metadata must be sufficient to allow others to understand what research data exists, why, when and how it was generated, and how to access it. Where the research data referred to in the metadata is a digital object it is expected that the metadata will include use of a robust digital object identifier (For example as available through the DataCite organisation - http://datacite.org).”
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx
May 2013. Learning material produced by RDMRose: http://www.sheffield.ac.uk/is/research/projects/rdmrose
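The "normally within 12 months" window is simple date arithmetic. A minimal sketch, assuming the clock starts on the date the data were generated and that a start date of 29 February rolls back to 28 February the following year; EPSRC does not define either rule at this level of detail, and metadata_deadline and is_overdue are hypothetical names.

```python
import calendar
from datetime import date


def metadata_deadline(generated: date, months: int = 12) -> date:
    """Date by which metadata should normally be published.

    Adds `months` calendar months to the generation date. If the target month
    has fewer days than the start day (e.g. 29 February), the deadline is
    clamped to the last day of that month (an assumption).
    """
    total = generated.month - 1 + months  # months counted from January of the start year
    year = generated.year + total // 12
    month = total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in the target month
    return date(year, month, min(generated.day, last_day))


def is_overdue(generated: date, today: date) -> bool:
    """True once the publication window has passed (hypothetical helper)."""
    return today > metadata_deadline(generated)


print(metadata_deadline(date(2013, 5, 14)))  # 2014-05-14
print(metadata_deadline(date(2012, 2, 29)))  # 2013-02-28
```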
Some questions to consider
- What metadata is required to adequately record datasets?
- What is “sufficient metadata” for discovery and re-use?
- Does any of this metadata already exist?
  - If so, where might it be found?
  - If not, how can the appropriate metadata be generated or captured?
- Will there be a need to share this metadata, e.g. with third-party discovery services? National data services?
  - If so, what standards exist to support metadata sharing?
Examples: UKOLN Scoping Study
- Scientific Data Application Profile Scoping Study (UKOLN, 2009)
  - Building on work undertaken on the Scholarly Works Application Profile (SWAP)
  - Analysed the metadata used by UK data centres and repositories, as well as selected domain models (e.g. DDI, CCLRC Metadata Model, CIDOC CRM)
- Concluded that:
  - Simple Dublin Core (e.g. as mandated by OAI-PMH) would be insufficient
  - There was sufficient convergence between the different schemas to suggest that a generic metadata profile could be constructed
  - A generic metadata profile would benefit interdisciplinary research and institution-based services (e.g. IRs)
http://www.ukoln.ac.uk/projects/sdapss/
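To see why simple Dublin Core falls short, consider a hypothetical crosswalk from a richer local record to oai_dc, the unqualified Dublin Core format that OAI-PMH requires repositories to support. The two namespace URIs are the standard OAI-PMH and Dublin Core ones; the CROSSWALK table, the local field names and the to_oai_dc function are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Hypothetical crosswalk from local field names to simple Dublin Core elements.
CROSSWALK = {
    "creator": "creator",
    "title": "title",
    "publisher": "publisher",
    "date": "date",
    "description": "description",
    "access_conditions": "rights",
}


def to_oai_dc(record: dict[str, str]) -> tuple[str, list[str]]:
    """Serialise a record as oai_dc XML; also return the fields with no DC slot."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    dropped = []
    for key, value in record.items():
        element = CROSSWALK.get(key)
        if element is None:
            dropped.append(key)  # no dedicated simple DC element: information is lost
            continue
        ET.SubElement(root, f"{{{DC}}}{element}").text = value
    return ET.tostring(root, encoding="unicode"), dropped


record = {
    "creator": "Example, A.",
    "title": "Example dataset",
    "date": "2012",
    "grant_number": "GRANT-0001",          # hypothetical value
    "machine_settings": "gain=4; 20 kHz",  # hypothetical value
}
xml, lost = to_oai_dc(record)
print(lost)  # ['grant_number', 'machine_settings']
```

A real crosswalk might push such fields into dc:description as free text, but the structure that makes them machine-actionable would still be lost. The sketch also omits the xsi:schemaLocation attribute that a production oai_dc record would normally carry.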
Examples: DataCite metadata (1)
- DataCite: organisation aiming to facilitate easier access to (and citation of) research data, e.g. through the use of persistent identifiers (DOIs)
- DataCite Metadata Schema (currently v. 2.2, 2011) defines core metadata properties
- Broadly based on Dublin Core concepts
http://schema.datacite.org
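As an illustration of the core properties, the sketch below builds a minimal DataCite-style XML record with Python's standard xml.etree.ElementTree. It assumes the five mandatory properties of the 2.x kernel (Identifier, Creator, Title, Publisher, PublicationYear) and the kernel-2.2 namespace URI; the datacite_xml function and its dictionary keys are hypothetical, and element names should be checked against the schema documentation (DataCite, 2011) before relying on them. The example DOI is the illustrative Poppleton one from the citation slide.

```python
import xml.etree.ElementTree as ET

# Namespace of the DataCite 2.2 kernel (assumed from the schema's URL pattern).
NS = "http://datacite.org/schema/kernel-2.2"
MANDATORY = ("identifier", "creators", "title", "publisher", "publication_year")


def datacite_xml(meta: dict) -> str:
    """Build a minimal DataCite-style record from a dict of the mandatory properties."""
    missing = [k for k in MANDATORY if not meta.get(k)]
    if missing:
        raise ValueError(f"missing mandatory properties: {missing}")
    ET.register_namespace("", NS)  # serialise with NS as the default namespace
    root = ET.Element(f"{{{NS}}}resource")
    ident = ET.SubElement(root, f"{{{NS}}}identifier", identifierType="DOI")
    ident.text = meta["identifier"]
    creators = ET.SubElement(root, f"{{{NS}}}creators")
    for name in meta["creators"]:  # one <creator> per contributor
        creator = ET.SubElement(creators, f"{{{NS}}}creator")
        ET.SubElement(creator, f"{{{NS}}}creatorName").text = name
    titles = ET.SubElement(root, f"{{{NS}}}titles")
    ET.SubElement(titles, f"{{{NS}}}title").text = meta["title"]
    ET.SubElement(root, f"{{{NS}}}publisher").text = meta["publisher"]
    ET.SubElement(root, f"{{{NS}}}publicationYear").text = str(meta["publication_year"])
    return ET.tostring(root, encoding="unicode")


print(datacite_xml({
    "identifier": "10.1594/UoP.MS.298",
    "creators": ["University of Poppleton"],
    "title": "Precipitation measurements 1905-2010 taken at Western Bank weather station",
    "publisher": "Meteorological service, The University of Poppleton",
    "publication_year": 2011,
}))
```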
Examples: University of Oxford
The DaMaRO project at the University of Oxford is developing a metadata schema for its DataFinder (Rumsey, 2012). A three-tier metadata approach:
- Mandatory minimal metadata to enable basic discovery, such as Creator, Title, Publisher, Date, Location, Access terms & conditions
- Mandatory contextual metadata (mostly administrative and partly based on EPSRC expectations), such as Funding Agency, Grant Number, Last access request date, Project Information, Data Generation Process, Why the data was generated, Date (range) of data collection, Reasons for embargoes
- Optional metadata (including discipline-specific metadata) to enable reuse, such as machine settings and the experimental conditions under which the data were gathered
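A repository could check the three tiers at deposit time. The sketch below is an assumption, not DaMaRO's DataFinder code: the field-name spellings are taken from this slide, check_record is a hypothetical function, and any populated field outside the two mandatory tiers is treated as optional metadata.

```python
# Field names as listed on the slide; exact spellings are illustrative.
MINIMAL = {"Creator", "Title", "Publisher", "Date", "Location",
           "Access terms & conditions"}
CONTEXTUAL = {"Funding Agency", "Grant Number", "Last access request date",
              "Project Information", "Data Generation Process",
              "Why the data was generated", "Date (range) of data collection",
              "Reasons for embargoes"}


def check_record(record: dict[str, str]) -> dict[str, list[str]]:
    """Report missing mandatory fields per tier; other populated fields are optional."""
    present = {k for k, v in record.items() if v}  # empty values count as missing
    return {
        "missing_minimal": sorted(MINIMAL - present),
        "missing_contextual": sorted(CONTEXTUAL - present),
        "optional": sorted(present - MINIMAL - CONTEXTUAL),
    }


report = check_record({
    "Creator": "Example, A.", "Title": "Example dataset", "Publisher": "Example",
    "Date": "2012", "Location": "Oxford", "Access terms & conditions": "Open",
    "Machine settings": "gain=4",  # hypothetical discipline-specific field
})
print(report["missing_minimal"], report["optional"])
```

In practice some contextual fields, such as reasons for embargoes, may legitimately be "not applicable"; a service would need a way to record that explicitly rather than leave the field empty.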
Examples: University of Essex
- RDE Metadata Profile for EPrints
  - Based on DataCite, INSPIRE, DDI 2.1 and DataShare
  - Mixture of generic schema and standards specific to social science data
  - http://data-archive.ac.uk/media/375386/rde_eprints_metadataprofile.pdf
- There seems to be convergence on a layered approach
Some practical questions (1)
Technical choices for institutions:
- Developing new institutional services, e.g. the approach taken by ANDS: http://www.ands.org.au/guides/metadata-stores-solutions.html
  - Defining metadata stores by their coverage, the granularity of data that they describe, and the specialisation of their descriptions (e.g. collection-level, object-level, local, institutional, national and discipline-specific)
- Building upon existing infrastructures, e.g.:
  - Institutional Repositories
  - CRIS (e.g. Pure, Symplectic, Converis)
Some practical questions (2)
Research Information Management interaction?
- There is interest in what RIM standards like CERIF can offer RDM (e.g. potentially richer metadata structures for linking research outputs with organisational groupings and funding streams, some level of buy-in from funding bodies), but implementation
- CERIF for Datasets (C4D): http://cerif4datasets.wordpress.com
We need to think about how metadata can be shared with:
- Discipline-based repositories and data centres
- Emerging national (and international) discovery infrastructures
  - Australian National Data Service: uses the RIF-CS schema (based on ISO 2146:2010) as a data interchange format
  - Jisc and DCC are currently exploring the options for collating metadata about research data at national level
Data Citation
Issues include (Ball & Duke, 2011a and b):
- At what granularity should data be made citeable?
- How to credit each contributor in a dataset that is assembled from very many contributions?
- Where in a research paper should a data citation be given (e.g. a paper describing a dataset versus subsequent papers using it)?
- What to do with frequently updated data?
DataCite
- DataCite (http://www.datacite.org) is a not-for-profit organisation that aims to promote and support the sharing of research data
- They are developing an infrastructure that supports methods of data citation, discovery, and access
- They are currently leveraging the DOI (Digital Object Identifier) infrastructure, which is also used for research articles
- They can provide DOIs for datasets
- DataCite DOIs have to resolve to a public landing page with information about the dataset and a direct link to it
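Before displaying a DOI, a repository might normalise the identifier and build its resolver URL. A minimal sketch, assuming DOI names of the form 10.<registrant>/<suffix>; the regular expression is a simplified heuristic (it rejects, for example, registrant codes with subdivisions), and normalise_doi and doi_url are hypothetical names. Checking that a DOI really resolves to a public landing page would need an HTTP request, which is omitted here.

```python
import re

# Simplified DOI syntax: "10." directory indicator, registrant code, "/", suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")

# Common ways a DOI is written in the wild; matched case-insensitively.
_PREFIXES = ("https://doi.org/", "http://doi.org/",
             "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalise_doi(raw: str) -> str:
    """Strip resolver or 'doi:' prefixes and validate the remaining DOI name."""
    s = raw.strip()
    for prefix in _PREFIXES:
        if s.lower().startswith(prefix):
            s = s[len(prefix):]
            break
    if not DOI_PATTERN.fullmatch(s):
        raise ValueError(f"not a DOI name: {raw!r}")
    return s


def doi_url(raw: str, resolver: str = "http://dx.doi.org/") -> str:
    """Return the DOI as a linkable URL via the given resolver."""
    return resolver + normalise_doi(raw)


print(doi_url("doi:10.5438/0005"))  # http://dx.doi.org/10.5438/0005
```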
DataCite
- Basic form: Creator (PublicationYear): Title. Publisher. Identifier
- Version and ResourceType are optional extra elements
- For citation purposes, DataCite recommends that DOI names are displayed as linkable, permanent URLs
- More info in DataCite (2011)
Example:
University of Poppleton (2011): Precipitation measurements 1905-2010 taken at Western Bank weather station. Meteorological service, The University of Poppleton. http://dx.doi.org/10.1594/UoP.MS.298
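The basic citation form maps directly onto a small formatting function. This is a sketch: datacite_citation is a hypothetical name, and placing Version after Title and ResourceType after Publisher is an assumption to check against DataCite (2011). With the Poppleton details it reproduces the example citation on this slide.

```python
def datacite_citation(creator, year, title, publisher, doi,
                      version=None, resource_type=None):
    """Format 'Creator (PublicationYear): Title. Publisher. Identifier'.

    Optional Version and ResourceType are inserted at assumed positions.
    The DOI is displayed as a linkable URL, as DataCite recommends.
    """
    parts = [f"{creator} ({year}): {title}."]
    if version:
        parts.append(f"{version}.")
    parts.append(f"{publisher}.")
    if resource_type:
        parts.append(f"{resource_type}.")
    parts.append(f"http://dx.doi.org/{doi}")
    return " ".join(parts)


print(datacite_citation(
    "University of Poppleton", 2011,
    "Precipitation measurements 1905-2010 taken at Western Bank weather station",
    "Meteorological service, The University of Poppleton",
    "10.1594/UoP.MS.298",
))
```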
References
Ball, A. (2009). Scientific Data Application Profile Scoping Study Report. Bath: UKOLN, University of Bath. Retrieved from http://www.ukoln.ac.uk/projects/sdapss/
Ball, A., & Duke, M. (2011a). Data Citation and Linking. DCC Briefing Papers. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation/data-citation-and-linking
Ball, A., & Duke, M. (2011b). How to Cite Datasets and Link to Publications. DCC How-To Guides. Edinburgh: Digital Curation Centre. Retrieved from http://www.dcc.ac.uk/resources/how-guides/cite-datasets
DataCite (2011). DataCite Metadata Schema for the Publication and Citation of Research Data. Version 2.2. London: DataCite. Retrieved from http://schema.datacite.org/meta/kernel-2.2/doc/DataCite-MetadataKernel_v2.2.pdf. doi:10.5438/0005
Rumsey, S. (2012). Just enough metadata: Metadata for research datasets in institutional data repositories [PowerPoint presentation]. Oxford: The University of Oxford. Retrieved from http://damaro.oucs.ox.ac.uk/docs/Just%20enough%20metadata%20v3-1.pdf