Slides from a presentation given at the JIBS User Group / RLUK joint event "Demystifying research data: don't be scared, be prepared" held at the SOAS Brunei Gallery, London, 17 July 2012.
Introduction to research data management
1. … because good research needs good data
Introduction to Research Data Management: activities, roles and requirements
Michael Day
Digital Curation Centre
UKOLN, University of Bath
m.day@ukoln.ac.uk
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
Demystifying Research Data, JIBS/RLUK event, SOAS, London, 17 July 2012
2. Outline
• Introduction
• The researcher perspective
• Codes of Practice
• Research funding bodies
• The institutional perspective
• Research lifecycles
• Some lifecycle models
• The role of the library
• Activities, roles and requirements
3. Why manage research data?
• Enable reuse
• Research integrity
• Research impact
• Linking data and publication
• Making data citable
• Regulatory requirements
• Controlling costs
• Maximising value
4. Who are the main actors?
• Researchers - as creators and users
• Other data creators
• Other data (re)users
• Funding bodies
• Data Centres
• Computer science research
• Libraries
• Research support/grant offices
• Archivists/records managers
5. What is required?
• Technical infrastructure
• Storage (many options)
• Tools
• Discovery
• Research Intelligence (RIM)
• Policy & commitment
• Human infrastructure
• Researcher skills
• Support services
• Training
6. Potential national-level actions
• Building dataset discovery
• Collecting data policies
• Liaising with other national & international actors
• Supporting uptake of cloud-based tools
• Exploiting the pool of data plans
• Collecting stories on data re-use
• Supporting effective citation, referencing, etc
• Sharing good practice
7. The researcher perspective
• Managing and sharing data is simply part of good research:
• Adhering to disciplinary and/or institutional codes of practice and policies
• Has been practised since the advent of modern science, but not always consistently; data-intensive research makes it even more critical
• Meeting the specific requirements of funding bodies
• Reputational risks if data management is not handled properly
8. Research codes of practice (1)
• UK Research Integrity Office Code of Practice for Research (2009)
Data management planning is an essential part of research design
Organisations should have in place procedures, resources (including physical space) and administrative support to assist researchers in the accurate and efficient collection of data and its storage in a secure and accessible form [3.12.5]
9. Research codes of practice (2)
• RCUK Code of Conduct on the Governance of Good Research Conduct (2011)
Primary data and research evidence [should be made] accessible to others for reasonable periods after the completion of the research: data should normally be preserved and accessible for 10 years (in some cases 20 years or longer)
Responsibility for proper management and preservation of data and primary materials is shared between the researcher and the research organisation [although deposit within national collections is endorsed]
10. Research funding bodies
• UK Research Councils
• Help fund some data archives, e.g.:
• Archaeology Data Service, European Bioinformatics Institute, the NERC data centres, UK Data Archive
• Support for JISC (and DCC)
• RCUK Common Principles on Data Policy
• Recognises that data are a critical output of the research process
http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
11. RCUK Principles (in a nutshell)
• Publicly funded research data should be made openly available
• Data with acknowledged long-term value should be preserved and remain accessible and usable for future research
• Sufficient metadata should be recorded to enable other researchers to find, understand and re-use the research; published results should always include information on how to access the supporting data
• Recognition that there may be legal, ethical and commercial constraints
• Recognition that researchers may need privileged use of data for a limited period
• All users of research data should acknowledge their sources
• It is appropriate to use public funds to support the management of research data (MRD)
12. EPSRC expectations
• Roadmap approved May 2012; compliance by May 2015
Appropriate metadata (including unique IDs) to be made freely available on the Internet within 12 months of data generation
Data not generated in digital format should be stored in a manner to facilitate it being shared
Data should be securely preserved for a minimum of 10 years after privileged access expires or the last date access was requested by a third party
Adequate resources from existing funding streams
EPSRC will monitor progress and compliance, and reserves the right to impose appropriate sanctions
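The 10-year preservation expectation can be read as a simple date calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes the retention clock runs from whichever is later, the end of privileged access or the most recent third-party access request. Check the funder's own wording before relying on that reading.

```python
from datetime import date
from typing import Optional

RETENTION_YEARS = 10  # minimum retention period quoted on the slide


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reached for 29 February when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def earliest_disposal_date(privileged_access_ends: date,
                           last_third_party_request: Optional[date] = None) -> date:
    """Earliest date on which the data could be considered for disposal.

    Assumption: the clock restarts from the later of the two events.
    """
    start = privileged_access_ends
    if last_third_party_request is not None:
        start = max(start, last_third_party_request)
    return add_years(start, RETENTION_YEARS)


# No third-party requests: keep until ten years after privileged access ends.
print(earliest_disposal_date(date(2013, 5, 1)))
# A later access request pushes the date back.
print(earliest_disposal_date(date(2013, 5, 1), date(2016, 2, 29)))
```

Under this reading, every new access request extends the retention period, so a disposal date can only be recorded provisionally.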
13. Implications for researchers
• Increasing number of research councils and funding bodies with data management and sharing requirements
• Potential loss of research income if these mandates are not met
• Need to determine the costs associated with short- and longer-term management and curation, and to request funds as part of the grant application
• Responsibility for infrastructure shifting more to HEIs and less to centralised data archives, but institutional infrastructures and services are still emerging
• Need guidance - some good external support
• But also need more local support; often fragmented (need to draw upon existing channels within your institution wherever possible)
14. Institutional drivers
• Safeguarding research integrity
• Increasing number of FOI requests for data
• Adhering to existing codes of research practice and ethics
• Developing new institution-wide strategies, policies and services for data storage and management
• Increased institutional focus on research management (e.g., in response to REF)
• Benchmarking – self-assessing infrastructure and planning for improvement
• More demands but fewer resources to work with
15. Institutional actors
• Researchers
• Both as creators and users of data
• PIs (e.g., have specific roles with respect to grants)
• Computer scientists (informaticians, data scientists)
• Administration
• Research support office (e.g., grants support, research information management)
• Records managers, archivists, FOI office
• Central services
• Computing services
• Libraries (e.g., institutional repository)
16. Research data lifecycles
18. (e)-Research Life Cycle view of Data Curation?
[Figure: lifecycle diagram. Labels recovered from the image:]
• Formulate hypothesis / ideas, test, experiment, observe: data creation, collection & capture
• Data management, storage & validation: description, deposit, self-archiving, preservation, certification
• Adding value: data linking, annotation, visualisation, simulation
• (New) knowledge extraction: data mining, modelling, analysis, synthesis
• Scholarly communications: data disclosure, publication, citation, discovery, re-use
• Other labels: Data processing (repeated), e-Infrastructure, Open access, Collaboration
Liz Lyon, December 2005. This work is licensed under a Creative Commons Attribution-ShareAlike 2.0 License.
19. E-Science Curation Report - 2003
• E-science discipline
• Appropriate for current focus
• Takes integrated look at higher education data curation problems
• Granularity on curation activities?
20. Open Archival Information System
21. RDM at Oxford
22. Research360@Bath
• New institutional data scientist role
• Addresses EPSRC expectations (published)
• Doctoral Training Centre hubs
• Faculty-Industry focus
• Faculty cascade model
• Multi-team approach
http://blogs.bath.ac.uk/research360/
24. Some library roles (in the lifecycle)
• Leadership – coordinate action
• Audit – who has what, where does it go?
• Advice on access – data, wherever it is
• Preservation (long-term access requirements)
• Citability
• Data/publication linking
• Promoting data in teaching
• Identifying skill gaps / CPD requirements
25. Re-skilling for research (RLUK, 2012)
• Mary Auckland identified 9 key areas with skill gaps for subject librarians:
• Ability to advise on preserving research outputs
• Knowledge to advise on data management and curation, including ingest, discovery, access, dissemination, preservation, and portability
• Knowledge to support researchers in complying with the various mandates of funders, including open access requirements
• Knowledge to advise on potential data manipulation tools used in the discipline/subject
• Knowledge to advise on data mining
• Knowledge to advocate, and advise on, the use of metadata
• Ability to advise on the preservation of project records, e.g. correspondence
• Knowledge of sources of research funding to assist researchers to identify potential funders
• Skills to develop metadata schema, and advise on discipline/subject standards and practices, for individual research projects
26. Understanding data requirements
http://www.dcc.ac.uk/
27. Data management planning
28. Data registries
• Findable, citable data has value
• Important to link publications to data (and vice versa)
• Increases citations – of data & publication
• Increases reuse (hence value)
• But effects exist even without publication
• All benefit – researcher; institution; publisher
29. Tools to track impact
http://total-impact.org/
30. Activities, roles, requirements (1)
• Requirements gathering
• Identifying researchers’ data requirements
• Developing a shared understanding of what needs to be done (e.g., identifying where data exist, its form and scale, any existing retention requirements)
• Identifying good practice within the institution (and the opposite)
• Methods: surveys, focus groups, case studies, joint R&D projects, assessment tools (e.g. DAF)
31. Activities, roles, requirements (2)
• Identifying motivations and benefits
• For researchers, support services, the institution
• Identifying risks
• Data loss (institution, research group, individual)
• Increased costs (lack of planning, service inefficiency, data loss)
• Legal compliance (research funder, H&S, ethics, FoI)
• Reputation (institution, unit, individual)
• Identifying costs
• Keeping Research Data Safe (KRDS) toolkit
32. Activities, roles, requirements (3)
• Assessing institutional preparedness
• Identifying institutional stakeholders, existing data support services, gaps
• Benchmarking and planning for the future
• Skills audit
• CARDIO tool
• Policy development
• Policies – approval by senior management is just the start; policies need to be embedded in research practice and responsive to changing requirements
• Data management planning
• DMP online, DCC How-to Develop a Data Management Plan guide
33. Activities, roles, requirements (4)
• Implementation and service development
• Integrating where possible with existing services, e.g. IR, CRIS, VRE, HPC, cloud services, social media, etc.
• Appraisal, deciding what needs to be kept and for how long
• Storage choices – no one-size-fits-all solution, e.g. Bristol’s BluePeta petascale storage facility, Bath’s X-Drive approach, cloud approaches
• Data documentation and metadata – layered approaches: top-level discovery (core metadata, collection/experiment-level?), role of standards like DCMI, CERIF, DDI, etc.
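One way to picture the layered metadata approach is a record with a small core layer for discovery and a free-form layer for collection- or experiment-level detail. This is a minimal sketch, not a recommended schema: the class and function names are hypothetical, the field names loosely borrow Dublin Core element names, and the choice of which fields count as core is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed core (discovery-level) fields; names borrowed from Dublin Core elements.
CORE_FIELDS = ("identifier", "title", "creator", "publisher", "date")


@dataclass
class DatasetRecord:
    """A dataset description split into a discovery layer and a detail layer."""
    core: Dict[str, str]                                    # top-level discovery metadata
    detail: Dict[str, str] = field(default_factory=dict)    # experiment-level detail

    def missing_core_fields(self) -> List[str]:
        """List the core fields that are absent or empty."""
        return [name for name in CORE_FIELDS if not self.core.get(name)]


# Hypothetical record: all values are placeholders.
record = DatasetRecord(
    core={
        "identifier": "example-id-0001",
        "title": "Example survey dataset",
        "creator": "A. Researcher",
        "date": "2012",
    },
    detail={"instrument": "hypothetical questionnaire v2"},
)

print(record.missing_core_fields())  # ['publisher']
```

Keeping the core layer small makes it realistic to require it for every dataset, while the detail layer can follow whatever discipline standard applies.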
34. Activities, roles, requirements (5)
• Data issues:
• Appraisal: selection criteria, retention periods (who decides?)
• DCC How to appraise and select research data for curation guide
• Documentation: metadata, schema, semantics
• Formats: proprietary formats, community standards, etc.
• Provenance and authenticity
• Citation (assignment of persistent IDs?)
• Access (embargo policies?)
• Licensing
• DCC How to license research data guide
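To illustrate why persistent IDs matter for citation, the sketch below builds a citation string following a "Creator (Year): Title. Publisher. Identifier" pattern, similar to the form commonly recommended for datasets. The function name is hypothetical, the example values (including the identifier) are placeholders, and the pattern should be treated as an assumption rather than a required style.

```python
from typing import Sequence


def format_data_citation(creators: Sequence[str], year: int, title: str,
                         publisher: str, identifier: str) -> str:
    """Build a dataset citation: Creator (Year): Title. Publisher. Identifier."""
    if not creators:
        raise ValueError("at least one creator is required")
    if not identifier:
        # Without a persistent identifier the citation cannot be resolved reliably.
        raise ValueError("a persistent identifier is required")
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. {identifier}"


# Placeholder values only; the DOI below is not a real identifier.
citation = format_data_citation(
    creators=["Researcher, A.", "Colleague, B."],
    year=2012,
    title="Example survey dataset",
    publisher="Example University",
    identifier="doi:10.1234/example",
)
print(citation)
```

Refusing to build a citation without an identifier reflects the point raised on the slide: a dataset that cannot be resolved cannot be cited reliably.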
35. Things to do …
• Create policy – collaborate with others
• Growing number of policies being published (EPSRC, Wellcome Trust)
• Build on existing digital services
• Examples: storage, data registry
• Learn about audit tools (DCC & others)
• Learn about data & sources
• Re-skill subject librarians
• Bridge between publishers & researchers
36. DCC resources
What data to keep
http://www.dcc.ac.uk/resources
37. Thank you. Any questions?
Michael Day
Digital Curation Centre
UKOLN, University of Bath
m.day@ukoln.ac.uk