1. Research data management: from
policy to practice with DMP Online
Martin Donnelly, Digital Curation Centre, University of Edinburgh
Sarah Jones, Digital Curation Centre, University of Glasgow
Future Perfect 2012: Digital Preservation by Design
Te Papa Tongarewa, Wellington, New Zealand
26 – 27 March 2012
2. Running order (c. 25 mins)
Sarah:
1. Introduction to the DCC & research data management
2. Data-related policies in the UK
3. The DCC & data management planning
Martin:
4. DMP Online v3.0
5. Connections and collaborations
6. Putting it into practice (UMF work and other things)
7. Summary / conclusion
3. 1. The Digital Curation Centre
- Founded in 2004
- Three partners: Edinburgh, Glasgow and Bath
- Primary funder is JISC
Helping to build capacity, capability and skills in
data management and curation across the UK’s
higher education research community
- DCC Phase 3 Business Plan
4. What does the DCC do?
• Develop tools
– CARDIO, DAF, DRAMBORA, DMP Online
• Offer guidance
– helpdesk, briefing papers, how-to guides
• Run training & events
– DC101, roadshow, RDMF, IDCC
• Support the JISC
– esp. the Managing Research Data programmes
5. What is Research Data Management?
“the active management and appraisal of data over the lifecycle of scholarly and scientific interest”
Data management is part of good research practice
[Figure: DCC curation lifecycle model; visible labels: Manage, Share]
6. How does RDM affect preservation?
The costs of ingest – receiving data, preparing it for long-term
storage, and incorporating it into the digital archive – receives
the largest allocation of resources.
- Keeping Research Data Safe 2
7. 2. Data-related policies in the UK
http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
8. RCUK Common Principles
• Publicly funded research data are a public good, produced in the public interest,
which should be made openly available with as few restrictions as possible in a
timely and responsible manner that does not harm intellectual property.
• Institutional and project specific data management policies and plans should be in
accordance with relevant standards and community best practice. Data with
acknowledged long-term value should be preserved and remain accessible and
usable for future research.
• To enable research data to be discoverable and effectively re-used by others,
sufficient metadata should be recorded and made openly available ....
7 principles agreed by all the UK
research councils in May 2011
http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
9. UK research funder expectations
• timely release of data
– once patents are filed or on (acceptance for) publication
• open data sharing
– minimal or no restrictions
– deposit in data centres, structured databases, data enclave
• preservation of data
– most funders expect 5-10+ years
• submission of data management and sharing plans…
10. 3. The DCC and DMP
We’ve responded to requirements by offering support:
- Analysed requirements
- Developed a Checklist
- Provided tools & guidance
Links to all DMP resources via http://www.dcc.ac.uk/resources/data-management-plans
11. What is a DMP?
UK research funders typically ask for:
• A short statement/plan submitted in grant applications
• An outline of what you will create/collect, methods,
standards, data management and long-term plans
• How and why – justify your decisions and any limits
12. Common DMP questions
• What data will be created (format, types) and how?
• How will the data be documented and described?
• How will you manage ethics and Intellectual Property?
• What are the plans for data sharing and access?
• What is the strategy for long-term preservation?
13. DCC Checklist Coverage
§1: Introduction and Context
§2: Data Types, Formats, Standards and Capture
Methods
§3: Ethics and Intellectual Property
§4: Access, Data Sharing and Re-use
§5: Short-Term Storage and Data Management
§6: Deposit and Long-Term Preservation
§7: Resourcing
§8: Adherence and Review
§9: Agreement/Ratification by Stakeholders
§10: Annexes
Checklist for a Data Management Plan v3.0 (Donnelly and Jones, March 2011)
http://www.dcc.ac.uk/resources/data-management-plans
14. DMP-related resources
– “Dealing with Data” (Lyon, 2008)
– Analysis of Funder Policies (Jones, 2009)
– Checklist for a Data Management Plan
(Donnelly and Jones, 2009)
– “How to Develop a Data Management and
Sharing Plan” (Jones, 2011) Edinburgh:
Digital Curation Centre
– “Data Management Plans and Planning”
(Donnelly, 2012) in Pryor (ed.) Managing
Research Data, London: Facet
Links to all DCC resources via http://www.dcc.ac.uk/resources/data-management-plans
15. Key things to remember
All research projects are different
The DMP will depend upon the nature of
the research AND the context (funder,
domain, institution(s) etc)
DMPs are useful communication tools
16. Not a UK phenomenon
Read about the international policy and DMP landscape in:
“Research data policies: principles, requirements and trends” (Jones, 2012) in Pryor (ed.) Managing Research Data, London: Facet
“Data Management Plans and Planning” (Donnelly, 2012) in Pryor (ed.) Managing Research Data, London: Facet
18. What does DMP Online do?
A web-based tool that enables users to...
i. Create, store and update multiple versions of Data
Management Plans across the research lifecycle
ii. Meet a variety of specific data-related
requirements (from funders, institutions, publishers,
etc.)
iii. Get tailored guidance on best practice and helpful
contacts, at the point of need
iv. Customise, export and share DMPs in a variety of
formats in order to facilitate communications within
and beyond research projects
* N.B. The templates have varying degrees of endorsement from funders,
stakeholder communities, etc. More on this shortly…
19. Technologies involved (v3.0)
– Ruby on Rails (v3.1.3)
– JavaScript (jQuery v1.7.1)
– MySQL database (v5+)
– Hosting: University of Edinburgh Information Services
Virtual Hosting (13 managed servers across 2 sites)
– Authentication: registered users with passwords encrypted
in DB (we are also testing Shibboleth for integration with UK
Access Management Federation for Education and Research)
– Various export formats (DOCX, PDF, XML, CSV, etc)
20. DMP Online v3.0: Spring 2012
- Improved user interface, inc. customisable
institutional versions
- New features
- Overlaying multiple templates for ‘hybrid’ DMPs
- Template phases (e.g. pre- / during / post-project)
- Granular read / write / share permissions
- API for systems interoperability (e.g. this project)
- Shibboleth authentication
- Multilingual support / boilerplate text
- Endorsement from funders
21. Collaborations
- Generic data management guidance ( in
conjunction with )
- Funder-specific guidance developed in collaboration
with the funders themselves
- Institution-specific guidance developed with key
institutional contacts
- Discipline-specific guidance developed and deployed
with JISC MRD projects (e.g. DMT Psych at York)
- Joint training programmes organised and delivered
by DCC and UKDA
- Provided advice to US consortium
22. Templates: Stakeholder Liaison (i)
RCUK funders | Status
Arts and Humanities Research Council (AHRC) | Discussions beginning
Biotechnology and Biological Sciences Research Council (BBSRC) | Discussions ongoing
Engineering and Physical Sciences Research Council (EPSRC) | No explicit data management plan requirements: DCC referenced in roadmap requirements
Economic and Social Research Council (ESRC) | Template and guidance developed in collaboration with ESRC and ESDS. Funder’s online guidance points applicants towards tool.
Medical Research Council (MRC) | Template in preparation through collaboration with funder
NERC (Natural Environment Research Council) | Discussions ongoing
Science and Technology Facilities Council (STFC) | DCC resources referenced in data requirements
Other funders | Status
The Wellcome Trust | Template and guidance endorsed by funder
National Science Foundation (US) | Template developed by Sherry Lake, University of Virginia
23. Templates: Stakeholder Liaison (ii)
Disciplinary templates | Status
History | Developed in conjunction with University of Hull and University of Hertfordshire
Psychology | Developed by DMT Psych project, led by University of York
Mechanical Engineering | Developed as part of REDm-MED project, led by University of Bath
Health sciences | Developed by DATUM for Health project, led by University of Northumbria
Spatial information (INSPIRE) | Developed in conjunction with EDINA (UK national data centre) and trialled with Freshwater Biological Association
Institutional templates | Status
University of Northampton | Developed in collaboration with Information Services department
More institutional and subject-based templates are
being developed through the JISC RDM projects
and UMF institutional engagements…
24. Institutional Engagements:
Putting it into practice
- Working with eighteen institutions over
approximately 18 months to improve data
management capabilities
- A broad variety of institutional types and sizes, from
research intensive ancient universities, to new
universities and small specialist institutions (e.g. art
colleges)
- Institutions select from a ‘menu’ of tools and
services, e.g. (next slide)
25. The Menu
Components of a Data Management Strategy (Research and Admin):
- Policy
- Planning
- Advocacy
- Tools
- Training
DCC Tools:
- Data Asset Framework (DAF)
- DMP Online
- CARDIO
- DRAMBORA
DCC Services:
- Policy development
- Strategy development
- Training
- Workflow assessment
- Costing
- Institutional data catalogues (discovery)
26. Workflow connections
DMP Online can also be used in conjunction
with other tools that support the data
management/curation lifecycle, e.g.…
- DAF (Data Asset Framework)
- DRAMBORA (Digital Repository Audit Method
Based On Risk Assessment)
- CARDIO (Collaborative Assessment of
Research Data Infrastructure and Objectives)
Also non-DCC tools:
- LIFE
- Planets tools
- and more
27. How to connect: six export formats
For human readership…
- Pleasant formatting
- Editable. Can be used in conjunction with (e.g. MS Sharepoint)
- Removes all formatting
For machine readership…
- Facilitates quick public sharing
- Compatible with API for linking with other systems
- Minimal formatting
28. External connections
Systems | Standards / protocols
– CRIS / admin systems | CERIF*
– RCUK Je-S system
– Institutional Repositories | SWORD2
– DDI repository | DDI*
– DMP Tool (US)
– Other instances of DMP Online via federated model (? - TBC) | RDF (? - TBC)
* via RESTful API
29. [Diagram: DATA MANAGEMENT PLAN and UNRULY DATA, surrounded by stakeholders: Researcher(s); Research Support Office; Data Library / Repository / Archive; Computing Support; Faculty Ethics Committee; etc.]
30. To sum...
All of our DMP-related resources available online via:
www.dcc.ac.uk/dmponline/
31. Thank you
Martin Donnelly, Digital Curation Centre, University of Edinburgh
martin.donnelly@ed.ac.uk, Twitter: @mkdDCC
Sarah Jones, Digital Curation Centre, University of Glasgow
sarah.jones@glasgow.ac.uk, Twitter: @sjDCC
Check out DCC at: www.dcc.ac.uk or follow us on twitter @digitalcuration and #ukdcc
Image credits:
Slide 1 - http://upload.wikimedia.org/wikipedia/commons/8/88/LernaeanHydraRephael.jpg
Slide 5 - http://www.dcc.ac.uk/resources/curation-lifecycle-model
Slide 6 (The Scream) - http://www.flickr.com/photos/terryfreedman/6548040049
Slide 6 (OAIS) - http://public.ccsds.org/publications/archive/650x0b1.pdf
Slide 29 - http://en.wikipedia.org/wiki/File:Hercules_slaying_the_Hydra.jpg
Slide 30 - http://www.treehugger.com/picture-is-worth-sum-car-parts.jpg
This work is licensed under the Creative Commons Attribution 2.5 UK: Scotland License.
Editor's notes
Good afternoon. We are Martin Donnelly and Sarah Jones of the Digital Curation Centre, at the Universities of Edinburgh and Glasgow respectively. We’ll be talking today about the journey from research data management policy to good practice, and how the DCC’s resources, notably the DMP Online tool, can support this journey.
Sarah will give an introduction to the DCC and our interest in research data management, before giving an overview of the policy situation in the UK and how we got involved in data management planning. Martin will then take over, talking in more detail about the DMP Online tool and the various collaborations we’ve formed through this work, and giving an overview of the major job of work that we’re both currently involved in, namely the DCC’s set of institutional engagements.
The UK Digital Curation Centre was established in 2004. We’re based across three universities, and have a remit to support UK Higher Education as a whole. Our mission has changed over time from a focus on digital curation and preservation, working largely with archives and repositories, to research data management in universities.
The DCC has four main strands of activity:
- We develop tools to help organisations assess their infrastructure & capabilities or to undertake specific tasks, e.g. writing DMPs with DMP Online.
- We run a helpdesk, which is open to all, and provide guidance. How To guides are a new range of pragmatic, practical advice.
- We run training and community building events. The roadshows help institutions develop research data management strategies.
- We support JISC by co-ordinating events, working with projects and synthesising/disseminating findings.
The DCC developed the curation lifecycle model to explain the range of activities involved in creating, preserving and sharing digital content. In RDM terms ‘curation’ is simply managing & sharing data. The DCC argues that this is just part of good research practice.
How datasets are created and managed in the short-term affects how much work it is to ingest and preserve them. The transition isn’t always easy, which is why it’s useful to work with researchers early on to support them to make informed decisions about how to create and manage their data. The KRDS costs and benefits studies found that ingest is by far and away the most resource intensive activity.
Last year the 7 UK Research Councils released common principles to harmonise their data policies. These push for open data, acknowledge the importance of policies and planning, and cover various aspects of curating data (including meeting costs).
Basic expectations across the board are that:
- Data are released as soon as possible
- Data are shared openly wherever possible
- Data are preserved for 10+ years
- DMPs are submitted that outline plans for data management and sharing
The DCC has responded to these requirements by providing lots of support on data management planning:
- Liz Lyon first called for plans in 2007 in a recommendation in the Dealing with Data report.
- We have since analysed funders’ requirements and put together a checklist for a Data Management Plan.
- The Checklist is the underlying intellectual framework in DMP Online, the flagship of the DCC’s tools and resources.
- We also provide guidance documents and have custom guidance (disciplinary & institutional) built into DMP Online.
A DMP is a basic statement of how you will create, manage, share and preserve your data. Funders expect the decisions to be justified, particularly where they are not in line with the funder’s policy (e.g. limits on data sharing).
The main questions across the board cover:
- Data creation
- Metadata and documentation
- Ethical and legal issues
- Data sharing
- Preservation
You see the common questions come through in the main sections of the DCC Checklist. We also include administrative sections (intro, review, ratification) so you can ensure co-ordination and commitment across all of the stakeholders involved in managing data.
So in summary, these are some of the key DMP-related resources.
The main thing to remember about DMPs is that all research projects are different: the DMP will vary with context. Apart from a few very specialised areas like backup, there are no universal rights and wrongs. Research data management by nature involves multiple stakeholders, so planning is important as a communication mechanism. The process of producing a plan (i.e. engaging with others and deciding on the best way forward) is as important as the plan itself.
SJ > MD
These expectations and trends are not a UK phenomenon. Martin and I have contextualised the UK experience of data policies and planning by reflecting on international initiatives in Managing Research Data. This is why DMP Online is relevant to international audiences, so I’ll let Martin tell you all about it.
Thanks, Sarah. We started developing DMP Online in 2009, and launched the first version in 2010. We’re now on to v3.0, which includes some great new features that we’re really excited about.
The DCC Checklist is by nature very long, and its length was felt to be off-putting to researchers. Most of them don’t want to deal with this stuff even at a basic level, and a long Checklist with over 100 questions was not going to enjoy a large take-up. No matter how many times we said “you don’t need to fill it all in, just the bits that are relevant to you at this time” the message wasn’t going to sink in, so we developed a fairly basic wizard-style tool which asked a few questions about what stage your research was at, who your funder was, etc, and then pulled out only the most relevant questions from the Checklist to help you meet the pertinent requirements. So instead of seeing 115 questions, you might be presented with only 15 or 20. Much better. We then added functionalities like export and customisation, and some generic guidance to help with some of the more esoteric sections such as file format selection and metadata.
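As a rough illustration of the wizard logic described above (not DMP Online’s actual code, which is built on Ruby on Rails), the filtering step can be sketched in Python. The question wording is taken from the common DMP questions slide; the funder and stage tags attached to each question, and the names `Question` and `relevant_questions`, are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    number: int
    text: str
    funders: frozenset  # funders whose requirements this question addresses (assumed tags)
    stages: frozenset   # project stages at which the question is relevant (assumed tags)


def relevant_questions(checklist, funder, stage):
    """Return only the questions that apply to this funder and project stage."""
    return [q for q in checklist if funder in q.funders and stage in q.stages]


# Hypothetical miniature checklist; the real Checklist has over 100 questions.
CHECKLIST = [
    Question(1, "What data will be created (format, types) and how?",
             frozenset({"ESRC", "Wellcome"}), frozenset({"application"})),
    Question(2, "How will the data be documented and described?",
             frozenset({"ESRC"}), frozenset({"application", "in-project"})),
    Question(3, "What is the strategy for long-term preservation?",
             frozenset({"Wellcome"}), frozenset({"post-project"})),
]

selected = relevant_questions(CHECKLIST, funder="ESRC", stage="application")
for q in selected:
    print(q.number, q.text)
```

With these made-up tags, an ESRC applicant at the application stage sees two of the three questions rather than the full list, which is the effect the wizard was aiming for.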
For those interested in such things, these are the technologies used in v3.0.
As I mentioned earlier, version 3 launched very recently, and has a number of great new features. The user interface has been tweaked to allow easier (one-click) access to most of the screens, and we’re investigating customised institutional versions with, among others, the University of Oxford. The tool now enables the application of multiple templates, so you can create a single DMP that satisfies your institution, your funder and your publisher at the same time. These templates can be phased more elegantly, so that you can ask (for example) a few questions at the application stage, more during the project’s lifetime, and then add even more detail when you’re close to completion. Users now have the option to make their plans more widely available. Authentication can be managed via the UK Federated Access Shibboleth mechanism, and we have coded the new system to enable easy translation into other languages, and to handle boilerplate text where this is thought to be beneficial. We have also been working behind the scenes to gain more official endorsement from some of the big funding councils, and this is starting to bear fruit.
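The template overlay and phasing described above can be pictured with a small Python sketch. This is an illustration under stated assumptions, not how DMP Online implements it: the data layout, the integer question ids, the phase labels ("pre", "during", "post") and the function name `overlay_templates` are all hypothetical.

```python
def overlay_templates(templates, phase):
    """Merge several templates into one question list for a given phase.

    Each template is a dict of the form
    {"name": str, "questions": [(question_id, phase, text), ...]}.
    A question asked by several templates (same question_id) appears once,
    annotated with every template that requires it.
    """
    merged = {}
    for template in templates:
        for question_id, question_phase, text in template["questions"]:
            if question_phase != phase:
                continue  # belongs to another phase of the project
            entry = merged.setdefault(question_id, {"text": text, "required_by": []})
            entry["required_by"].append(template["name"])
    return [(qid, merged[qid]["text"], merged[qid]["required_by"])
            for qid in sorted(merged)]


# Hypothetical templates for illustration only.
funder = {"name": "funder", "questions": [
    (2, "pre", "What data will be created (format, types) and how?"),
    (4, "pre", "What are the plans for data sharing and access?"),
    (6, "post", "What is the strategy for long-term preservation?"),
]}
institution = {"name": "institution", "questions": [
    (2, "pre", "What data will be created (format, types) and how?"),
    (5, "during", "Where will data be stored during the project?"),
]}

hybrid = overlay_templates([funder, institution], phase="pre")
for question_id, text, required_by in hybrid:
    print(question_id, text, required_by)
```

The point of the design is that a researcher answers a shared question once, while each stakeholder can still see that its own requirement has been covered.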
So, in addition to the liaison with the funders, we’ve developed relationships with a variety of others. Our closest working relationship has probably been with the UK Data Archive, which is the designated place of long term deposit for the Economic and Social Research Council. Working with UKDA we have developed a data management planning template and guidance for ESRC applicants, and we also point to some UKDA guidance in the generic Checklist. We have also liaised with Wellcome Trust, the Medical Research Council and various other funders to develop dedicated DMP templates for them. Continuing in this vein, we’ve worked with disciplinary specialists and key institutional contacts to develop further DMP templates, and through the JISC Managing Research Data programmes we’ve contributed to a number of projects creating training materials around this area. Last but not least, we’ve shared experiences with a consortium of US universities – including the Universities of California, Virginia, and Illinois, and the Smithsonian Institution – which has helped them to shape their own DMP Tool.
These tables show the templates we’ve developed or are in the process of developing. I won’t go through them all now, but the slides will be available for later perusal.
And more templates are being developed all the time. If you’d like to talk about creating one for your institution or organisation, either catch me afterwards or drop me an email.
So that’s a pretty good high-level summary of what we’ve done in the data management planning area over the past four years or so. We’d like to end with a quick outline of the DCC’s institutional engagement programme, the major job of work that Sarah and I (and about a dozen other colleagues) are currently involved in. From last Autumn until next Spring – UK seasons, so the other way around for colleagues in New Zealand! – the DCC has been funded by the Higher Education Funding Council for England (HEFCE) to support eighteen HEIs in increasing their institutional data management capabilities. We’re working with a range of institutional types and sizes, from research intensive ancient universities, to new universities and small specialist institutions (e.g. art colleges). The way this works is we first of all make contact with someone already interested in this area, often in the Library, and through them we approach a senior academic, usually at Vice Principal level or equivalent, to make the case for working more concertedly in this area. Once an agreement is reached, the institution selects from a ‘menu’ of tools and services, e.g. (next slide)
The menu has three columns: the components of developing a Data Management Strategy; the DCC services that support aspects of the research and data management lifecycle, as given in Column 3; and the tools to support different strands of this. Some tools are simply utilised out of the box, others we can provide help and training with, and others, such as DMP Online, can be customised and tailored to match individual institutions’ requirements more closely.
Similarly, DMP Online can also be used in conjunction with other tools that support the data management/curation lifecycle, be these DCC tools or tools from other sources.
And at an information exchange level, here’s what we can do. Plans can be exported in a variety of formats, for human and/or machine readerships, and…
… the tool can link to these types of external systems using a variety of standards and protocols. Of course, this list is not exhaustive, and if you see an opportunity for linking DMP Online with other tools we might not have considered, let us know: the API will probably make it possible.
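To make the export idea concrete, here is a minimal Python sketch of rendering one plan for human and machine readers. It uses only the standard library and is not DMP Online’s export code; the plan structure, the function names, and the CSV column and XML element names are invented for illustration and do not reflect the tool’s real schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical plan: a title plus (question, answer) pairs.
plan = {
    "title": "Example project DMP",
    "answers": [
        ("What data will be created?", "Interview transcripts"),
        ("What are the plans for data sharing and access?", "Deposit in a data centre"),
    ],
}


def to_text(plan):
    """Plain text for human readers: all formatting removed."""
    lines = [plan["title"], ""]
    for question, answer in plan["answers"]:
        lines += [question, "  " + answer, ""]
    return "\n".join(lines)


def to_csv(plan):
    """CSV for machine readers: minimal formatting, one row per answer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["question", "answer"])
    writer.writerows(plan["answers"])
    return buffer.getvalue()


def to_xml(plan):
    """XML for machine readers: structured for exchange with other systems."""
    root = ET.Element("plan", title=plan["title"])
    for question, answer in plan["answers"]:
        item = ET.SubElement(root, "item")
        ET.SubElement(item, "question").text = question
        ET.SubElement(item, "answer").text = answer
    return ET.tostring(root, encoding="unicode")


print(to_text(plan))
print(to_csv(plan))
print(to_xml(plan))
```

Keeping the plan as one neutral structure and rendering it per audience is what lets the same DMP serve both a reader and a connected system.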
So in conclusion, we see the data management plan as a multi-purpose instrument – communication and context – and one that can bring together, if not level, the various stakeholder groups in the research data management endeavour.
Reversing the hydra metaphor somewhat, we hold that research is more than the sum of its parts, and when data management planning acts to facilitate communication for ensuring smooth and accurate interactions, it also serves as a way to bring it all together.