DMP & DMPonline
1. Funded by:
Data Management Plans
& DMPonline
EUDAT webinar, 12th
July 2013
Sarah Jones
Digital Curation Centre
sarah.jones@glasgow.ac.uk
Twitter: sjDCC
4. What if this was your desk?
•www.computerweekly.com
5. Why YOU need a Data
Management Plan
http://blogs.ch.cam.ac.uk/pmr/2011/08/01/why-you-need-a-data
What if this was your laptop?
6. Why manage your research data?
• To make your research easier!
• To stop yourself drowning in irrelevant stuff
• In case you need the data later
• To avoid accusations of fraud or bad science
• To share your data for others to use and learn from
• To get credit and increase your citations
8. Drivers for RDM
•Code of good research conduct
•Data should be preserved and
accessible for 10 years +
Data policies of UK funders and universities
www.dcc.ac.uk/resources/policy-and-legal
•Declaration on Access to Research
Data from Public Funding
Notion that data are a public good and
should be openly available
9. Expectations of public access
“Publicly funded research data are a public good,
produced in the public interest, which should be
made openly available with as few restrictions as
possible in a timely and responsible manner that
does not harm intellectual property.”
RCUK Common Principles on Data Policy
www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
12. Sharing data to advance research
www.nytimes.com/2010/08/13/health/research/13alzheimer.html?pagewanted=all&_r=0
“It was unbelievable. It's not science
the way most of us have practiced
in our careers. But we all realised
that we would never get biomarkers
unless all of us parked our egos and
intellectual property noses outside
the door and agreed that all of our
data would be public immediately.”
Dr John Trojanowski, University of Pennsylvania
•... scientific breakthroughs
14. What is a DMP?
A short plan that outlines
• what data you will create and how
• how you will manage it (storage, back-up, access…)
• plans for data sharing and preservation
15. Why develop a DMP?
DMPs are often submitted with grant applications, but
are useful whenever you are creating data to:
• Make informed decisions to anticipate and avoid problems
• Avoid duplication, data loss and security breaches
• Develop procedures early on for consistency
• Ensure data are accurate, complete, reliable and secure
• Save time and effort – make your life easier!
16. Which UK funders require a DMP?
•www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
18. What do research funders want?
• A brief plan submitted in grant applications
• 1-3 sides of A4 as attachment or a section in Je-S form
• Typically a prose statement covering suggested themes
• An outline of data management and sharing plans,
justifying decisions and any limitations
19. Five common themes
1. Description of data to be collected / created
(i.e. content, type, format, volume...)
2. Standards / methodologies for data collection &
management
3. Ethics and Intellectual Property
(highlight any restrictions on data sharing e.g. embargoes,
confidentiality)
4. Plans for data sharing and access
(i.e. how, when, to whom)
20. Tips on writing DMPs
• Keep it simple, short and specific
• Seek advice - consult and collaborate
• Base plans on available skills and support
• Make sure implementation is feasible
• Justify any resources or restrictions needed
21. A useful framework to get started
•Think about why
the questions are
being asked
•Look at examples
to get an idea of
what to include
•www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/framework.html
22. Example plans
• Technical plan submitted to AHRC by Bristol Uni
http://data.bris.ac.uk/files/2013/02/data.bris-AHRC-Technical-Plan-v21.pdf
• Rural Economy & Land Use (RELU) programme
examples
http://relu.data-archive.ac.uk/data-sharing/planning/examples
• UCSD example DMPs (20+ scientific plans for NSF)
http://rci.ucsd.edu/dmp/examples.html
• My DMP – a satire (what not to write!)
http://ivory.idyll.org/blog/data-management.html
23. A satirical response – what not to do
“I will store all data on at least one, and possibly up to 50, hard
drives in my lab. The directory structure will be custom, not
self-explanatory, and in no way documented or described.
Students working with the data will be encouraged to make
their own copies and modify them as they please, in order to
ensure that no one can ever figure out what the actual real
raw data is. Backups will rarely, if ever, be done.”
My Data Management Plan – a satire
C. Titus Brown
24. Help from the DCC
•https://dmponline.dcc.ac.uk
•www.dcc.ac.uk/resources/how-guides/develop-data-plan
26. What is DMPonline?
A web-based tool to help researchers write data
management plans, based on the DCC Checklist for a DMP
A short history
•Launched in April 2010 at the Jisc conference
•Released v.2 in March 2011 with extra functionality
•Released v.3 in April 2012 with revisions in light of the
DMPTool and work from the Jisc MRD programme
•v.4 due in Autumn 2013 after a detailed evaluation
27. Collaborations as DMPonline evolved
• Developed funder-specific guidance in collaboration with funders
• Developed institutional templates (questions and locally-specific
guidance) with key contacts in universities
• Developed and deployed discipline-specific guidance with Jisc
MRD projects (e.g. DMTPsych at York)
• Provide ongoing advice to the DMPTool consortium
• The Australian National Data Service (ANDS) is trialing the tool
and use of DMPonline has been mooted for Horizon 2020
28. Main features in DMPonline
•Templates for different requirements (funder or institution)
•Tailored guidance (funder, institutional, discipline-specific etc)
•Ability to provide examples and boilerplate text
•Supports multiple phases (e.g. pre- / during / post-project)
•Granular read / write / share permissions
•Customised exports to a variety of formats
•API for systems interoperability
•Shibboleth authentication
29. How does DMPonline work?
Create a plan
based on
relevant
funder /
institutional
templates...
...and then
answer the
questions
using the
tailored
guidance
provided
30. Evaluation of DMPonline
POSITIVE COMMENTS
•Good to have an online tool
•Technically well-coded
•Liked ‘sharing’ feature
•Demand for customisations
•Has provided an impetus
•Desire to feed into plans –
DMPonline community
CONCERNS RAISED
•Too detailed and in-depth
•Output too long to submit
•Difficulty understanding the tool
and concept of mappings
•Small teething-troubles with UI
design & workflows
what’s the minimum you
can actually get away with
Future plans for DMPonline:
www.dcc.ac.uk/news/future-plans-dmponline
31. Revising how the Checklist is used
Each funder question
mapped to multiple DCC
Checklist questions
Funder or institutional questions
asked & answered directly.
The new list of themes will match
and present relevant guidance
from funder, unis and disciplines.
This change allows
users to delve into
guidance as needed
rather than always
breaking questions
down for them
32. Redevelopment timeframe
• Outsourced UI work complete and live in v.3
• Shortened version of the Checklist published
www.dcc.ac.uk/news/new-checklist-data-management-plan
• Use cases and database re-design complete
• Expect v.4 beta by August for testing
• Roll-out of v.4 from September 2013
33. Institutions can customise DMPonline
•Select / write
desired questions
•Add your logo, colours, URL…
•Profile local support via
custom guidance and
boilerplate text
34. Institutional versions of DMPonline
• University of Northampton
• Queen Mary University of London
• Oxford Brookes University
• Goldsmiths University of London
• University of the Arts
• University of Newcastle
• University of Oxford
• University of Hull
• University of Edinburgh
• + several more...
•in development
35. Want to use DMPonline?
• Register at:
https://dmponline.dcc.ac.uk/users/sign_up
• Request new features on GitHub:
https://github.com/DigitalCurationCentre/DMPOnline
• Contact us about collaboration on:
dmponline@dcc.ac.uk
36. Thanks – any questions?
DCC guidance, tools and case studies:
www.dcc.ac.uk/resources
Follow us on twitter:
@digitalcuration and #ukdcc
Editor's notes
There are lots of benefits to you as an individual from managing your data
There are also overarching drivers (e.g. policies and government agendas) pushing for data management and sharing
2012-02-07
Research communities are also pushing for data sharing to answer grand challenges / make scientific breakthroughs
Some funders ask for data access/dissemination plans (e.g. a preliminary plan for managing ‘foreground’ in the case of the EC), or simply encourage you to describe plans for certain aspects of creating/using/managing data. Gareth Knight’s analysis for LSHTM highlighted the additional health funders listed here as having some form of requirement - http://researchonline.lshtm.ac.uk/208596/1/Funder_Requirements_Analysis.pdf For North American requirements see the DMPTool - https://dmp.cdlib.org/pages/funder_requirements
Although this is very tongue-in-cheek, it gives some useful pointers on what to consider / cover, in a light-hearted way.
The DCC has produced a how-to guide on writing DMPs and developed a tool to help.
DMPonline has evolved greatly over the past few years, initially by adding extra features and functionality requested by users, but subsequently in more fundamental ways as the landscape has changed. When we started, only a handful of research funders asked for DMPs. The requirements have since increased and are regularly updated. Universities often ask for DMPs now too, and they want to provide tailored guidance specific to their local context. Discipline-specific guidance is also emerging. The tool is evolving to allow us to combine the different funder, institutional and disciplinary contexts and present users with the relevant questions and guidance for their situation (e.g. if they’re applying for funding but their university also asks for a DMP and offers local guidance, we’ll pull all these things together and present them to the user).
There has been lots of input from different communities over the years, which has shaped how the tool has evolved.
We undertook a major evaluation in late 2012 as user needs are rapidly evolving. We’re no longer in a simple position of writing DMPs based solely on funder requirements. We need to balance the needs of institutions and disciplines too, and combine all of these for the users. Some of the major concerns raised are due to how the Checklist is used in the tool. It has been used to associate funder questions with institutional & disciplinary guidance (both are matched to DCC questions which are presented to users) but this can make the plans overly detailed and complicated.
Previously the funder questions were mapped to multiple DCC questions so that we could pull in the relevant guidance associated with them. This meant that users were sometimes asked a few questions instead of one, and some found the way they were broken down a bit repetitive and spoon-fed. In v.4 users will be asked questions from their funder or institution directly, and will answer them directly. This will keep the plans short and streamlined. Any relevant guidance from their funder, institution and discipline will be presented in a concertina dropdown so they can delve into it as needed. A new list of themes (rather than the Checklist questions) will be used to map/associate questions with relevant guidance.
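The theme-based matching described above could be sketched roughly as follows. This is only an illustration, not DMPonline's actual data model or code: the class names, the `guidance_for` function, the theme labels and the sample guidance text are all hypothetical, chosen to show how a question tagged with themes might pull in guidance from funder, institution and discipline.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Guidance:
    """One piece of guidance (hypothetical structure)."""
    source: str  # e.g. "funder", "institution" or "discipline"
    theme: str   # theme label used to match guidance to questions
    text: str


@dataclass
class Question:
    """A funder or institutional question, asked directly (hypothetical)."""
    text: str
    themes: List[str] = field(default_factory=list)


def guidance_for(question: Question, items: List[Guidance]) -> Dict[str, List[str]]:
    """Group guidance texts by source for every item sharing a theme with the question."""
    grouped: Dict[str, List[str]] = {}
    for item in items:
        if item.theme in question.themes:
            grouped.setdefault(item.source, []).append(item.text)
    return grouped


# Illustrative data only; the themes and wording are made up for this sketch.
question = Question("How will you share your data?", themes=["data sharing"])
guidance = [
    Guidance("funder", "data sharing", "State when data will be released."),
    Guidance("institution", "storage", "Use the university's managed storage."),
    Guidance("discipline", "data sharing", "Name a suitable subject repository."),
]

for source, texts in guidance_for(question, guidance).items():
    print(source, texts)
```

In this sketch the user sees one question only, and the guidance is attached to it through shared themes rather than by splitting the question into several Checklist questions.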
One of the most popular features of DMPonline is how it can be customised to suit your institution.