1. Facilitate Open Science Training for European Research
Martin Donnelly
Digital Curation Centre
University of Edinburgh
“The Horizon 2020 Open Research Data Pilot”
OpenAIRE webinar, 9 June 2015
2. OVERVIEW
1. Open Access, Open Data, Open Science
2. Open Access to Research Publications
3. What is Research Data Management?
4. The H2020 Open Data Pilot
5. Useful resources
6. About the FOSTER project
FacilitateOpenScienceTrainingforEuropeanResearch
“The Horizon 2020 Open Research Data Pilot”
3. 1. OPEN ACCESS, OPEN DATA, OPEN SCIENCE
• Open Science is situated within a context of ever greater
transparency, accessibility and accountability
• The impetus for Openness in research comes from two
directions:
• Ground-up – Open Access (OA) began in the High Energy Physics
research community, which saw benefit in not waiting for publication
before sharing research results (and data/code)
• Top-down – Government/funder support, increasing public and
commercial engagement with research
• The main goals of these developments are to lower barriers
to accessing the outputs of publicly funded research (or
‘science’ for short), to speed up the research process, and to
strengthen the quality, integrity and longevity of the
scholarly record
4. Publications and data are linked
• The Internet lowered the physical barriers to accessing knowledge, but financial
barriers remained – indeed, the cost of online journals tended to increase much
faster than inflation, and scholars/libraries faced a cost crisis
• Open Access originated in the 1980s with free-to-access Listserv journals, but it
really took off with the popularisation of the Internet in the mid-1990s, and the
subsequent boom in online journals
• As Open Access to publications became normal (if not ubiquitous), the scholarly
community turned its attention to the data which underpins the research outputs,
and eventually to consider it a first-class output in its own right. The development of
the OA and research data management (RDM) agendas are closely linked as part of a
broader trend in research, sometimes termed ‘Open Science’ or ‘Open Research’
• “The European Commission is now moving beyond open access towards the more inclusive
area of open science. Elements of open science will gradually feed into the shaping of a
policy for Responsible Research and Innovation and will contribute to the realisation of the
European Research Area and the Innovation Union, the two main flagship initiatives for
research and innovation”
http://ec.europa.eu/research/swafs/index.cfm?pg=policy&lib=science
5. Timeline: Open Access and Data Sharing
• 1987: New Horizons in Adult Education launched by the Syracuse University
Kellogg Project. (An early free online peer-reviewed journal.)
• 1991: The “Bromley Principles” Regarding Full and Open Access to “Global
Change” Data, in Policy Statements on Data Management for Global Change
Research, U.S. Office of Science and Technology Policy
• 2001: The term “Open Access” (OA), meaning the free online availability of
research literature, is first coined at an Open Society sponsored meeting in
Budapest, Hungary.
• 2004: Ministerial representatives from 34 nations to the Organisation for
Economic Co-operation and Development (OECD) issue the Declaration on Access
to Research Data From Public Funding.
• 2006: The Scientific Council of the European Research Council (ERC) pledges to
adopt an OA mandate for ERC-funded research “as soon as pertinent
repositories become operational”.
• 2012: European Commission recognises research data is as important as
publications. Announces in July 2012 that it would experiment with open access
to research data (see IP/12/790) http://europa.eu/rapid/press-release_IP-12-
790_en.htm
(Derived from, inter alia, Peter Suber (2009) “Timeline of the open access
movement”, http://www.earlham.edu/~peters/fos/timeline.htm )
6. 2. OPEN ACCESS TO RESEARCH PUBLICATIONS
• OA publication is past the ‘tipping point’ in several fields (e.g.
biology, biomedical research, mathematics and general science &
technology) whereas the social sciences, humanities, applied
sciences, engineering and technology are the least engaged.
(Archambault et al., 2013)
• The EC sees a real economic benefit to OA by supporting SMEs and
NGOs who can’t always afford subscriptions to the latest research
• FP7 included an Open Access pilot, which is now an across-the-
board requirement in Horizon 2020. A pilot for open data has also
been introduced with an intention to develop policy in the same
way…
7. 3. WHAT IS RDM?
“the active management
and appraisal of data over
the lifecycle of scholarly
and scientific interest”
What sorts of activities?
- Planning and describing data-
related work before it takes place
- Documenting your data so that
others can find and understand it
- Storing it safely during the project
- Depositing it in a trusted archive
at the end of the project
- Linking publications to the
datasets that underpin them
- Reusing or building upon existing
datasets (where appropriate)
Data management is a part of
good research practice.
- RCUK Policy and Code of Conduct on the
Governance of Good Research Conduct
8. 3a. Benefits of data sharing & publication
• SPEED: The research process becomes faster
• EFFICIENCY: Data collection can be funded once, and
used many times for a variety of purposes
• ACCESSIBILITY: Interested third parties can (where
appropriate) access and build upon publicly-funded
research resources with minimal barriers to access
• IMPACT and LONGEVITY: Open publications and data
receive more citations, over longer period
• TRANSPARENCY and QUALITY: The evidence that
underpins research can be made open for anyone to
scrutinise, and attempt to replicate findings. This leads
to a more robust scholarly record
9. Without intervention, data + time = no data
Vines et al. “examined the availability of data from 516 studies between 2 and 22 years old”
- The odds of a data set being reported as extant fell by 17% per year
- Broken e-mails and obsolete storage devices were the main obstacles to data sharing
- Policies mandating data archiving at publication are clearly needed
“The current system of leaving data with authors means that almost all of it is lost over
time, unavailable for validation of the original results or to use for entirely new purposes”
according to Timothy Vines, one of the researchers. This underscores the need for
intentional management of data from all disciplines and opened our conversation on
potential roles for librarians in this arena. (“80 Percent of Scientific Data Gone in 20
Years” HNGN, Dec. 20, 2013, http://www.hngn.com/articles/20083/20131220/80-percent-
of-scientific-data-gone-in-20-years.htm.)
Vines et al., The Availability of Research Data Declines Rapidly with Article Age,
Current Biology (2014), http://dx.doi.org/10.1016/j.cub.2013.11.014
10. 3b. Data management plans and planning
• Data management planning (DMP) underpins and pulls
together different strands of research data management
(RDM) activities
• DMP is the process of planning, describing and
communicating the activities carried out during the
research lifecycle in order to…
• Keep sensitive data safe
• Maximise data’s re-use potential
• Support longer-term preservation
• Research funders (and other bodies) often ask for a short
statement/plan to be submitted as part of project
documentation. HEIs are increasingly asking their
researchers to do this too…
11. What does a data management plan look like?
A brief statement defining…
• how data will be captured/created
• how it will be documented
• who will be able to access it
• where it will be stored
• how it will be backed up, and
• whether (and how) it will be shared and preserved
long-term
• etc
DMPs are often submitted as part of funding applications, but will be
useful whenever researchers are creating (or reusing) data, especially
where the research involves multiple partners, countries, etc…
12. Roles and responsibilities
Open Science encourages – and indeed requires – heterogeneous
stakeholder groups to work together for a shared societal goal
It’s worth bearing in mind that RDM and DMP are similarly hybrid
activities, involving multiple stakeholder types…
• The principal investigator (usually ultimately responsible for data)
• Research assistants (may be more involved in day-to-day data
management)
• The institution’s funding office (may have a compliance role)
• Library/IT/Legal (The library may issue PIDs, or liaise with an external
service who do this, e.g. DataCite.)
• Partners based in other institutions
• Commercial partners
• Publishers
• etc
13. Benefits of a planned approach
• It is intuitive that planned activities stand a better chance of
meeting their goals than unplanned ones. The process of
planning is also a process of communication, increasingly
important in interdisciplinary / multi-partner research.
Collaboration will be more harmonious if project partners (in
industry, other universities, other countries…) are in accord
• In terms of data security, if there are good reasons not to
publish/share data, in whole or in part, you will be on more
solid ground if you flag these up early in the process
• DMP also provides an ideal opportunity to engender good
practice with regard to (e.g.) file formats, metadata
standards, storage and risk management practices, leading to
greater longevity of data, and higher quality standards…
14. 4. THE H2020 OPEN DATA PILOT
• The Horizon 2020 Open Research Data pilot
covers “Innovation actions” and “Research
and Innovation actions”
• It involves three iterations of Data
Management Plan (DMP)
• 6 months after start of project, mid-project
review, end-of-project (final review)
• DMP contents
• Data types; Standards used; Sharing/making
available; Curation and preservation
• There are opt-out conditions. A detailed
description and scope of the Open Research
Data Pilot requirements is provided on the
Participants’ Portal…
15. Open Research Data Pilot: specifics (i)
AIM
The Open Research Data Pilot aims to improve and maximise access to and re-use of research
data generated by projects. It will be monitored throughout Horizon 2020 with a view to
further developing EC policy on open research.
SCOPE
For the 2014-2015 Work Programme, the areas of Horizon 2020 participating in the Open
Research Data Pilot are:
• Future and Emerging Technologies; Research infrastructures; part e-Infrastructures; Leadership in
enabling and industrial technologies; Information and Communication Technologies; Societal
Challenge: 'Secure, Clean and Efficient Energy’; part Smart cities and communities; Societal
Challenge: 'Climate Action, Environment, Resource Efficiency and Raw materials' – except raw
materials; Societal Challenge: 'Europe in a changing world – inclusive, innovative and reflective
Societies’; Science with and for Society
This corresponds to about €3 billion or 20% of the overall Horizon 2020 budget in 2014-2015.
COVERAGE
The Open Research Data Pilot applies to two types of data:
1. the data, including associated metadata, needed to validate the results presented in scientific
publications as soon as possible;
2. other data, including associated metadata, as specified and within the deadlines laid down in
the data management plan.
16. Open Research Data Pilot: specifics (ii)
STEP 1
• The data should be deposited, preferably in a dedicated research data
repository. These may be subject-based/thematic, institutional or centralised.
• EC suggests the Registry of Research Data Repositories (www.re3data.org) for
researchers looking to identify an appropriate repository
• Open Access Infrastructure for Research in Europe (OpenAIRE) will also become
an entry point for linking publications to data and vice versa, through the use of
Persistent Identifiers (PIDs), a service currently provided by the Zenodo data
repository
STEP 2
• So far as possible, projects must then take measures to enable third parties to
access, mine, exploit, reproduce and disseminate (free of charge for any user)
this research data.
• EC suggests attaching Creative Commons Licence (CC-BY or CC0) to the data
deposited (http://creativecommons.org/licenses/,
http://creativecommons.org/about/cc0).
• At the same time, projects should provide information via the chosen repository
about tools and instruments at the disposal of the beneficiaries and necessary
for validating the results, for instance specialised software or software code,
algorithms, analysis protocols, etc. Where possible, they should provide the
tools and instruments themselves.
17. Open Research Data Pilot: specifics (iii)
COSTS
Costs relating to the implementation of the pilot will be eligible. Specific
technical and professional support services will also be provided (e-
Infrastructures WP), e.g. EUDAT and OpenAIRE, alongside support measures
such as FOSTER.
OPT-OUTS
Opt outs are possible, either totally or partially. Projects may opt out of the
Pilot at any stage, for a variety of reasons, e.g.
• if participation in the Pilot on Open Research Data is incompatible with the
Horizon 2020 obligation to protect results if they can reasonably be
expected to be commercially or industrially exploited;
• confidentiality (e.g. security issues, protection of personal data);
• if participation in the Pilot on Open Research Data would jeopardise the
achievement of the main aim of the action;
• if the project will not generate / collect any research data;
• if there are other legitimate reasons to not take part in the Pilot (to be
declared at proposal stage)
18. 5. USEFUL RESOURCES (DCC)
• Book chapter
• Donnelly, M. (2012) “Data Management Plans and
Planning”, in Pryor (ed.) Managing Research
Data, London: Facet
• Guidance, e.g. “How-To Develop a Data
Management and Sharing Plan”
• DCC Checklist for a Data Management Plan:
http://www.dcc.ac.uk/resources/data-
management-plans/checklist
• Links to all DCC DMP resources via
http://www.dcc.ac.uk/resources/data-
management-plans
• DMPonline: https://dmponline.dcc.ac.uk/
19. • Helps researchers write DMPs
• Provides funder questions and guidance
• Provides help from universities
• Examples and suggested answers
• Free to use
• Mature (v1 launched April 2010)
• Code is Open Source (on GitHub)
https://dmponline.dcc.ac.uk
DMPonline: overview
20. Non-DCC tools and resources
• Many universities have research data
services, including excellent resources
• UK Data Archive guidance
• DMPTool
• Research Data Netherlands resources
• http://datasupport.researchdata.nl/en/
• European Union resources, e.g. OpenAIRE
21. 6. ABOUT THE FOSTER PROJECT
Facilitate Open Science Training for European Research
22. FacilitateOpenScienceTrainingforEuropeanResearch
OBJECTIVES
• To support different stakeholders, especially young researchers, in adopting
open access in the context of the European Research Area (ERA) and in
complying with the open access policies and rules of participation set out for
Horizon 2020
• To integrate open access principles and practice in the current research
workflow by targeting the young researcher training environment
• To strengthen institutional training capacity to foster compliance with the
open access policies of the ERA and Horizon 2020 (beyond the FOSTER project)
• To facilitate the adoption, reinforcement and implementation of open access
policies from other European funders, in line with the EC’s recommendation, in
partnership with PASTEUR4OA project
23. METHODS
• Identifying already existing content that can be reused in the
context of the training activities and repackaging, reformatting
them to be used within FOSTER, and developing/creating/enhancing
contents as required
• Developing the FOSTER Portal to support e-learning, blended
learning, self-learning, dissemination of training materials/contents
and a Helpdesk
• Delivery of face-to-face training, especially training
trainers/multipliers who can deliver further training and
dissemination activities, within institutions, nations or disciplinary
communities
FacilitateOpenScienceTrainingforEuropeanResearch
24. THANK YOU… any questions?
Martin Donnelly
Digital Curation Centre
University of Edinburgh
martin.donnelly@ed.ac.uk
Twitter: @mkdDCC
www.dcc.ac.uk
www.fosteropenscience.eu
This work is licensed under the
Creative Commons Attribution
2.5 UK: Scotland License.
Editor's Notes
Context: Open Knowledge Foundation, Creative Commons, G8 statement
Open Science principles are an essential part of knowledge creation and sharing and innovation. They directly support researchers’ need for greater impact, optimum dissemination of research, while also enabling the engagement of citizen scientists and society at large on societal challenges. FOSTER aims to set in place sustainable mechanisms for EU researchers to integrate Open Science in their daily workflow, supporting researchers to optimizing their research visibility and impact and to facilitate the adoption of EU open access policies.
Context: Open Knowledge Foundation, Creative Commons, G8 statement
Open Science principles are an essential part of knowledge creation and sharing and innovation. They directly support researchers’ need for greater impact, optimum dissemination of research, while also enabling the engagement of citizen scientists and society at large on societal challenges. FOSTER aims to set in place sustainable mechanisms for EU researchers to integrate Open Science in their daily workflow, supporting researchers to optimizing their research visibility and impact and to facilitate the adoption of EU open access policies.
Note that climate data was among the first area to be addressed. That’s because climate change is an urgent problem, that is best tackled by ‘many eyes’.
Note that data management and open data are not the same thing – some data must be restricted in order for it to be managed responsibly.
Note that even if data is not suitable for sharing/publication, it still needs active management!
Research itself is increasingly international, interdisciplinary and collaborative, crossing both public and private spheres. Communication has never been more important.
These guidelines echo the G8 Science Ministers’ statement (2013), which offered similar good practice principles: https://www.gov.uk/government/news/g8-science-ministers-statement
Horizon 2020 open data pilot follows on from G8 Open Data Charter (June 2013)
We’ve created a DMPonline template for Horizon 2020 based on the annexes in the Guidelines for Data Management.
Research Data Netherlands provides the training Essentials 4 Data Support: off-line and on-line. Essentials 4 Data Support is an introductory course for those people who (want to) support researchers in storing, managing, archiving and sharing their research data. The training is structured according to the scholarly life cycle (slide 7).
“The OpenAIRE website describes how you can set up a Data Management Plan, and how you find an online research data repository if there’s isn’t one in your domain.”