Open data pilot
1. The Horizon 2020 Open Data Pilot
Sarah Jones
Digital Curation Centre, Glasgow
sarah.jones@glasgow.ac.uk
Twitter: @sjDCC
FOT-Net Data Stakeholder Meeting on Open Data and Data Re-use
in Horizon 2020, 10th March 2015, ERTICO, Brussels
2. What is the Digital Curation Centre?
“a centre of expertise in digital information curation
with a focus on building capacity, capability and skills
for research data management across the UK's
higher education research community”
www.dcc.ac.uk
3. Benefits and drivers
WHY SHARE DATA (OPENLY)?
Image CC-BY-NC-SA by Wonderwebby www.flickr.com/photos/wonderwebby/2723279491
5. Science as an open enterprise
https://royalsociety.org/policy/projects/science-public-enterprise/Report
“Much of the remarkable growth of scientific
understanding in recent centuries is due to open
practices; open communication and deliberation
sit at the heart of scientific practice.”
The Royal Society report calls for ‘intelligent
openness’ whereby data are accessible,
intelligible, assessable and usable.
7. Increased use and economic benefit
UP TO 2008
Sold through the US Geological Survey
for US$600 per scene
Sales of 19,000 scenes per year
Annual revenue of $11.4 million
SINCE 2009
Freely available over the internet
Google Earth now uses the images
Transmission of 2,100,000 scenes per year.
Estimated to have created value for the
environmental management industry of $935
million, with direct benefit of more than
$100 million per year to the US economy
Has stimulated the development of
applications from a large number of
companies worldwide
The case of NASA Landsat satellite imagery of the Earth’s surface:
http://earthobservatory.nasa.gov/IOTD/view.php?id=83394&src=ve
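The figures on this slide can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative.

```python
# Figures quoted on the slide for the period up to 2008.
PRICE_PER_SCENE_USD = 600
SCENES_SOLD_PER_YEAR = 19_000

# Figure quoted on the slide for the period since 2009.
SCENES_TRANSMITTED_PER_YEAR = 2_100_000

# Revenue from sales: price per scene times scenes sold per year.
annual_revenue_usd = PRICE_PER_SCENE_USD * SCENES_SOLD_PER_YEAR

# How many times more scenes are distributed each year under open access.
usage_growth_factor = SCENES_TRANSMITTED_PER_YEAR / SCENES_SOLD_PER_YEAR

print(f"Annual revenue up to 2008: ${annual_revenue_usd:,}")
print(f"Growth in scenes per year: about {usage_growth_factor:.0f}x")
```

The product matches the $11.4 million quoted for annual revenue, and the ratio shows that usage grew by roughly two orders of magnitude once the images were free.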
8. HORIZON 2020 OPEN DATA PILOT
Image CC-BY-NC-SA by Tom Magllery www.flickr.com/photos/lwr/13442910354
9. Why open access and open data?
“The European Commission’s vision is
that information already paid for by the
public purse should not be paid for again
each time it is accessed or used, and that
it should benefit European companies and
citizens to the full.”
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
10. H2020 open data pilot
• Seven areas are participating in the pilot, which correspond to
about €3 billion or 20% of the overall Horizon 2020 budget in
2014 and 2015.
• Projects in other areas can opt in on a voluntary basis
Guidelines on Data Management in Horizon 2020
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
• Participants can opt out at
proposal stage or during the
lifetime of the project
• Reasons for exemption to be
explained in the DMP
11. Which data does the pilot apply to?
Data, including associated metadata, needed to
validate the results in scientific publications
Other curated and/or raw data, including associated
metadata, as specified in the DMP
Doesn’t apply to all data (researchers to define as appropriate)
Don’t have to share data if inappropriate – exemptions apply
12. Key requirements of the open data pilot
1. Deposit in a research data repository
2. Make it possible for third parties to access, mine, exploit,
reproduce and disseminate data – free of charge for any user
3. Provide information on the tools and instruments needed to
validate the results (or better still provide the tools)
Image CC-BY-NC-SA by adesigna www.flickr.com/photos/adesigna/4090782772
13. Data Management Plans
Projects participating in the pilot will be required to develop a
Data Management plan (DMP), in which they will specify what
data will be open.
• What types of data will the project generate/collect?
• What standards will be used?
• How will this data be shared/made available? If not, why?
• How will this data be curated and preserved?
Note that the Commission does NOT require applicants to
submit a DMP at the proposal stage. DMPs are a deliverable
for those participating in the pilot.
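One way to keep track of the four DMP questions above is to hold the answers in a small record and list the ones still unanswered. This is only a sketch: the class, field names and sample answers are hypothetical, and the Commission does not prescribe any such format.

```python
from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    """One field per question listed on the slide (field names are illustrative)."""
    data_types: str = ""    # What types of data will the project generate/collect?
    standards: str = ""     # What standards will be used?
    sharing: str = ""       # How will this data be shared/made available? If not, why?
    preservation: str = ""  # How will this data be curated and preserved?


def unanswered(plan: DataManagementPlan) -> list[str]:
    """Return the names of the questions that have no answer yet."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# Hypothetical draft plan with two of the four questions answered.
draft = DataManagementPlan(
    data_types="Sensor logs from field trials",
    standards="CSV files with documented column names",
)
print(unanswered(draft))
```

Since the DMP is a deliverable rather than part of the proposal, a draft like this can start mostly empty and be filled in as the project develops.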
15. Data sharing: degrees of openness
Open: content that can be freely used, modified and shared by anyone for any purpose
Five star open data (http://5stardata.info):
★ online under an open licence
★★ structured data
★★★ non-proprietary formats
★★★★ use URIs to denote things
★★★★★ link data to provide context
Restricted: limits on who can use the data, how or for what purpose
• Charges for use
• Data sharing agreements
• Restrictive licences
• Peer-to-peer exchange
• …
Closed:
• Unable to share
• Under embargo
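The five star scheme can be read as a ladder: each star assumes the previous ones have been earned. A minimal sketch of that reading follows, assuming the levels are cumulative in the order listed on the slide; the dictionary keys and the sample dataset are hypothetical.

```python
# Criteria in the order they appear on the slide (key names are illustrative).
FIVE_STAR_CRITERIA = [
    "open_licence",     # online under an open licence
    "structured",       # structured data
    "non_proprietary",  # non-proprietary formats
    "uses_uris",        # use URIs to denote things
    "linked",           # link data to provide context
]


def star_rating(dataset: dict) -> int:
    """Count stars, stopping at the first criterion that is not met."""
    stars = 0
    for criterion in FIVE_STAR_CRITERIA:
        if not dataset.get(criterion, False):
            break
        stars += 1
    return stars


# Hypothetical example: a CSV file published online under an open licence.
csv_on_web = {"open_licence": True, "structured": True, "non_proprietary": True}
print(star_rating(csv_on_web))
```

Under this reading, a dataset without an open licence scores zero stars however well structured it is, which matches the slide's point that openness starts with the licence.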
16. How to make data open?
1. Choose your dataset(s)
What can you make open? You may need to revisit this step if
you encounter problems later.
2. Apply an open licence
Determine what IP exists. Apply a suitable licence e.g. CC-BY or CC0
3. Make the data available
Provide the data in a suitable format. Use repositories.
4. Make it discoverable
Post on the web, register in catalogues…
https://okfn.org
17. Data licensing
www.dcc.ac.uk/resources/how-guides/license-research-data
This DCC how-to guide outlines pros and
cons of each approach and gives
practical advice on how to implement
your licence.
• Do you own the rights or have
permission to redistribute?
• Do you need to place restrictions on
who can use the data or how?
19. Metadata standards
• Good metadata is key for research data access and re-use
• Many disciplines have formalised community metadata standards
• Use relevant standards for interoperability
www.dcc.ac.uk/resources/metadata-standards
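As an illustration of what a basic metadata record might look like, the sketch below borrows a few Dublin Core element names (title, creator, date, format, rights, identifier). Which fields count as required is an assumption made for this example, the record values are invented, and your discipline's own standard may call for different or richer fields.

```python
import json

# Fields treated as required in this sketch (an assumption, not a standard's rule).
REQUIRED_FIELDS = ("title", "creator", "date", "rights", "identifier")

# Hypothetical record; all values are placeholders.
record = {
    "title": "Example vehicle trip dataset",
    "creator": "Example Project Consortium",
    "date": "2015-03-10",
    "format": "text/csv",
    "rights": "CC-BY-4.0",  # licence given as an SPDX-style identifier
    "identifier": "",       # left empty until a repository assigns one
}


def missing_fields(rec: dict, required=REQUIRED_FIELDS) -> list[str]:
    """List required fields that are absent or blank."""
    return [name for name in required if not str(rec.get(name, "")).strip()]


print(json.dumps(record, indent=2))
print("Missing:", missing_fields(record))
```

Recording the licence and identifier alongside the descriptive fields keeps the information a re-user needs in one place.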
20. Data catalogues
Institutional services
e.g. DataFinder at the
University of Oxford
National services e.g.
Research Data Australia
and RDDS pilot in the UK
Data centres and community
initiatives e.g. FOT Data
Catalogue, B2FIND etc
22. Data repositories
http://databib.org
http://service.re3data.org/search
Zenodo
• Joint effort by OpenAIRE and CERN
• Multidisciplinary repository
• Multiple data types
– Publications
– Long tail of research data
• Citable data (DOI)
• Links funding, publications,
data & software
www.zenodo.org
• Does your publisher or funder suggest a repository?
• Are there data centres or community databases for your field?
• Does your university offer support for long-term preservation?
23. EUDAT services
EUDAT offers a pan-European solution, providing a generic set
of services to ensure a minimum level of interoperability
Building common
data services in close
collaboration with
25+ communities
www.eudat.eu
24. EUDAT B2 service suite
Covering both access and
deposit, from informal data
sharing to long-term
archiving, and addressing
identification, discoverability
and computability of both
long-tail and big data,
EUDAT’s services will
address the full lifecycle of
research data
25. Institutional RDM support services
Diagram courtesy of Sally Rumsey, University of Oxford
University of Edinburgh Research
Data Management Roadmap
www.ed.ac.uk/schools-departments/information-services/about/strategy-planning/rdm-roadmap
Research Data Oxford
http://researchdata.ox.ac.uk
26. Support on Data Management Plans
• Checklist on what to include
• How-to guide on developing a plan
• Guidance on assessing plans (forthcoming)
• Webinars and training materials
• DMPonline tool
• Example DMPs
www.dcc.ac.uk/resources/data-management-plans
27. DMPonline
• Presents requirements from funders
• Guidance from funder, uni, discipline…
• Example answers
• Ability to share plans with collaborators
• Export into a variety of formats
• …
https://dmponline.dcc.ac.uk
28. Thanks for listening
DCC guidance, tools & case studies:
www.dcc.ac.uk/resources
Follow us on twitter:
@digitalcuration and #ukdcc
Editor's notes
The Royal Society report ‘Science as an Open Enterprise’ emphasises that much of the growth of scientific understanding is due to open practices. Being open about your work and encouraging feedback from others is at the heart of scientific practice.
The report calls for ‘intelligent openness’ – data shouldn’t just be accessible, they need to be intelligible by others so they can assess and reuse them.
Certain research communities have also seen the benefit of sharing data as it speeds up the process of discovery. This article shows how researchers in the field of Alzheimer’s research have agreed as a community to share data immediately to make scientific breakthroughs.
There’s also an economic benefit, as seen in the case of the NASA Landsat satellite images. These were sold until 2008 for $600 a scene. Now they’re freely available and used by Google Earth. Previously 19,000 scenes were sold a year, whereas now 2.1 million are transmitted. Sales brought in annual revenue of $11.4 million; the free release is estimated to bring a direct benefit of more than $100 million per year to the US economy and to have created value of $935 million for the environmental management industry. The release has also stimulated the development of applications from companies worldwide.
This case study comes from the Royal Society Report on Science as an Open Enterprise.
The background to this is about making the most of the data that has been created through publicly funded research. The guidelines speak of:
• Improved quality of results
• Greater efficiency
• Faster to market = faster growth
• Improved transparency of the scientific process
For those that do take part in the pilot, the starting point is to make all data that underpin publications open. After that, it’s for researchers to define what else should be shared and can be made open. This should be outlined in the DMP.
Sometimes sharing is not appropriate (e.g. due to ethical rules on personal data, intellectual property protection or commercial restrictions). It’s fine to apply restrictions in such cases. This could be an embargo period prior to publication or while a patent is sought, or controlling access and re-use to protect participants’ identities (e.g. via the use of secure data services / data enclaves or data sharing agreements). Restrictions should be outlined up-front in the DMP.
So the specific requirements on projects that participate in the pilot are to:
- Deposit data in a repository
- Enable reuse via open licensing
- Provide any tools (or at least info on them) needed to validate the data
The focus is planning for data sharing and then facilitating that through deposit, licensing and enabling reproducibility.
The Open Knowledge Foundation suggests four simple steps to make data open.
First off you need to decide what to share. Not all data can be made openly available due to commercial restrictions or sensitivities.
Once you’ve decided what to share, determine what IP exists and apply a suitable licence.
You should then make the data available in a suitable format so others can bulk download it. Remember that for it to be useful you want to share appropriate metadata and documentation too. Using repositories is useful to make sure your data are properly managed and preserved for the long-term.
The final step is to make your data discoverable. Put it online, tell others about it and add details to various registries so it gets found.
Guidance from the DCC can also help researchers to understand data licensing. This guide outlines the pros and cons of each approach, e.g. the limitations of some CC options.
The RDRDS will work in partnership with a network of UK subject-specific data centres and university-based institutional data repositories to harvest dataset metadata records, and so promote the discoverability of research data held by all partner institutions. Partner institutions will remain responsible for the selection and stewardship of the datasets.
There are already services doing similar things, but none have quite the same scope.
In fact we have the potential to complement existing services (see Figure 1), by:
• collating records from both data centres and institutional repositories;
• normalising and deduplicating them, to provide a unified search interface;
• ultimately making the records visible in other places researchers might look.
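The collate, normalise and deduplicate steps can be sketched as follows. This is not how the service itself works: the record layout, the choice of DOI as the matching key and the sample records (including the DOI) are all assumptions made for illustration.

```python
DOI_PREFIXES = ("https://doi.org/", "http://dx.doi.org/", "doi:")


def normalise(record: dict) -> dict:
    """Lower-case the DOI, strip common prefixes and tidy whitespace in the title."""
    doi = (record.get("doi") or "").strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    title = " ".join((record.get("title") or "").split())
    return {"doi": doi, "title": title, "sources": [record["source"]]}


def merge_records(records: list[dict]) -> list[dict]:
    """Collate records, keeping one entry per DOI (or per title if no DOI)."""
    merged: dict[str, dict] = {}
    for raw in records:
        rec = normalise(raw)
        key = rec["doi"] or rec["title"].lower()
        if key in merged:
            # Same dataset seen again: remember where else it is held.
            for source in rec["sources"]:
                if source not in merged[key]["sources"]:
                    merged[key]["sources"].append(source)
        else:
            merged[key] = rec
    return list(merged.values())


# Hypothetical harvested records; the DOI is a made-up placeholder.
harvested = [
    {"doi": "doi:10.1234/EXAMPLE.1", "title": "Trip  data", "source": "data centre"},
    {"doi": "https://doi.org/10.1234/example.1", "title": "Trip data",
     "source": "institutional repository"},
    {"doi": None, "title": "Survey results", "source": "institutional repository"},
]
for entry in merge_records(harvested):
    print(entry)
```

Keeping the list of sources on each merged entry means the unified search result can still point back to every partner that holds the dataset, in line with partners remaining responsible for stewardship.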
All share common challenges:
– Reference models and architectures
– Persistent data identifiers
– Metadata management
– Distributed data sources
– Data interoperability