This document summarizes Kevin Ashley's presentation on opening up research data from a UK perspective. The presentation discusses the policy background around open data in the UK, developments in infrastructure to support open data, and costs associated with making data openly available. It also notes that fully realizing the benefits of open data will require international cooperation across organizations like the Digital Curation Centre.
Opening up data: a UK perspective – Jisc and CNI conference 10 July 2014
1. Opening up data: A UK perspective
Kevin Ashley
Digital Curation Centre
www.dcc.ac.uk
@kevingashley
Kevin.ashley@ed.ac.uk
Reusable with attribution: CC-BY
The DCC is supported by Jisc
2. A summary
• Policy background
• The end point – why it matters
• UK reaction & developments
• Infrastructure
• Costs
• Joining up internationally
• More than data…
2014-07-10 Kevin Ashley -Jisc/CNI 2014 - CC-BY 2
3. My home – the DCC
• Mission – to increase capability and capacity for research data services in UK institutions
• Not just a UK problem – an international one
• Training, shared services, guidance, policy, standards, futures
5. Data reuse stories
• The palaeontologist who saved years of work with archaeological data
• The 19th-century ships’ logs that help us model climate change
• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull
6. Data reuse - messages
• Often your data tells stories that your publications do not
• Not all data comes from other researchers
• One person’s noise is another person’s signal
• Discipline-bounded data discovery doesn’t give us all we need or want
7. Why does this matter?
• Research quality
– How close can we get to the truth?
• Research speed
– How quickly can we get to the truth?
• Research finance
– How much does the truth cost?
• Improving one or more of these is of interest to all actors:
– Researchers as data creators
– Researchers as data reusers
– Research institutions
– Funders – hence government and society
9. G8UK - Endorses OA
Open Data Charter: Policy Paper, 18 June 2013
10. Funder requirements
• UK – RCUK (generic), NERC, STFC, ESRC, BBSRC, EPSRC, MRC
• USA – NSF, NEH, NIH
• Europe
• Denmark, Germany, Netherlands…
• Most place burden on researcher – some on the institution
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx
11. RCUK policy - The 1-minute version
• Research data are a public good – make openly available in timely & responsible way
• Have policies & plans. Data with long-term value should be preserved & usable
• Metadata for discovery & reuse. Link publications & data
• Sometimes law, ethics get in the way. We understand.
• Limited embargos OK. Recognition is important – always cite data sources
• OK to use public money to do this. Do it efficiently.
12. EPSRC policy points
• Awareness of regulatory environment
• Data access statement
• Policies and processes
• Data storage
• Structured metadata descriptions
• Permanent identifiers for data
• Securely preserved for a minimum of 10 years from last use
Compliance expected by 2015
18. DCC ‘institutional engagement’
[Diagram labels]
• DCC support team
• Stages: Assess needs; Make the case; Develop support and services
• Activities: RDM policy development; Customised Data Management Plans; DAF & CARDIO assessments; Guidance and training; Workflow assessment; Advocacy with senior management; Institutional data catalogues; Pilot RDM tools
…and support policy implementation
19. Who (in the UK) is leading on RDM?
[Diagram labels: Library; IT; Research Office]
20. Survey of UK HE RDM readiness
• 61 of 69 responded (> 10% funding from research)
• 90% using internal funding for staff, training
• 57% filling all or most roles through restructuring
• Russell Group: 4.7 FTE -> 9.5 FTE within a year
• Others: 2.6 FTE -> 3 FTE
• Lack of clarity on staff outside central services
[Pie chart. Segment labels: Research Support & Commercialisation; Library or Information Service; IT/Research computing; Others. Values as extracted: 31%, 38%, 14%, 17%]
Data & charts from Angus Whyte, DCC
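The survey figures above imply a response rate and staffing growth rates that the slide does not state. A minimal Python sketch of that arithmetic follows; the variable names are illustrative, and it assumes the FTE figures are directly comparable before-and-after values, which the slide does not spell out.

```python
# Figures copied from the survey slide; names are illustrative only.
invited, responded = 69, 61
response_rate = responded / invited

# (FTE before, FTE after) for each group, as quoted on the slide.
fte = {"Russell Group": (4.7, 9.5), "Others": (2.6, 3.0)}


def growth(before, after):
    """Relative change from `before` to `after` (0.5 means +50%)."""
    return (after - before) / before


print(f"Response rate: {response_rate:.0%}")
for group, (before, after) in fte.items():
    print(f"{group}: {growth(before, after):+.0%} FTE")
```

On these numbers the response rate is about 88%, and Russell Group staffing roughly doubles (+102%) while other institutions grow by about 15%.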
21. Drivers – UK institutions
[Bar chart: % agreeing, axis 0 to 100]
• UK Research Council data policies: 92
• Government policy on open data: 57
• Governance of research integrity / academic conduct: 54
• Strategy to expand support for research: 54
• EU Horizon2020 policy on data management: 53
22. Least progress
[Bar chart: % indicating piloting or live, axis 0 to 25]
• Business planning & sustainability
• Digital preservation & continuity planning
• Governance of data access & reuse
23. What kind of external support is needed?
• Advice on retention, selection
• Advice on metadata creation for discovery
• Specifying tools & infrastructure
• Costing
• Advocacy to senior management
• Developing data catalogues/registers
24. EPSRC asked its researchers…
• 75% know of funder’s policy (25% in detail)
• 55% know their institution has a policy
• 70% are not aware of institutional training or services for RDM
• Some contradictory responses
Thanks to Ben Ryan, EPSRC, for quotes & data
25. Services researchers are aware of
• Help with data management planning
• Help with metadata creation
• Training
• Backup of research data
• Dedicated storage
26. Some selected observations
“The nature of my work is such that it generates no data that doesn't end up in my papers, so I'm unlikely to know about these policies.”
“This is irrelevant to me. I deal with no sensitive data”
“RDM sounds like a gigantic waste of time and I intend to spend as little time on it as possible”
“I am on the point of retiring so taking less interest in these things”
30. Data centres are good value!
• See Jisc reports on ADS, BADC, UKDA:
http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx
31. Research Data Registry & Discovery Service
• Modelled on Research Data Australia
• Gain visibility of small data collections
• Help drive home distinction between discoverable data & open data
• Get evidence on which metadata items deliver reuse potential
• Idea from UKRDS report in 2010
• RDA working group coordinating international work
34. Pimp your data – make it findable & reusable
Gking.harvard.edu/data
35. On costs
• Costs of data curation relatively simple to measure: see work of 4C (4cproject.eu)
• Charging and payment are more complex
• Funder rules can lead to perverse, inefficient payment systems
• Fundamental question is ‘who pays’. This changes the answer to ‘what does it cost’
37. What it means
• Project funding can only be spent during projects on direct project costs
• Project funding comes with overheads, which universities must use for research infrastructure
• Ongoing (‘QR’) money is continuous, relates to research ranking
• Important to distinguish business-as-usual from exceptional requirements
38. A research lifecycle
[Diagram: resources plotted against time. Labels: Exceptional zone; Normal zone; Project end point; Business as usual threshold; Eligible for project funding]
39. Being clever with costs
• Ongoing costs beyond project end cannot be charged to a grant, but…
• ‘Pay once, store forever’ charges are acceptable.
• Thus, incentive to outsource long-term curation
• Yet universities are only acting as last-resort option in any case – discipline data archives preferred
Many of these are run by funders
40. What stops data reuse
• Loss
• Destruction
• Pride
• Gluttony
• Ineptitude
• Concealment
• Bureaucracy
• Complexity
• Procrastination
• Lack of potential
41. Quotation: Incremental Project Report
“Departments don’t have guidelines or norms for personal back-up and researcher procedure, knowledge and diligence varies tremendously. Many have experienced moderate to catastrophic data loss”
Incremental Project Report, June 2010
http://www.flickr.com/photos/mattimattila/3003324844/
42. Excuses – and responses
• “People will ask questions”
– So use a data centre or repository
• “It will be misinterpreted”
– Stuff happens. Also, openness encourages correction
• “It’s not interesting”
– Let others be the judge – your noise is my signal
• “I might get another paper out of it”
– Up to a point. We might get more research out of it
• “I don’t have permission”
– A real problem. But solvable at senior level
• “It’s too bad/complicated” – see above
• “It’s not a priority”
– Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well
See e.g. Carly Strasser’s blog:
http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/
43. Citability
• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
– Alter, Pienta, Lyle – 240%, social sciences *
– Piwowar, Vision – 9% (microarray data)†
– Henneken, Accomazzi – 20% (astronomy) #
† Piwowar H, Vision TJ (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1. http://dx.doi.org/10.7287/peerj.preprints.1v1
* Pienta A, Alter G, Lyle J (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307
# Henneken E, Accomazzi A (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618
44. Open scholarly communication
• It’s not just publications and/or data
• Software, methods, workflows, instruments…
• Need to resist the urge to make everything look like a publication
Editor's notes
There are many such stories of unexpected data reuse; these are a few examples. The last, exemplified in the Old Weather project, is seeing the original data being reused for at least the third time and in doing so is helping both climatologists and family historians through a single piece of transcription work. An impressive result.
For an audience such as this, I shouldn’t have to explain why data reuse is important. But just in case, and to explain why some things have happened the way they have, I’ll describe some of the drivers.
Ensuring that all research data is discoverable and reusable increases the quality of the research that we do. It can add to the data we collect ourselves and can improve the statistical rigour of our results. Exposing data to scrutiny makes it more straightforward to validate or challenge the findings of others.
Making data available also improves the speed with which we can do research. If someone else has already gathered the data we need (perhaps for a different end use), we can move directly to the analysis stage of our work, saving both time and money.
And saving money increases the efficiency of research. We hope that the money saved lets us do more research, but even if it doesn’t, society as a whole will gain. There’s evidence behind this that I’ll come to later, but it is an effective counter to those in some universities who feel that increasing funder requirements for data management simply leads to additional costs with no gain. There is a gain in all these areas, and hence every one of the actors (researchers, their employers, their funders, and society) should be motivated to make this happen.
An interesting emerging trend is who is addressing RDM within the universities.
The library is leading in most cases and is involved regardless of who’s championing the cause.
Research offices are often the lead partner – seemingly for strategic reasons of senior buy-in and financial commitment.
IT are only leading in 2 out of the 20 cases and are disengaged / absent in a few others.
Yet some researchers still aren’t convinced by the rhetoric. Carly Strasser at CDL has listed some of the reasons for not sharing data that she’s encountered – and here are some of my one-line responses. I’m not saying that the concerns aren’t sincere or reasonable, but they can all be dealt with, and some are positively misguided. The purpose of data centres, for instance, is to make data independently reusable (as stated in the OAIS standard), which relieves researchers of the burden of dealing with questions about it, at the same time as increasing the likelihood that their data will be cited.
Did I mention that making data available increases citations? This is a win all round. If you don’t believe me, here are three studies from three different areas that all show robust, positive correlations. The effect size varies with discipline, but we have enough evidence now that anyone who says that their area is different needs to come up with evidence to show why.