Incentives for sharing research data – Veerle Van den Eynden, UK Data Service
Incentives to innovate – Joe Marshall, NCUB
Incentives in university collaboration – Tim Lance, NYSERNet
Giving researchers credit for their data – Neil Jefferies, The Bodleian Digital Library Systems and Services (BDLSS)
Jisc and CNI conference, 6 July 2016
2. Introduction
Chair: Steven Hill, HEFCE
3. The UK position on open access
Steven Hill, Head of Research Policy
4. The UK position on open access
Steven Hill
Head of Research Policy
Jisc-CNI conference 06 July 2016
@stevenhill
7. UK Government Policy
• Independent reports
– Dame Janet Finch – 2012
– Professor Adam Tickell – 2016
8. UK Government Policy
“I am confident that, by 2020, the UK will be
publishing almost all of our scientific output
through open access. The advantages of
immediate ‘gold’ access are well recognised, and
I want the UK to continue its preference for gold
routes where this is realistic and affordable. I also
accept the validity of green routes, which will
continue to play an important part in delivering
our open access commitments.”
Jo Johnson, Minister for Universities and Science
Image: Public Domain (https://commons.wikimedia.org/wiki/File:Jo_Johnson_Photo_Speaking_at_the_British_Museum.jpg)
10. UK Policy Landscape
• Research Councils UK
– Journal articles and conference proceedings
– Preference for immediate, CC-BY access
– Accept access after 6 months (STEM) or 12 months (AHSS) with CC-BY-NC
– Block grant to HEIs for APCs (pure OA and hybrid)
• Charity Open Access Fund
– 7 major medical research funders (including Wellcome Trust)
– Journal articles, conference proceedings and monographs
– Deposit in PubMedCentral or EuropePMC
– Require immediate, CC-BY access
• Research Excellence Framework
– Journal articles and conference proceedings
– Deposit in institutional or subject repository
– Accessible for read and download after at most 12 months (STEM) or 24 months (AHSS)
– Encourage: immediate access, liberal licensing, monographs
16. Wellcome Trust compliance analysis
• 2014/15: 30% of articles for which an APC was paid were not compliant with policy
• E.g. 392 articles not deposited in PMC/EuPMC – £765,000 APC value
• Hybrid journals the main source of non-compliance
Source: https://blog.wellcome.ac.uk/2016/03/23/wellcome-trust-and-coaf-open-access-spend-2014-15/
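As a rough check on the scale of the figures above, the implied average APC can be worked out directly. This is only a sketch: it assumes the £765,000 covers exactly the 392 non-deposited articles, and the variable names are illustrative.

```python
# Figures quoted on the slide (Wellcome Trust / COAF, 2014/15).
articles_not_deposited = 392
apc_value_gbp = 765_000

# Assumption: the APC value relates only to these 392 articles.
average_apc_gbp = apc_value_gbp / articles_not_deposited

print(f"Implied average APC: £{average_apc_gbp:,.2f}")  # £1,951.53
```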
18. Prospects
• REF policy – significant increase in open content
• Possible action by funders on hybrid journals (see DFG, Norwegian Research Councils)
• Offsetting deals
• The effect of Sci-Hub?
• Further developments on policy/implementation; 4 working groups of Universities UK OA group:
– Efficiency
– Service standards
– Repositories
– Monographs
22. Incentives and motivations for sharing research data, a researcher’s perspective
Jisc / CNI conference: International advances in digital scholarship
Oxford, 6 July 2016
Veerle Van den Eynden
UK Data Service
University of Essex
23. Why study incentives for data sharing?
• Barriers to data sharing well known
• Wide variation in data sharing policies across Europe
• Where policies are weak or not present, must rely on norms and incentives
• While overall benefits of data sharing are clear, benefits for the individual researcher can be weak or mixed
• Incentives a better basis for data / research collaboration
24. Qualitative study of incentives, 2014
• 5 case studies – active data sharing research groups
• 5 European countries: FI, DK, DE, UK, NL
• 5 disciplines: ethnography, media studies, biology,
biosemantics, chemistry
• 22 researchers interviewed
• Q: research, data, sharing practices, motivations, optimal times, barriers, future incentives, ...
http://www.data-archive.ac.uk/about/projects/incentive
26. Different modes of data sharing
• Private management sharing
• Collaborative sharing
• Peer exchange
• Sharing for transparent governance
• Community sharing
• Public sharing (repository)
• Mutual benefits vs data ‘donation’
27. Data sharing practices in case studies
• Data sharing = part of scientific process
• Collaborative research
• Peer exchange
• Supplementary data to publications
• Sharing early in research (raw)
• Sharing at time of publication (processed)
• Well established data sharing practices in some disciplines: crystallography, genetics
• Development of community / topical databases: BrassiBase, LARM archive
• Some sharing via public repositories: chemistry, ethnography, biology
28. Incentives – direct benefits
• For research itself:
• collaborative analysis of complex data
• methods learning
• research depends on data / information, data mining
• supplementary data as evidence for publications
• research = creating data resources
• For research career:
• visibility, also of research group
• reciprocity
• reassurance, e.g. invited to share
• For discipline & for better science
29. Incentives – norms
• Sharing = default in research domain, research group, institution
• Hierarchical sharing throughout research career
• Challenge conservative non-sharing culture
• Openness benefits research, but individual researchers reluctant to take lead
30. Incentives – external drivers
• Funders directly fund data sharing projects
• Journals expect supplementary data
• Learned societies develop infrastructure & resources
• Data support services
• Publisher and funder policies and expectations
• may not push data sharing as much as they could, e.g. supplementary data in journals is of poor quality; mandated repository deposits are minimal and exclude valuable data
• slowly change general attitudes, practices, norms
31. Future incentives for researchers
• Policies and agreements – create level playing field
• Training – sharing to become standard research practice
• Direct funding for RDM support
• Infrastructure and standards
• Micro-publishing/micro-citation
• Broaden norms
32. Recommendations
• Changing norms
• Encourage direct benefits: science, careers
• Leadership from funders, institutions, learned societies, publishers
• “Mixed economy” of incentives that consider:
• phase in research data life cycle
• career stage of researcher
• context of discipline / research environment
• European level:
• invest in ‘rich’ data resources: data + context
33. Recommendations for funders
• All research funders to have a data sharing policy: expectations for data accessibility; budget share for RDM
• Funding support services, cf. funding publication costs
• Invest in data infrastructure with rich context
• Fund data sharing training for students and doctoral researchers
• Target funding at reuse of existing data resources
• DMP evaluation guidance for peer reviewers of bids
34. Recommendations for learned societies
• Research recognition for data sharing and data publishing
• Data sharing expectations for the disciplines, e.g. code of conduct
• Data sharing agreements for discipline
• Promote developing data sharing resources and standards for the research discipline
35. Recommendations for research institutions
• Data ‘publishing’ recognition in research assessment / career progression
• Data impact in PhD career assessment, e.g. impact portfolio, data CV
• Set data sharing expectations for institution
• Data sharing training part of standard student research training
• Integrated RDM support services (one-stop-shop)
36. Recommendations for publishers / editorial boards
• Boost direct career benefits of data sharing:
• data citation
• data sharing metrics
• micro-citation
• tools: DOIs, ORCID, digital watermarking
• Publication of negative findings, failed experiments
• Full datasets as supplementary material
• All supplementary data openly available
• Correct data citation
• (Open) standards for file formats and supplemental documentation
37. Recommendations for data centres / repositories
• Pull factors for data sharing, e.g. invitations for data
• Specialist data sharing training
• Flexible data access systems for data owners
• Rich data resources, with context of publications, etc.
38. What other research found
Kim, Y. and Adler, M. (2015)
Social scientists’ data sharing behaviors: Investigating the roles of individual motivations, institutional pressures, and data
International Journal of Information Management 35(4): 408–418.
• online survey of 361 social scientists in USA academia
• predict data sharing behaviour through theory of planned behaviour (individual motivation is based on own motivations and availability of resources) and institutional theory (institutional environment produces structured field of social expectations and norms, using (dis)incentives to shape behaviour and practices)
• main drivers for data sharing:
• personal motivations: perceived career benefit and risk, perceived effort, attitude towards data sharing
• perceived normative pressure
• funders, journals and repositories are not significant motivators
39. What other research found
Sayogo, D.S. and Pardo, T.A. (2013)
Exploring the determinants of scientific data sharing: Understanding the motivation to publish research data.
Government Information Quarterly, 30(1): 19-31.
• Online survey with 555 researchers, cross-disciplinary, 75% USA
• Ordered logistic regression to assess the determinants of data sharing, analysing willingness to publish datasets as open data against 7 variables: organisational support, DM skills, data reuse acknowledgement, legal and policy conditions owner sets for data reuse, concern for data misinterpretation, economic motive, funder requirement
• Main determinants are:
• DM skills and institutional support
• data reuse acknowledgement, legal and policy conditions owner sets for data reuse
40. What other research found
Expert Advisory Group on Data Access (2014)
Establishing incentives and changing cultures to support data access.
• interviews with key stakeholders: funders, senior academic managers, postdoctoral researchers, chair of REF panel, senior data manager
• web survey with researchers and data managers (number of responses unknown)
• recommended incentives:
• research funders:
• strengthen and finance data management and sharing planning
• fund and develop infrastructure and support services
• recognise high quality datasets as valued research outputs in REF
• career paths and progression for data managers in research teams
• research institutions:
• clear policies on data sharing and preservation
• training and support for researchers to manage data effectively
• journals
• clear policies on data sharing and processes
• datasets underlying published papers readily accessible
• appropriate data citation and acknowledgement
41. Thanks
• Knowledge Exchange
• Interview partners:
• Anders Conrad (DK)
• Damien Lecarpentier & Irina Kupiainen (FI)
• Jens Nieschulze & Juliane Steckel (DE)
• Joeri Nortier (NL)
• Interviewees
• Van den Eynden, V. and Bishop, L. (2014). Sowing the seed: Incentives and Motivations for Sharing Research Data, a researcher's perspective. Knowledge Exchange.
http://repository.jisc.ac.uk/5662/1/KE_report-incentives-for-sharing-researchdata.pdf
48. The Productivity Puzzle
UK strength
• UK world-leading on research measures
• Strength of UK research base attractive to global business R&D activities
UK weakness
• Yet UK lags behind international competitors in terms of productivity rates
• Encouraging more business-led innovation critical
49. Mayfield: Improving Productivity
“Innovation is the lifeblood of long-term productivity growth – new products, new markets, and new ways of doing things create new opportunities”
• Innovation is a collaborative activity – innovators need other innovators to work with.
• Need to create the space to bring ideas from the UK’s research base into thriving, disruptive new businesses.
51. Academic operability: impacting impactful impact
Understanding the operating environments in which our academics and universities operate today.
• Recent report by National Centre canvassed all UK academics about how and why they engage with business and others.
55. Discoverability:
an era of open “openness”
“Democratisation” of opening up information
• Openness agenda complex, multifaceted, profound
• Opening up new opportunities (and challenges)
Information in and of itself is significant
• Information and data generated at exponential rates
• Cloud not limited by boundaries of geography or power
• Information is a click away
56. Discoverability:
An ORCiD by any other name
Businesses interested to know WHO is doing WHAT research WHERE:
• ORCiD (and other tools) make it easier to discover researchers and research
• Tools like equipment.data and _connect give others insight into further opportunities
• Discoverability provides an important incentive to academics to promote their activities
58. Connectivity
• Dowling Review (and others) have highlighted issues SMEs (and corporates) can have:
– Awareness of breadth of opportunities available;
– Knowing what to ask, how to ask and what it means;
– Addressing issues of proximity and time.
59. Four concepts
Underpinning the emerging tool
Aggregating
Interpreting
Matching
Triaging
Inspiring
Guiding
Engage networks
Embedded tool
64. About NYSERNet
New York State Education & Research Network
A private 501(c)(3) not-for-profit established in 1985.
One of the original NSFNET regional networks.
Creator of two for-profit companies: AppliedTheory & PSINet.
A member organization; activities supported by member fees.
Board of Directors composed primarily of non-profit CIOs.
Offices in Syracuse and Troy.
Staff of sixteen, majority in Syracuse.
... not a state agency.
Working with members to solve technology-related problems of mutual concern.
65. About NYSERNet
Since inception 31 years ago, NYSERNet has been committed to sustaining advanced research networks for the most demanding, data-intensive applications. In the beginning, this included seminal contributions to the concept of the network:
“In response to the Connections solicitation, NSF received innovative responses from what would become two of the major regional networks: SURANET and NYSERNET. They proposed a regional, distributed network design rather than one with all universities independently connected to the regional supercomputing center (a “star” design).
“The NYSERNET and SURANET examples caused a major paradigm shift at NSF. Instead of funding institutional connections to supercomputer centers, the NSF shifted to funding connections of ‘cohesive’ regional networks. ... NSFNET is not a network. It is an internetwork - i.e., a network of networks, which are organizationally and technically autonomous but which interoperate with one another.”
Steven Wolff, NSF
66. NYSERNet Board
Tim Lance, NYSERNet
Gary Roberts, Alfred University
Juan Montes, American Museum of Natural History
Sharon Pitt, Binghamton University
Tom Schlagel, Brookhaven National Laboratory
Brian Cohen, CUNY
Gaspare LoDuca, Columbia University
Dave Lifka, Dave Vernon, Cornell University
Bob Juckiewicz, Hofstra University
Patricia Kovatch, Icahn School of Medicine at Mount Sinai
Bill Thirsk, Marist College
Daniel Barchi, NY Pres. Hospital
Marilyn McMillan, New York University
John Kolb, Rensselaer Polytechnic Inst.
Jeanne Casares, Rochester Institute of Tech.
Armand Gazes, Rockefeller University
Justin Sipher, St. Lawrence University
Melissa Woo, Stony Brook University
Chris Sedore, NYSERNet
Chris Haile, University at Albany
Brice Bible, Tom Furlani, University at Buffalo
Dave Lewis, University of Rochester
67. Research Advisory Council
David Ackerman, NYU
Toby Bloom, New York Genome Center
Duncan Brown, Syracuse University
Chris Carothers, Rensselaer Polytechnic Institute
Jim Dias, University at Albany
Jon Dordick, Rensselaer Polytechnic Institute
Jim DuMond, Marist College
Stratos Efstathiadis, New York University
Tom Furlani, University at Buffalo
Robert Harrison, Stony Brook, Brookhaven
Halayn Hescock, Columbia University
Patricia Kovatch, Icahn School of Medicine at Mt. Sinai
Michael Kress, College of Staten Island
Tim Lance, NYSERNet
Dave Lifka, Cornell University
Brendan Mort, University of Rochester
Bill Owens, NYSERNet
Vijay Agarwala, New York Genome Center
Ryne Raffaelle, Rochester Institute of Technology
Tom Schlagel, Brookhaven National Laboratory
Jill Taylor, Wadsworth Center
Andrew White, Stony Brook University
94. General Assertion
The size of data sets and complexity of necessary computations are growing faster than the technologies to move, store, manipulate, and calculate.
Question: Is everything growing exponentially?
100. Data Multiplier Effect
Factorial explosion in context (from IBM)
Diagram labels (stages): Raw data; Feature extraction metadata; Domain linkages; Full contextual analytics
Diagram labels (example data): Patient records
Diagram labels (context): Location risk, Occupational risk, Dietary risk, Family history, Actuarial data, Government statistics, Epidemic data, Chemical exposure, Personal financial situation, Social relationships, Travel history, Weather history, ...
109. Really Hard Stuff
Data Decision Tree
Preservation and Curation
Legal and Ethical
Sustaining Partnerships
Education
Bringing Government and the Public Along
115. Giving researchers credit for their data
Neil Jefferies, The Bodleian Digital Library Systems and Services (BDLSS)
116. Concept: “Carrot” for Data Deposit
“Submit data paper” button on data repository item
Researcher gets…
Another publication/citation opportunity
Preservation of data
Avoid publisher submission system
Publisher gets
More/faster data paper submissions
Better metadata quality
Link referrals from data repositories
Repositories get
More data deposits
Better metadata quality
Link referrals from publishers
Funders get
More re-use, more impact
Reproducibility
117. Schematic
Diagram components: Data Repo, Helper App, Publisher, Data Paper, DataCite, ORCID, CrossRef, Text (Gdocs), CoAuthors (ORCID), Enhanced Metadata + Text etc.
1. Press button – SWORD2 package sent to helper app with DataCite DOI and submitter's ORCID.
2. In Helper App – select journal, write paper using template, add coauthors from ORCID and agree to publisher T&Cs.
3. Enhanced SWORD2 package sent to the publisher. Ingested automatically into publisher submission system.
4. Publication updates ORCID profile for repo to harvest.
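The hand-off in this workflow can be sketched as a small data structure that the helper app enriches before passing it on. This is an illustration only, not the project's API: `DepositPackage`, `enhance`, and `ready_for_publisher` are hypothetical names, the DOI is a placeholder, and no real SWORD2, DataCite, or ORCID calls are made.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class DepositPackage:
    """Hypothetical stand-in for the package sent from repository to helper app."""
    dataset_doi: str                      # DataCite DOI of the deposited dataset
    submitter_orcid: str                  # ORCID iD of the person pressing the button
    journal: Optional[str] = None         # chosen in the helper app
    paper_text: Optional[str] = None      # written from a template in the helper app
    coauthor_orcids: Tuple[str, ...] = () # added from ORCID in the helper app
    terms_accepted: bool = False          # publisher T&Cs


def enhance(package: DepositPackage, journal: str, paper_text: str,
            coauthor_orcids: Tuple[str, ...], terms_accepted: bool) -> DepositPackage:
    """Return an enhanced copy; the original repository metadata is left untouched."""
    if not terms_accepted:
        raise ValueError("publisher terms must be accepted before submission")
    return replace(package, journal=journal, paper_text=paper_text,
                   coauthor_orcids=coauthor_orcids, terms_accepted=True)


def ready_for_publisher(package: DepositPackage) -> bool:
    """True once the fields added in the helper app are all present."""
    return bool(package.journal and package.paper_text and package.terms_accepted)


# Step 1: the repository sends the basic package (placeholder identifiers).
initial = DepositPackage(dataset_doi="10.1234/example-dataset",
                         submitter_orcid="0000-0002-1825-0097")

# Steps 2-3: the helper app adds journal, text, and coauthors, then forwards it.
enhanced = enhance(initial, journal="Example Data Journal",
                   paper_text="Data paper text written from the template.",
                   coauthor_orcids=(), terms_accepted=True)

print(ready_for_publisher(initial), ready_for_publisher(enhanced))
```

Keeping the package immutable and returning an enhanced copy mirrors the slide's distinction between the original package and the "enhanced" one sent to the publisher.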
119. Phase 1 – Feasibility Study
RDA Publisher Workflow Analysis
Strawman spec for Helper app/API
Most data papers and related data are open
Questionnaire for Repositories and Publishers
Confirm requirements
Gauge interest in proposal
Overwhelmingly positive feedback
Offers of collaboration
120. Phase 2 – Proof of Concept
Detailed API Spec (SWORD2/DataCite)
Prototype helper app “Data Paper Companion”
Fedora Repository/Hydra
Sword Client/Server Ruby Gems
Repository -> Publisher in <10 minutes (if you have the text written)
Community building
F1000 Research, Elsevier (Data in Brief and Mendeley Data),
ORCID, RDA/THOR
Many more collaboration offers than we could handle
Figshare, OJS, Dryad, Nature...
121. Phase 3 – The Business Case
We started to look for indications of the time this app would save scholars, to quantify the possible benefits...
We were expecting to measure efficiency gains of maybe tens of minutes per submission or a bit more...
124. Phase 3 - Consolidation
Demonstrate real paper(s) published using the workflow
Join forces with Streamlining Deposit project team
UX expertise
Align metadata requirements
Expect repo-led and publication-led workflows to co-exist
Sustainability
Steering Committee to initiate governance structures
API Spec as a formal publication
Code as a reference implementation/test harness
THOR project – identifier ecosystem for research entities
Jisc shared services
ORCID
Cloud hosting: Azure (Microsoft Research Grant)
125. Phase 3 - Expansion
Expanding reach/integration
More outreach activities
Updated SWORD modules for EPrints, DSpace, OSP
Work with structured repositories such as EBI, NCBI etc.
(domain/data specific)
Take up other publisher offers: Nature, OUP
Datasets in ORCID
Journal Policy Registry by the back door?
Roadmap (not development) for additional use cases
Multiple datasets (other people's data)
Non-open data (DataShield?)
Not just data papers
REF/Impact metric* friendliness