Presentation given at Biocuration 2019 Session 5 (Interacting with the Research Community)
Abstract: Journals and publishers have an important role to play in the drive to increase the reproducibility of published science. Since its launch in 2014, the Nature Research journal Scientific Data has established a reputation for publishing data papers (‘Data Descriptors’) that are highly reusable, as evidenced by a strong citation record. One of the ways in which Scientific Data ensures maximum reusability of published data is via the in-house data curation workflow applied to every Data Descriptor. In 2017, Springer Nature launched its Research Data Support (RDS) service to provide data curation expertise to researchers publishing at other Springer Nature journals.
During curation at Scientific Data and RDS, our data editors familiarise themselves with the related manuscript and perform a thorough check of each data archive. This ensures the descriptions in the manuscript match the metadata and data at the data repositories. The curation process facilitates the identification of any discrepancies between the manuscript text and the information held at the data repository.
Over the last year, the curation team have been recording the types of discrepancies rectified as a direct result of our curation process. At Scientific Data, approximately 10% of the discrepancies the team find are significant enough to have potentially warranted a formal correction had the issue not been resolved prior to publication.
In this presentation we give an overview of our observed outcomes from embedding data curation within the publishing process. We describe how we are monitoring the value of our curation work, and show examples of the types of discrepancy most commonly identified through curation at Scientific Data and RDS.
The value of data curation as part of the publishing process
1. The value of data curation as part of the publishing process
Varsha Khodiyar, PhD
Biocuration 2019
Antarctica meltdown could double sea level rise
2.
Data curation as part of publishing / 10th April 2019
A brief history of data curation at Springer Nature
• Scientific Data launched May 2014
• Novel manuscript format, the Data Descriptor
• Focus on data generation and data peer review
• Machine readable metadata file generated by in-house curators (ISA-Tab format) for each published Data Descriptor
www.nature.com/scientificdata
3.
Data Descriptors have human and machine understandable components
Human readable representation of study, i.e. article (HTML & PDF)
4.
Data Descriptors have human and machine understandable components
Machine accessible representation of study, i.e. metadata
Human readable summary of the metadata
5.
Output from Scientific Data’s curation process
Machine readable overview of how sources and samples were turned into the digital data outputs.
Curator captures key dataset characteristics using ontology terms:
• Type of study
• What was measured
• How it was measured
• Any independent variables
• Sample characteristics e.g.
- Species
- Geographical location
- Environment type
scientificdata.isa-explorer.org
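As a rough illustration of the kind of structured record a curator might build from the characteristics listed above, here is a minimal Python sketch. It is not the ISA-Tab format and not Scientific Data's actual tooling; the class names, field names, and the `EXAMPLE:` identifiers are all hypothetical placeholders rather than real ontology accessions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class OntologyTerm:
    """A label plus the ontology it comes from (all values here are placeholders)."""
    label: str
    ontology: str
    term_id: str


@dataclass
class DatasetCharacteristics:
    """Hypothetical record mirroring the characteristics listed on the slide."""
    study_type: OntologyTerm
    measurement: OntologyTerm   # what was measured
    technology: OntologyTerm    # how it was measured
    independent_variables: List[OntologyTerm] = field(default_factory=list)
    species: Optional[OntologyTerm] = None
    geographical_location: Optional[OntologyTerm] = None
    environment_type: Optional[OntologyTerm] = None

    def missing_sample_characteristics(self) -> List[str]:
        """Names of the optional sample characteristics that are not filled in."""
        optional = ("species", "geographical_location", "environment_type")
        return [name for name in optional if getattr(self, name) is None]


def placeholder(label: str, n: int) -> OntologyTerm:
    """Build a term with a made-up identifier; not a real ontology accession."""
    return OntologyTerm(label=label, ontology="EXAMPLE", term_id=f"EXAMPLE:{n:07d}")


# Illustrative values only; they do not describe any published Data Descriptor.
record = DatasetCharacteristics(
    study_type=placeholder("observational study", 1),
    measurement=placeholder("sea surface temperature", 2),
    technology=placeholder("temperature sensor", 3),
    geographical_location=placeholder("Antarctica", 4),
)
print(record.missing_sample_characteristics())
```

Keeping each characteristic as a term with an identifier, rather than free text, is what would make such a record machine searchable; in practice the identifiers would come from community ontologies.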
6.
Publishing a data paper with Scientific Data
• Deposit data in an appropriate repository
• Draft a manuscript based on the template
• Submit your manuscript
• Peer review of the manuscript
• Revise the manuscript as required
• Make any changes requested by the data curators
• The data descriptor is published
7.
A brief history of data curation at Springer Nature
• Research Data Support service (RDS) launched April 2017
• Expansion of data curation practice to other Springer Nature journals
• Provide support and advice on research data sharing, for authors and editors
• Promote best practice for sharing research data associated with a publication
www.springernature.com/la/authors/research-data
8.
Springer Nature Research Data Support
To help authors and journals follow good practice in sharing and archiving of research data, we provide optional data deposition and curation services.
• Researchers submit their data files securely
• The Research Data team curates the data and metadata
• The data are published and linked to the author’s paper
More information is available on our website here:
http://www.springernature.com/gb/group/data-policy/data-support-services
9.
Example of curation output from Research Data Support
• Comprehensive description including the data context of the study and data gathering method
• Altmetrics provide information on downloads and citations
• Relevant categories and keywords added to enhance discoverability of the data
• Dataset assigned a DOI
• Licence to clarify reuse conditions
Source: https://doi.org/10.6084/m9.figshare.5259415
10.
Example author feedback report
11.
Checks carried out by the curation team
• Most appropriate repository used?
• Data and metadata at the repository consistent with manuscript?
• Terms of use and terms of access for the data consistent with manuscript?
• Terms of data use consistent with journal policy?
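The consistency checks above are carried out by human curators, but the underlying comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the field names, and the example values (including the placeholder DOI) are hypothetical and do not represent any Springer Nature tool.

```python
from typing import Dict, List


def find_discrepancies(manuscript: Dict[str, str], repository: Dict[str, str]) -> List[str]:
    """Report fields whose values differ, or that appear on only one side."""
    problems = []
    for key in sorted(set(manuscript) | set(repository)):
        in_ms, in_repo = manuscript.get(key), repository.get(key)
        if in_ms is None:
            problems.append(f"{key}: missing from manuscript")
        elif in_repo is None:
            problems.append(f"{key}: missing from repository")
        # Compare case-insensitively and ignore surrounding whitespace.
        elif in_ms.strip().lower() != in_repo.strip().lower():
            problems.append(f"{key}: manuscript says {in_ms!r}, repository says {in_repo!r}")
    return problems


# Hypothetical values, chosen only to exercise each branch of the check.
manuscript = {"title": "Example dataset", "licence": "CC BY 4.0", "file_count": "12"}
repository = {"title": "example dataset ", "licence": "CC0", "doi": "10.0000/example"}

problems = find_discrepancies(manuscript, repository)
for line in problems:
    print(line)
```

A simple comparison like this can only flag literal mismatches; judging whether a repository is the most appropriate one, or whether terms of use meet journal policy, still needs a curator's judgement.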
12.
Possible outcomes of curation
Addition of missing information
Error correction
Suggestions to increase FAIRness
Improvements to manuscript tables, text or figures to aid understanding and reuse of the work
Data access or data license conditions updated at repository or manuscript to aid accessibility
Repository metadata improved to aid dataset discoverability
Improvements to file names and/or file structure at the repository to aid understanding and reuse of the work
Manuscript text
Manuscript figure
Manuscript table
Data files at the repository
13.
Curation outcomes at Scientific Data (Study period March 2018 to March 2019)
• 77% of manuscripts: no issues identified
• 23% of manuscripts: at least 1 issue identified and resolved
• 10% of manuscripts: errors identified and resolved
14.
RDS curation outcomes (Study period March 2018 to March 2019)
In 55% of RDS curation jobs, the curator suggested updates to the repository hosted data files
Sensitive data removed
Missing data added
License conditions updated
File format & naming improved
Mandated data moved to specialist repositories
Supplementary Information moved to repository
Opaque language clarified
15.
We encourage the use of community endorsed ontologies, standards and repositories where possible
springernature.com/gp/authors/research-data-policy/repositories/12327124
17.
The Springer Nature research data curators
Joseph Salter, Development Editor
Tristan Matthews, Assistant Research Data Editor
Graham Smith, Senior Research Data Editor
Rebecca Grant, Research Data Manager
Alexandra Philiastides, Assistant Research Data Editor
Varsha Khodiyar, Data Curation Manager
18.
Summary: Curation as part of the publishing process
• Springer Nature has had at least one research data curator since the launch of Scientific Data in 2014.
• Since 2017, data curation has been available as a separate service for increasing numbers of Springer Nature authors and editors.
• The Research Data team has built up significant expertise in the area of data publishing.
• Our curators are able to identify and help resolve both minor and major issues prior to articles and data being made public.
• Our curators increase the FAIRness of published research data.
- We focus on increasing the Findability and Accessibility of data and metadata in our curation processes.
- We encourage our authors to increase the Interoperability and Reusability of their data and metadata, by using community ontologies for metadata and by encouraging the use of community research data infrastructure where this exists.
19.
The story behind the image
Antarctica meltdown could double sea level rise
Researchers at Pennsylvania State University have been considering how quickly a glacial ice melt in Antarctica would raise sea levels. By updating models with new discoveries and comparing them with past sea-level rise events, they predict that a melting Antarctica could raise oceans by more than 3 feet by the end of the century if greenhouse gas emissions continue unabated, roughly doubling previous total sea-level rise estimates. Rising seas could put many of the world’s coastlines underwater or at risk of flooding and storm surges.
Varsha Khodiyar, PhD
Data Curation Manager
varsha.khodiyar@nature.com
@varsha_khodiyar
go.nature.com/ResearchDataServices
researchdata.springernature.com
researchdata@springernature.com
nature.com/scientificdata
scientificdata@nature.com
@scientificdata