Scientific Data: overview of Data Descriptors (WT Data-Literature Integration, Dec 2013)
1. Now open for submissions
Launching May 2014
www.nature.com/scientificdata
scientificdata@nature.com
@ScientificData
Advisory Panel including senior researchers, funders, librarians and curators
Susanna-Assunta Sansone
Honorary Academic Editor
(University of Oxford, UK)
Andrew L Hufton
Managing Editor
Victoria Newman
Editorial Curator
Ruth Wilson
Publisher
Supported by:
• Michael Huerta, National Institutes of Health, USA
• Mark Thorley, Natural Environment Research Council, UK
• Patricia Cruse, University of California, USA
• Susan Gregurick, Office of Biological and Environmental Research, Department of Energy, USA
• Ioannis Xenarios, Swiss Institute of Bioinformatics, Switzerland
• Chris Bowler, IBENS, France
• Mark Forster, Syngenta, UK
• Anthony Rowe, Johnson & Johnson, USA
• Stephen Chanock, National Cancer Institute, USA
• Weida Tong, National Center for Toxicological Research, FDA, USA
• Albert J. R. Heck, Utrecht University, The Netherlands
• Johanna McEntyre, EMBL-EBI, European Bioinformatics Institute, UK
• Simon Hodson, CODATA, France
• Joseph R. Ecker, Howard Hughes Medical Institute & Salk Institute, USA
• Stephen Friend, Sage Bionetworks, USA
• Jessica Tenenbaum, Duke Translational Medicine Institute, USA
• Anne-Claude Gavin, EMBL, Germany
• David Carr, Wellcome Trust, UK
• Wolfram Horstmann, University of Oxford, UK
• Piero Carninci, RIKEN Omics Science Center, Japan
• Pascale Gaudet, Swiss Institute of Bioinformatics, Switzerland
• Judith A. Blake, The Jackson Laboratory, USA
• Richard H. Scheuermann, J. Craig Venter Institute, USA
• Caroline Shamu, Harvard Medical School, USA
2. Introducing a new content type: Data Descriptor
• Credit for Sharing Your Data
• Open-access
• Focused on Data Reuse
• Peer-reviewed, curated
• Promoting Community Data Repositories
3. Introducing a new content type: Data Descriptor
Session 2: Publishing Data
Aims of this session: to explore how data is being represented and cited in research articles; to showcase new data publishing products, and consider how the edges between articles and data are joined or defined. How can we maximize integrated utility across the different data resources used by scientists?
Session 3: Credit, Attribution, Reproducibility and Provenance
Aims of this session: in an integrated information space, it is essential to have transparency on the sources and methods of scientific outputs. How do scientific articles contribute to this goal? Are they sufficiently addressing requirements? What are the most useful approaches, and how might they be put into practice?
4. Data Descriptor vs. traditional article
• A Data Descriptor is concerned only with the facts behind the methods used to generate or collect the data and to process them
• A Data Descriptor can be:
– submitted before the journal article
– submitted at the same time as the journal article
– submitted after the journal article
[Figure: the Data Descriptor covers the facts (What is the sample? What did I do to generate the data? How was the data processed? Where is the data? Who did what when?); the journal article covers the narrative (a summary of the Data Descriptor, plus analysis, synthesis, interpretation and conclusions)]
6. A Data Descriptor has two components
• Article, or narrative component (PDF and HTML)
• Experimental metadata, or structured component (curated in-house, in machine-readable formats), which supports the article
8. Data Descriptor – article
Sections:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Data Records
• Usage Notes
• Figures & Tables
• References
• Data Citations
In traditional publications this information is not provided in sufficient detail. However, it is essential for understanding, reusing, and reproducing datasets.
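As a rough illustration, an author or editor could check a draft against the section list above before submission. This is a hypothetical sketch: the name `check_sections` and the idea of an automated check are assumptions, not part of the Scientific Data workflow, and it only tests which sections are present, not their order.

```python
# Sections of a Data Descriptor article, as listed on the slide.
DATA_DESCRIPTOR_SECTIONS = [
    "Title",
    "Abstract",
    "Background & Summary",
    "Methods",
    "Technical Validation",
    "Data Records",
    "Usage Notes",
    "Figures & Tables",
    "References",
    "Data Citations",
]


def check_sections(headings: list[str]) -> dict[str, list[str]]:
    """Hypothetical pre-submission check comparing draft headings with the expected sections.

    Returns the expected sections that are missing and any headings that are not
    in the expected list. Section order is deliberately not checked.
    """
    present = {h.strip() for h in headings}
    missing = [s for s in DATA_DESCRIPTOR_SECTIONS if s not in present]
    unexpected = [h.strip() for h in headings if h.strip() not in DATA_DESCRIPTOR_SECTIONS]
    return {"missing": missing, "unexpected": unexpected}


if __name__ == "__main__":
    draft = ["Title", "Abstract", "Introduction", "Methods", "Data Records"]
    report = check_sections(draft)
    print("Missing:", ", ".join(report["missing"]))
    print("Unexpected:", ", ".join(report["unexpected"]))
```

For the sample draft, "Introduction" is flagged as unexpected and six sections, starting with "Background & Summary", are reported missing.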
13. Data Descriptor – experimental metadata (CC0)
General-purpose, configurable format, designed to support:
• description of the experimental workflow, making the annotation explicit and discoverable
• provenance tracking
• use of community standards, such as minimal reporting guidelines and terminologies
o over 300 ‘ontologies’ and over 60 guidelines
• conversion to a growing number of other metadata formats
o e.g. those used by EBI repositories
o and linked data
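To make the linked-data conversion concrete, here is a minimal sketch that serialises a few metadata fields as N-Triples. It is not the converter used by the ISA tools: mapping field names onto Dublin Core terms (`http://purl.org/dc/terms/`) is an assumption, the `https://example.org/dataset/1` URI is hypothetical, and writing a creator as a plain string literal is a simplification.

```python
DCTERMS = "http://purl.org/dc/terms/"


def ntriples_literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslash, double quote and line breaks."""
    escaped = (
        value.replace("\\", "\\\\")  # escape backslashes first so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(subject_uri: str, record: dict[str, list[str]]) -> str:
    """Emit one triple per (field, value); each field name becomes a Dublin Core term (assumed mapping)."""
    lines = []
    for field_name, values in record.items():
        predicate = f"<{DCTERMS}{field_name}>"
        for value in values:
            lines.append(f"<{subject_uri}> {predicate} {ntriples_literal(value)} .")
    return "\n".join(lines)


if __name__ == "__main__":
    record = {"title": ["Example dataset"], "creator": ["A. Researcher", "B. Curator"]}
    print(to_ntriples("https://example.org/dataset/1", record))
```

N-Triples was chosen because every line is a complete statement, which keeps the sketch short; a real release would more likely use an RDF library and richer vocabularies.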
14. Data Descriptor – experimental metadata (CC0)
ISA (Investigation/Study/Assay) is implemented by several service providers running systems that are:
• local, institute-based
o e.g. Harvard Stem Cell Institute
• project- or consortium-based
o e.g. ToxBank, serving a research cluster of seven EU FP7 Health projects
• global, international repositories
o e.g. EBI’s MetaboLights
o and another data journal, GigaScience, in GigaDB
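ISA-Tab records the experimental workflow as tab-delimited tables in which material and data nodes (columns such as `Source Name`, `Sample Name` or `Raw Data File`) are joined by `Protocol REF` columns, which is what makes provenance traceable. The simplified sketch below walks each row left to right and recovers that chain. It is an illustration under assumptions, not an ISA-Tab parser: it ignores the investigation file, comments, units and ontology-term columns, and treats every column ending in " Name" or " File" as a node.

```python
import csv
import io


def provenance_edges(table_tsv: str) -> list[tuple[str, list[str], str]]:
    """Return (input node, protocols applied, output node) for each step in each row.

    Simplified: any column whose header ends in " Name" or " File" is treated as a
    node, "Protocol REF" columns are the processes between nodes, and all other
    columns (Characteristics[...], Term Source REF, ...) are ignored.
    """
    rows = list(csv.reader(io.StringIO(table_tsv), delimiter="\t"))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    edges = []
    for row in body:
        previous = None   # most recent node seen in this row
        protocols = []    # protocols applied since that node
        for column, cell in zip(header, row):
            if column == "Protocol REF":
                protocols.append(cell)
            elif column.endswith(" Name") or column.endswith(" File"):
                if previous is not None:
                    edges.append((previous, protocols, cell))
                previous, protocols = cell, []
    return edges


if __name__ == "__main__":
    study = (
        "Source Name\tCharacteristics[organism]\tProtocol REF\tSample Name\n"
        "mouse-1\tMus musculus\tsample collection\tliver-1\n"
        "mouse-1\tMus musculus\tsample collection\tkidney-1\n"
    )
    for source, steps, target in provenance_edges(study):
        print(f"{source} --[{', '.join(steps)}]--> {target}")
```

Consecutive `Protocol REF` columns are collected into one step, so a chain such as library construction followed by sequencing stays attached to the single output file it produced.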
15. Data Descriptor – experimental metadata (CC0)
Includes fields describing:
• each study, linking to relevant sections of the Data Descriptor article
• authors’ details, including ORCID
• publications
• funding sources and funders’ names, via FundRef
• experimental factors
• study design
• assays
• protocols
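The fields above map naturally onto a small record type. The sketch below is hypothetical: the class and field names are illustrative and are not the journal's schema. The ORCID check, however, follows the ISO 7064 MOD 11-2 checksum that ORCID uses for the final character of an iD, so a mistyped iD can be caught at curation time.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Four hyphen-separated groups of four; the final check character may be "X".
ORCID_FORMAT = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_is_valid(orcid: str) -> bool:
    """Check the format of an ORCID iD and its ISO 7064 MOD 11-2 check character."""
    if not ORCID_FORMAT.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:            # the first 15 digits feed the checksum
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


@dataclass
class Author:
    name: str
    orcid: Optional[str] = None


@dataclass
class StudyMetadata:
    """Hypothetical record mirroring the fields listed on the slide."""
    title: str
    authors: list[Author] = field(default_factory=list)
    publications: list[str] = field(default_factory=list)  # e.g. DOIs
    funders: list[str] = field(default_factory=list)       # funder names, as registered in FundRef
    factors: list[str] = field(default_factory=list)       # experimental factors
    design: list[str] = field(default_factory=list)        # study design descriptors
    assays: list[str] = field(default_factory=list)
    protocols: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """List authors whose ORCID iD is present but fails validation."""
        return [
            f"invalid ORCID for {a.name}: {a.orcid}"
            for a in self.authors
            if a.orcid is not None and not orcid_is_valid(a.orcid)
        ]
```

For example, ORCID's own test iD `0000-0002-1825-0097` passes, while `0000-0002-1825-0098` (one digit changed) is reported by `problems()`.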
18. Data Descriptor – experimental metadata (CC0)
In-house curation team:
• helps users submit the structured content via simple templates and an internal authoring tool
• performs value-added semantic annotation of the experimental metadata
For advanced users and service providers willing to export ISA-Tab for direct submission, we will release a technical specification.
[Figure: structured content linking the analysis method, the script, and the data file or record in a database]
19. Discover similar datasets
Structured content allows users to link, with one click, to other datasets that study the same tissue, disease or organism, or that use the same experimental platform.
[Figure: Scientific Data Data Descriptors (SciData DD), each backed by structured content, linked to one another by same tissue, same organism and same assay, and to community data repositories]
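One-click discovery of related datasets can be thought of as an inverted index from annotation terms to datasets. The sketch below assumes that each Data Descriptor's structured content has been reduced to sets of terms per facet (tissue, organism, assay). The dataset IDs and plain-text terms are made up, and a production system would more likely index ontology term identifiers than free-text labels.

```python
from collections import defaultdict

# Hypothetical structured content: dataset ID -> facet -> annotation terms.
Annotations = dict[str, dict[str, set[str]]]


def build_index(records: Annotations) -> dict[tuple[str, str], set[str]]:
    """Inverted index: (facet, term) -> IDs of datasets annotated with that term."""
    index: dict[tuple[str, str], set[str]] = defaultdict(set)
    for dataset_id, facets in records.items():
        for facet, terms in facets.items():
            for term in terms:
                index[(facet, term)].add(dataset_id)
    return dict(index)


def similar_datasets(dataset_id: str, facet: str, records: Annotations,
                     index: dict[tuple[str, str], set[str]]) -> list[str]:
    """Datasets sharing at least one term with `dataset_id` in the given facet."""
    matches: set[str] = set()
    for term in records[dataset_id].get(facet, set()):
        matches |= index.get((facet, term), set())
    matches.discard(dataset_id)  # a dataset is not "similar" to itself
    return sorted(matches)


if __name__ == "__main__":
    records: Annotations = {
        "dd-1": {"tissue": {"liver"}, "organism": {"Mus musculus"}, "assay": {"RNA-seq"}},
        "dd-2": {"tissue": {"kidney"}, "organism": {"Mus musculus"}, "assay": {"RNA-seq"}},
        "dd-3": {"tissue": {"liver"}, "organism": {"Homo sapiens"}, "assay": {"metabolite profiling"}},
    }
    index = build_index(records)
    for facet in ("tissue", "organism", "assay"):
        print(facet, similar_datasets("dd-1", facet, records, index))
```

Building the index once makes each "same tissue / same organism / same assay" lookup a handful of dictionary reads, rather than a scan over every Data Descriptor.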
21. Other data-related activities at NPG
• Figure source data
- putting data behind figures/graphs
- implemented at Molecular Systems Biology, rolled out at Nature and progressively across all other Nature-branded titles
Wang et al., Nature, 2013; doi:10.1038/nature12730
• Extended data
- expandable text and extra figures; rolled out at Nature
• Data citation
- tackling both styling and format; monitoring community developments, such as the Data Citation Synthesis Group
- to be rolled out across all Nature-branded titles and Scientific Data
• Code reproducibility
- peer review, availability and reuse
• Supported community databases
- criteria for selection, common list across all NPG titles
• NPG’s Linked Data release – CC0