The document discusses challenges in data standards, sharing, and publication in the life sciences. It notes that there are many reporting standards for describing experiments, but that it is hard to identify them, track their use, and assess their impact. It proposes a searchable registry of standards that associates them with data policies, databases, and metrics for evaluating their use. This would help stakeholders identify appropriate standards and credit contributors to maintaining open standards.
Life science odin-oct2013-sa-sansone
1. ODIN "Big Bang" event, CERN, Thursday, 17 October 2013
www.slideshare.net/SusannaSansone
Data standards, sharing and publication
in the life sciences
Susanna-Assunta Sansone, PhD
Data Consultant,
Associate Director,
Honorary Academic Editor
Principal Investigator
Board of Directors
2. ODIN mission
Outline of my talk
Problem:
Identification of datasets is pivotal. But meaningful sharing and (re)use also depend on how well described the datasets are.
Status quo:
In the life sciences there is a wealth of "reporting standards" designed to enhance and facilitate experimental descriptions.
Challenges:
Identify "reporting standards" and their organizations, track their use, usability and impact (e.g. linking them to datasets), credit their developers, users (e.g. curators)...
3. My team's activities and groups we work with
data management, biocuration and publication;
collaborative development of software, databases, standards and ontologies
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
• environmental health
env, agro, tox/pharma, health
6. Growing movement for reproducible research
▪ Researchers and bioinformaticians in both academic and commercial arenas, along with funding agencies and publishers, embrace the concept that, for shared datasets to be comprehensible, interoperable and reusable, we should richly describe:
• entities of interest, e.g., genes, metabolites, phenotypes, computational models, diseases ...
• experimental steps, e.g., provenance of study materials, technology and measurement types, experimentalists and curators ...
7. The necessity for well-annotated data and unambiguous experimental metadata was especially apparent
• during cross-study comparisons and data analysis
• in preparation for reformatting the datasets for submission to the different EBI repositories, which require different levels of information
Figure: experimental design; sample characteristic(s); experimental variable(s); technology(ies); measurement(s); protocol(s); data file(s)
The International Conference on Systems Biology (ICSB), 22-28 August, 2008
Susanna-Assunta Sansone
www.ebi.ac.uk/net-project
8. ▪ One must strike a balance between
• depth and breadth of information; and
• sufficient information required to reuse the data
▪ Capture all salient features of the experimental workflow
▪ Structure the descriptions for consistency and tracking
▪ Make annotation explicit and discoverable
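The components listed on slide 7 (experimental design, sample characteristics, variables, technology, measurements, protocols, data files) and the advice to structure descriptions and make annotation explicit can be illustrated with a minimal Python sketch. The class name `ExperimentDescription`, its field names and the example values are hypothetical; this is not ISA-Tab or any other published format, only an illustration of capturing each component as an explicit, checkable field.

```python
from dataclasses import asdict, dataclass, field
import json


@dataclass
class ExperimentDescription:
    """Hypothetical structured description of one experiment.

    Field names mirror the components listed on slide 7; they are
    illustrative only and do not follow any published schema.
    """
    experimental_design: str
    sample_characteristics: dict[str, str] = field(default_factory=dict)
    experimental_variables: list[str] = field(default_factory=list)
    technologies: list[str] = field(default_factory=list)
    measurements: list[str] = field(default_factory=list)
    protocols: list[str] = field(default_factory=list)
    data_files: list[str] = field(default_factory=list)

    def missing_components(self) -> list[str]:
        """Names of components left empty, i.e. annotation still to be captured."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialize with explicit, sorted keys so each annotation is discoverable."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)


# Usage with made-up values: only three of the seven components are filled in.
exp = ExperimentDescription(
    experimental_design="dose response",
    sample_characteristics={"organism": "Homo sapiens"},
    technologies=["mass spectrometry"],
)
print(exp.missing_components())
# ['experimental_variables', 'measurements', 'protocols', 'data_files']
```

Keeping each component as a named field makes gaps visible before submission, which is where the slide notes that differing repository requirements caused trouble.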
9. A community mobilization to develop standards, e.g.:
Nanotechnology Working Group
de jure standard organizations
de facto grass-roots groups
▪ Structural and operational differences
• organization types (open, closed to members, society, WG etc.)
• standards development (how to formulate, conduct and maintain)
• adoption, uptake, outreach (links to journals, funders and commercial sector)
• funds (sponsors, memberships, grants, volunteering)
10. Types of reporting standards
• Including conceptual model, conceptual schema from which an exchange format is derived, to allow data to flow from one system to another
• Including controlled vocabularies, taxonomies, thesauri, ontologies etc., to use the same word and refer to the same "thing"
• Including minimum information reporting requirements, or checklists, to report the same core, essential information
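A minimum information checklist can be thought of as a named set of fields that every report must fill in. The sketch below, assuming that simple model, checks a metadata record against such a checklist; the `Checklist` class, the checklist name and its fields are hypothetical and are not taken from MIAME, MIAPE or any other real community checklist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checklist:
    """Hypothetical minimum-information checklist: a named set of required fields."""
    name: str
    required_fields: frozenset[str]

    def missing(self, record: dict[str, object]) -> list[str]:
        """Required fields absent or empty in the record, sorted for stable output."""
        return sorted(
            f for f in self.required_fields
            if record.get(f) in (None, "", [], {})
        )

    def is_compliant(self, record: dict[str, object]) -> bool:
        """True when every required field is reported."""
        return not self.missing(record)


# Illustrative checklist; the name and fields are invented, not from a real standard.
checklist = Checklist(
    name="example-minimal-checklist",
    required_fields=frozenset({"organism", "sample_source", "assay_type", "protocol"}),
)
record = {"organism": "Mus musculus", "assay_type": "transcription profiling", "protocol": ""}
print(checklist.missing(record))       # ['protocol', 'sample_source']
print(checklist.is_compliant(record))  # False
```

Real checklists are richer (conditional requirements, value constraints, links to terminologies), so treat this only as the core idea: report the same essential fields every time.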
11. Fragmentation, duplications and gaps
Figure: biologically-delineated views of the world (epidemiology, plant biology, microbiology) and technologically-delineated views of the world (transcriptomics, proteomics, metabolomics; technologies including arrays, scanning, MS, gels, columns, NMR, FTIR), sharing generic features ("common core"):
- description of source biomaterial
- experimental design components
To compare and integrate data we need interoperable standards
12. Growing number of reporting standards
Counts shown: + 303, + 150, + 130 (sources: MIBBI, EQUATOR; estimated; BioPortal)
To track provenance of the information and ensure rich descriptions of data and experimental metadata, to maximize reusability
Databases, annotation and curation tools
Examples: MAGE-Tab, GCDML, AAO, SOFT, GELML, MITAB, ISA-Tab, OBI, FASTA, VO, PATO, DICOM, ENVO, XAO, DO, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, MOD, SBRML, mzML, SEDML, MIAME, CHEBI, SRAxml, CML, TEDDY, PRO, BTO, IDO, MIAPE, CIMR, MIASE, REMARK, MIQE, CONSORT, MISFISHIE…
15. • A coherent, curated and searchable registry of standards for describing and reporting experiments in life science, environmental, biomedical and biotechnological domains
• Progressively associate standards with data policies and databases
• Develop assessment criteria for usability and popularity of standards
• Help stakeholders make informed decisions on, e.g., which standards or databases to use or recommend; identify efforts they have funded
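The registry goals above (searchable entries, standards associated with databases and data policies, some measure of use) can be pictured as a small data model. This is a minimal in-memory sketch under assumptions: the `Standard` and `Registry` classes, their methods, and the database and policy names are hypothetical, and `uptake` is a naive link count standing in for the assessment criteria the slide says are still to be developed.

```python
from dataclasses import dataclass, field


@dataclass
class Standard:
    """One registry entry; fields are illustrative, not a real registry schema."""
    name: str
    kind: str  # e.g. "checklist", "exchange format", "terminology"
    domains: set[str] = field(default_factory=set)
    databases: set[str] = field(default_factory=set)  # databases implementing it
    policies: set[str] = field(default_factory=set)   # data policies recommending it


class Registry:
    """Hypothetical in-memory registry of standards."""

    def __init__(self) -> None:
        self._standards: dict[str, Standard] = {}

    def add(self, standard: Standard) -> None:
        self._standards[standard.name] = standard

    def link_database(self, standard_name: str, database: str) -> None:
        self._standards[standard_name].databases.add(database)

    def link_policy(self, standard_name: str, policy: str) -> None:
        self._standards[standard_name].policies.add(policy)

    def search(self, term: str) -> list[str]:
        """Case-insensitive match on name, kind or domain; returns sorted names."""
        t = term.lower()
        return sorted(
            s.name for s in self._standards.values()
            if t in s.name.lower()
            or t in s.kind.lower()
            or any(t in d.lower() for d in s.domains)
        )

    def uptake(self, standard_name: str) -> int:
        """Naive usage indicator: number of linked databases plus linked policies."""
        s = self._standards[standard_name]
        return len(s.databases) + len(s.policies)


registry = Registry()
registry.add(Standard("MIAME", "checklist", {"transcriptomics"}))
registry.add(Standard("MAGE-Tab", "exchange format", {"transcriptomics"}))
registry.add(Standard("OBI", "terminology", {"biomedical investigations"}))
# Hypothetical database and policy names, for illustration only.
registry.link_database("MIAME", "example-db-1")
registry.link_database("MIAME", "example-db-2")
registry.link_policy("MIAME", "example-journal-policy")

print(registry.search("transcriptomics"))  # ['MAGE-Tab', 'MIAME']
print(registry.uptake("MIAME"))            # 3
```

Linking standards to databases and policies as explicit sets is what lets a stakeholder ask "who uses this standard?" instead of relying on word of mouth.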
18. Will the ISNI-based ORCID affiliation module
cover standards organizations too?
19. User profiles populated from ORCID...
20. ... credit for creating, contributing to and maintaining standards
Ownership of open standards can be problematic in broad, grass-roots collaborations.
Improved models are required: to encourage maintenance of and contributions to these efforts, rewards and incentives need to be identified for all contributors supporting the continued development of standards.
21. ... link to data records associated to publications
22. ...and associated article-level metrics
25. "Invisible" use of standards in data reporting tools
One of the winners.
Project: integration of ORCID with ISAcreator, the editor tool that helps curators and researchers describe experiments following community standards.
26. ODIN mission
Summarizing my talk
Problem:
Identification of datasets is pivotal. But meaningful sharing and (re)use also depend on how well described the datasets are.
Status quo:
In the life sciences there is a wealth of "reporting standards" designed to enhance and facilitate experimental descriptions.
Challenges addressed by
Identify "reporting standards" and their organizations, track their use, usability and impact (e.g. linking them to datasets), credit their developers, users (e.g. curators)...