The document discusses how research objects and computational workflows can help capture experimental processes and reproduce findings in life sciences research. It describes a computational experiment evaluating three genome assembly algorithms on bacterial, insect, and human genomes. Key steps included identifying resources, designing the experimental workflow, running the experiment in Galaxy, and publishing results as nanopublications aggregated in a research object to enable verification and reuse. The goal is to improve reproducibility by making experimental descriptions and reviews more structured and transparent.
From peer-reviewed to peer-reproduced: a role for research objects in scholarly publishing in the life sciences
1. From peer-reviewed to peer-reproduced: a role for research objects in scholarly publishing in the life sciences
Alejandra González-Beltrán
Oxford e-Research Centre, University of Oxford
-ontology.org
Bioinformatics Open Source Conference (BOSC), Dublin, Ireland
July 10-11 2015
3. Image credits: "AGBell Notebook" by Alexander Graham Bell (d. 1922), pages 40-41 of the Alexander Graham Bell Family Papers in the Library of Congress' Manuscript Division. Licensed under Public Domain via Wikimedia Commons: http://commons.wikimedia.org/wiki/File:AGBell_Notebook.jpg#/media/File:AGBell_Notebook.jpg
http://petcaretips.net/bonding-rabbit-to-pets.html
Many things have been said about the challenges of scientific reproducibility and how it can go wrong… Difficulties arise when the description of the experimental steps is only available in lab notebooks and scientific articles, when data are lacking, and when the software tools required for the analysis are unavailable.
4. Can data models and computational workflows help capture experimental processes and reproduce findings? How?
Diagram: experimental description (design & steps); conclusions; computational workflows; aggregation & workflow preservation
8. • Open peer review
• Availability of:
   • data
   • analysis scripts
   • documentation
Evaluation of the SOAPdenovo2 tool for the de novo assembly of genomes from short reads produced by next-generation sequencing; SOAPdenovo2 implements improvements over the SOAPdenovo1 assembler.
Pre-publication history
https://github.com/aquaskyline/SOAPdenovo2
http://sourceforge.net/projects/soapdenovo2/
15. The experimental plan - computational case
Predictor variables (factor name, factor type):
• genome assembly algorithm: SOAPdenovo2, SOAPdenovo1, ALLPATHS-LG
• genome size: bacterial genome, insect genome, human genome
Response variables (with units):
• genome coverage (%)
• computation run time (h)
• peak memory consumption (Gb)
• contig N50 (kb or bp)
• scaffold N50 (kb or bp)
• number of errors
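The experimental plan above can be captured as structured data rather than as a slide layout. The Python sketch below is a minimal illustration, assuming a full factorial design in which every algorithm is run on every genome (as the plan's layout suggests); the names `Factor`, `ResponseVariable`, `runs` and `expected_measurements` are hypothetical and not part of any tool mentioned in this talk.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Factor:
    name: str
    levels: tuple[str, ...]


@dataclass(frozen=True)
class ResponseVariable:
    name: str
    unit: str  # empty when the variable is a plain count


FACTORS = (
    Factor("genome assembly algorithm", ("SOAPdenovo2", "SOAPdenovo1", "ALLPATHS-LG")),
    Factor("genome size", ("bacterial genome", "insect genome", "human genome")),
)
RESPONSES = (
    ResponseVariable("genome coverage", "%"),
    ResponseVariable("computation run time", "h"),
    ResponseVariable("peak memory consumption", "Gb"),
    ResponseVariable("contig N50", "kb or bp"),
    ResponseVariable("scaffold N50", "kb or bp"),
    ResponseVariable("number of errors", ""),
)


def runs(factors):
    """Full factorial design: one run per combination of factor levels."""
    names = [f.name for f in factors]
    return [dict(zip(names, combo)) for combo in product(*(f.levels for f in factors))]


def expected_measurements(factors, responses):
    """Every run is expected to report a value for every response variable."""
    return [(run, r.name) for run in runs(factors) for r in responses]


if __name__ == "__main__":
    print(len(runs(FACTORS)))                              # 9
    print(len(expected_measurements(FACTORS, RESPONSES)))  # 54
```

Enumerated this way, the plan yields 9 runs and 54 expected measurements (9 runs × 6 response variables), which is the checklist of findings a complete report should cover.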
16. The experimental steps
• Unambiguous identification of resources (e.g. records from public repositories), using persistent identifiers where available (ORCIDs, DOIs); we suggest a dedicated article section
• Experimental workflows: identification of processes, their inputs and outputs
• Experimental design: identify the experimental goal, and the independent and response variables
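Once written down as data, these steps can be partly checked by machine. A minimal Python sketch with hypothetical names throughout (`Step`, `ExperimentDescription`, `problems`): resources are keyed to persistent identifiers (accepted here as either a DOI-shaped string or an http(s) URI, which is a simplifying assumption), contributor ORCID iDs are checked with the ISO 7064 MOD 11-2 check character that ORCID uses, and each workflow step's inputs must be either a declared resource or an output of an earlier step.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Simplifying assumption: a persistent identifier is a DOI or an http(s) URI.
PID_PATTERN = re.compile(r"^(10\.\d{4,9}/\S+|https?://\S+)$")
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_checksum_ok(orcid: str) -> bool:
    """Check the ISO 7064 MOD 11-2 check character of an ORCID iD."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return digits[-1] == ("X" if result == 10 else str(result))


@dataclass
class Step:
    """One process in the experimental workflow (hypothetical model)."""
    name: str
    inputs: list[str]
    outputs: list[str]


@dataclass
class ExperimentDescription:
    goal: str
    resources: dict[str, str]  # local name -> persistent identifier
    contributors: list[str]    # ORCID iDs
    steps: list[Step] = field(default_factory=list)  # in execution order

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means none were found."""
        issues = [f"invalid ORCID iD {o!r}" for o in self.contributors
                  if not orcid_checksum_ok(o)]
        issues += [f"resource {name!r} lacks a persistent identifier"
                   for name, pid in self.resources.items()
                   if not PID_PATTERN.match(pid)]
        available = set(self.resources)
        for step in self.steps:
            issues += [f"step {step.name!r} uses undeclared input {item!r}"
                       for item in step.inputs if item not in available]
            available.update(step.outputs)
        return issues
```

The sample iD `0000-0002-1825-0097`, which ORCID uses in its own documentation, passes the checksum; changing its last digit makes it fail.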
21. Publishing findings as nanopublications
A nanopublication (NP) represents structured data along with its provenance in a single publishable and citable entity. It has three parts: assertion, provenance, and publication info.
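The three parts of a nanopublication map onto named RDF graphs, tied together by a head graph. The Python sketch below serializes one nanopublication as TriG using only string formatting; it uses the nanopublication schema terms `np:Nanopublication`, `np:hasAssertion`, `np:hasProvenance` and `np:hasPublicationInfo`, plus PROV-O and Dublin Core terms for provenance and publication info. The function name `build_nanopub` and any `example.org` URIs are hypothetical, and the terms passed in are inserted without escaping or validation; a real pipeline would use an RDF library.

```python
from __future__ import annotations

PREFIXES = "\n".join([
    "@prefix np: <http://www.nanopub.org/nschema#> .",
    "@prefix prov: <http://www.w3.org/ns/prov#> .",
    "@prefix dcterms: <http://purl.org/dc/terms/> .",
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
])


def build_nanopub(base: str, assertion: list[tuple[str, str, str]],
                  derived_from: str, creator: str, created: str) -> str:
    """Serialize one nanopublication as TriG.

    `assertion` holds (subject, predicate, object) triples already written as
    Turtle terms; they are inserted verbatim, without escaping or validation.
    """
    np_uri = f"<{base}>"
    a, p, i = (f"<{base}#{part}>" for part in ("assertion", "provenance", "pubinfo"))
    lines = [PREFIXES, "",
             # Head graph: ties the three parts together into one citable unit.
             f"<{base}#head> {{",
             f"    {np_uri} a np:Nanopublication ;",
             f"        np:hasAssertion {a} ;",
             f"        np:hasProvenance {p} ;",
             f"        np:hasPublicationInfo {i} .",
             "}",
             # Assertion graph: the scientific claim itself.
             f"{a} {{"]
    lines += [f"    {s} {pred} {o} ." for s, pred, o in assertion]
    lines += ["}",
              # Provenance graph: where the assertion comes from.
              f"{p} {{",
              f"    {a} prov:wasDerivedFrom <{derived_from}> .",
              "}",
              # Publication info graph: metadata about the nanopublication.
              f"{i} {{",
              f"    {np_uri} dcterms:creator <{creator}> ;",
              f'        dcterms:created "{created}"^^xsd:dateTime .',
              "}"]
    return "\n".join(lines) + "\n"
```

Keeping the assertion in its own graph is what makes it citable separately from the provenance and publication metadata that describe it.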
22. Publishing findings as nanopublications
Abstract & Conclusions: assertion and provenance
• Generation of nanopublications for all the results of the response variables
• NanoMaton: templates for nanopublications
• Prevent priming: report all findings corresponding to the identified response variables
• Remain neutral: report all findings of similar importance with the same weight
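Reporting every finding for every response variable can be enforced mechanically. This Python sketch does not reproduce NanoMaton's interface, which is not described here; it uses a hypothetical `string.Template` assertion pattern and refuses to render anything unless every cell of the experimental plan (algorithm × genome × response variable) has a value, so no finding can be silently dropped or singled out.

```python
from itertools import product
from string import Template

ALGORITHMS = ("SOAPdenovo2", "SOAPdenovo1", "ALLPATHS-LG")
GENOMES = ("bacterial genome", "insect genome", "human genome")
RESPONSES = {  # response variable -> unit, as in the experimental plan
    "genome coverage": "%",
    "computation run time": "h",
    "peak memory consumption": "Gb",
    "contig N50": "kb or bp",
    "scaffold N50": "kb or bp",
    "number of errors": "",
}

# Hypothetical assertion template, not NanoMaton's actual template format.
ASSERTION = Template("$algorithm on the $genome: $response = $value $unit")


def findings(results):
    """Render one assertion per (algorithm, genome, response) cell.

    `results` maps (algorithm, genome, response) -> measured value. Raises
    ValueError if any cell of the plan is missing, so every finding is
    reported with the same weight.
    """
    cells = list(product(ALGORITHMS, GENOMES, RESPONSES))
    missing = [cell for cell in cells if cell not in results]
    if missing:
        raise ValueError(f"{len(missing)} findings missing, e.g. {missing[0]}")
    return [ASSERTION.substitute(algorithm=a, genome=g, response=r,
                                 value=results[(a, g, r)], unit=RESPONSES[r]).rstrip()
            for a, g, r in cells]
```

Failing loudly on a missing cell, rather than skipping it, is the design choice that guards against priming: the output either covers the whole plan or is not produced.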
23. “genome coverage increased over the human data when comparing SOAPdenovo2 against SOAPdenovo1”
Link conclusions to experimental description
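A structured conclusion can point at the experimental description instead of only paraphrasing it. In this Python sketch, the quoted conclusion is broken into a response variable, a compared pair of levels of one factor, and a restricting condition; `dangling_links` (a hypothetical helper) reports any element that does not exist in the design, and flags a conclusion with no linked evidence. The evidence URI is a placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass

# Experimental description: factors with their levels, and response variables.
DESIGN = {
    "factors": {
        "genome assembly algorithm": {"SOAPdenovo2", "SOAPdenovo1", "ALLPATHS-LG"},
        "genome size": {"bacterial genome", "insect genome", "human genome"},
    },
    "responses": {"genome coverage", "computation run time", "peak memory consumption",
                  "contig N50", "scaffold N50", "number of errors"},
}


@dataclass(frozen=True)
class Conclusion:
    text: str
    response: str              # response variable the claim is about
    direction: str             # "increased" or "decreased"
    compared: tuple[str, str]  # (level, baseline level) of one factor
    condition: dict[str, str]  # factor -> level the claim is restricted to
    evidence: tuple[str, ...]  # URIs of supporting nanopublications


def dangling_links(c: Conclusion, design: dict) -> list[str]:
    """List the parts of a conclusion that do not resolve to the design."""
    problems = []
    if c.response not in design["responses"]:
        problems.append(f"unknown response variable {c.response!r}")
    if c.direction not in {"increased", "decreased"}:
        problems.append(f"unsupported direction {c.direction!r}")
    if not any(set(c.compared) <= levels for levels in design["factors"].values()):
        problems.append(f"{c.compared} are not levels of a single factor")
    for factor, level in c.condition.items():
        if level not in design["factors"].get(factor, set()):
            problems.append(f"unknown condition {factor} = {level!r}")
    if not c.evidence:
        problems.append("no supporting nanopublication linked")
    return problems


example = Conclusion(
    text="genome coverage increased over the human data when comparing "
         "SOAPdenovo2 against SOAPdenovo1",
    response="genome coverage",
    direction="increased",
    compared=("SOAPdenovo2", "SOAPdenovo1"),
    condition={"genome size": "human genome"},
    evidence=("http://example.org/np/coverage-human",),  # placeholder URI
)
print(dangling_links(example, DESIGN))  # []
```

A reviewer can then follow each field of the conclusion back to a factor, a level or a response variable of the declared design, rather than matching free text by hand.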
24. Aggregation and workflow preservation as Research Objects (http://www.researchobject.org/)
Research Object: enables the aggregation of the digital resources contributing to the findings of computational research, including results, data and software, as citable compound digital objects.
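A Research Object can be as simple as a manifest listing what it aggregates. The Python sketch below builds such a manifest as JSON and reports which resources cited by the findings are not yet aggregated. The field names `id`, `createdBy`, `createdOn`, `aggregates` and `uri` are loosely modeled on the Research Object Bundle manifest, while the `kind` labels and the helper names are hypothetical additions; check the specification at researchobject.org before relying on any of them.

```python
import json
from datetime import datetime, timezone


def build_manifest(ro_id: str, creator: dict, aggregates: list) -> dict:
    """Describe a Research Object that aggregates the resources behind findings.

    `aggregates` holds (uri, kind) pairs; kind is a hypothetical label such as
    "result", "data", "software" or "workflow".
    """
    return {
        "id": ro_id,
        "createdBy": creator,
        "createdOn": datetime.now(timezone.utc).isoformat(),
        "aggregates": [{"uri": uri, "kind": kind} for uri, kind in aggregates],
    }


def not_aggregated(manifest: dict, cited_uris) -> list:
    """Resources cited by findings but missing from the Research Object."""
    aggregated = {entry["uri"] for entry in manifest["aggregates"]}
    return sorted(set(cited_uris) - aggregated)


if __name__ == "__main__":
    manifest = build_manifest(
        "http://example.org/ro/soapdenovo2",  # placeholder identifier
        {"name": "Example Curator"},          # placeholder creator
        [("https://github.com/aquaskyline/SOAPdenovo2", "software"),
         ("http://example.org/np/coverage-human", "result")],
    )
    print(json.dumps(manifest, indent=2))
    print(not_aggregated(manifest, ["http://example.org/np/coverage-human",
                                    "http://example.org/data/human-reads"]))
```

Checking cited resources against the manifest is one way to make sure the compound object really contains everything a finding depends on before it is published.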
26. • From narrative to self-described structured data
• Model- and workflow-assisted experimental description and review process
• Depth and breadth of semantic resources; clear meaning of experimental elements
27. Team
SOAPdenovo2:
Ruibang Luo, University of Hong Kong
Tin-Lap Lee, Chinese University of Hong Kong
Tak-wah Lam, University of Hong Kong
Scott Edmunds, GigaScience
Peter Li, GigaScience
Marco Roos, Leiden University
Mark Thompson, Leiden University
Rajaram Kaliyaperumal, Leiden University
Eelke van der Horst, Leiden University
Jun Zhao, Lancaster University
María Susana Avila García, Oxford University
Philippe Rocca-Serra, Oxford University
Susanna-Assunta Sansone, Oxford University
Alejandra Gonzalez-Beltran, Oxford University
28. Questions?
You can email us: isatools@googlegroups.com
View our blog: http://isatools.wordpress.com
Follow us on Twitter: @isatools
View our websites
View our Git repo & contribute: http://github.com/ISA-tools
Thanks for your attention!