What is Reproducibility? The R* brouhaha (and how Research Objects can help)

Presented at the First International Workshop on Reproducible Open Science @ TPDL, 9 Sept 2016, Hannover, Germany
http://repscience2016.research-infrastructures.eu/



1. What is Reproducibility? The R* brouhaha (and how Research Objects can help)
   Professor Carole Goble, The University of Manchester, UK
   Software Sustainability Institute, UK; ELIXIR-UK; FAIRDOM Association e.V.
   carole.goble@manchester.ac.uk
   First International Workshop on Reproducible Open Science @ TPDL, 9 Sept 2016, Hannover, Germany
2. Acknowledgements
   • Dagstuhl Seminar 16041, January 2016
     – http://www.dagstuhl.de/en/program/calendar/semhp/?semnr=16041
   • ATI Symposium Reproducibility, Sustainability and Preservation, April 2016
     – https://turing.ac.uk/events/reproducibility-sustainability-and-preservation/
     – https://osf.io/bcef5/files/
   • C. Titus Brown, Juliana Freire, David De Roure, Stian Soiland-Reyes, Barend Mons, Tim Clark, Daniel Garijo, Norman Morrison
3. "When I use a word," Humpty Dumpty said in rather a scornful tone, "it means just what I choose it to mean - neither more nor less." Carroll, Through the Looking Glass
   re-compute, replicate, rerun, repeat, re-examine, repurpose, recreate, reuse, restore, reconstruct, review, regenerate, revise, recycle, redo, robustness, tolerance, verification, compliance, validation, assurance, remix
4. Reproducibility of Reproducibility Research
5. Computational Science
   1. Observational, experimental
   2. Theoretical
   3. Simulation
   4. Data intensive
   http://tpeterka.github.io/maui-project/
   From: The Future of Scientific Workflows, Report of DOE Workshop 2015, http://science.energy.gov/~/media/ascr/pdf/programdocuments/docs/workflows_final_report.pd
6. BioSTIF - Computational Science
7. Scientific publication goals: (i) announce a result, (ii) convince readers it's correct.
   Papers in experimental science should describe the results and provide a clear enough protocol to allow successful repetition and extension.
   Papers in computational science should describe the results and provide the complete software development environment, data and set of instructions which generated the figures.
   Virtual Witnessing*
   *Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985), Shapin and Schaffer
   Jill Mesirov, David Donoho
8. Datasets, data collections
   Standard operating procedures
   Software, algorithms
   Configurations
   Tools and apps, services
   Codes, code libraries
   Workflows, scripts
   System software
   Infrastructure
   Compilers, hardware
   Systems of systems: a heterogeneous hybrid patchwork of tools and services evolving over time
9. 10 "Simple" Rules for Reproducible Computational Research: RACE
   1. For Every Result, Keep Track of How It Was Produced
   2. Avoid Manual Data Manipulation Steps
   3. Archive the Exact Versions of All External Programs Used
   4. Version Control All Custom Scripts
   5. Record All Intermediate Results, When Possible in Standardized Formats
   6. For Analyses That Include Randomness, Note Underlying Random Seeds
   7. Always Store Raw Data behind Plots
   8. Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
   9. Connect Textual Statements to Underlying Results
   10. Provide Public Access to Scripts, Runs, and Results
   RACE: Record Everything, Automate Everything, Contain Everything, Expose Everything
   Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285
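To make the RACE mnemonic concrete, here is a minimal, hypothetical Python sketch (not from the talk) of rules 1, 4, 6 and 7: logging how a result was produced, which version of the scripts ran, the random seed used, and the raw data behind any plot. The file names and toy analysis are illustrative assumptions only.

```python
# Hypothetical sketch of rules 1, 4, 6 and 7: log how a result was produced,
# the script version in use, the random seed, and the raw data behind plots.
import json
import random
import subprocess
import sys
from datetime import datetime, timezone

SEED = 42                        # Rule 6: note the underlying random seed
random.seed(SEED)

# Toy analysis standing in for the real computation.
raw_values = [random.gauss(0.0, 1.0) for _ in range(1000)]
result = sum(raw_values) / len(raw_values)

try:                             # Rule 4: record which version of the scripts ran
    commit = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True).strip()
except (OSError, subprocess.CalledProcessError):
    commit = "not under version control"

log = {                          # Rule 1: keep track of how the result was produced
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "command": " ".join(sys.argv),
    "git_commit": commit,
    "seed": SEED,
    "result": result,
}

with open("raw_values.json", "w") as f:   # Rule 7: store the raw data behind plots
    json.dump(raw_values, f)
with open("run_log.json", "w") as f:
    json.dump(log, f, indent=2)
```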
10. Preparation pain
    Independent testing trials and tribulations
    Replication hostility
    No funding, time, recognition, place to publish
    Resource intensive
    Access to the complete environment
    [Norman Morrison]
11. Lab Analogy: Witnessing "Datascopes"
    Input Data + Config + Parameters → Software → Output Data
    Methods: techniques, algorithms, spec. of the steps, models
    Materials: datasets, parameters, algorithm seeds
    Instruments: codes, services, scripts, underlying libraries, workflows, ref resources
    Laboratory: sw and hw infrastructure, systems software, integrative platforms - the computational environment
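One way to read the analogy is as a checklist of what must be captured before a computational experiment can be "witnessed". The sketch below is a hypothetical illustration, not part of the talk: a simple record type covering the four quadrants, with made-up example values.

```python
# Hypothetical checklist type for the "datascope" quadrants: what a
# computational experiment would need to record for a witness to re-run it.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Datascope:
    methods: List[str] = field(default_factory=list)        # techniques, algorithms, step specs, models
    materials: Dict[str, str] = field(default_factory=dict)  # datasets, parameters, seeds
    instruments: List[str] = field(default_factory=list)    # codes, services, scripts, libraries, workflows
    laboratory: Dict[str, str] = field(default_factory=dict)  # sw/hw infrastructure, systems software

experiment = Datascope(
    methods=["k-means clustering, k=3"],
    materials={"input_data": "samples_v2.csv", "seed": "42"},
    instruments=["cluster.py", "numpy 1.26"],
    laboratory={"os": "Ubuntu 22.04", "python": "3.11"},
)
print(experiment)
```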
12. "Micro" Reproducibility vs "Macro" Reproducibility
    Fixivity. Validate. Verify. Trust.
13. Repeat, Replicate, Robust, Reproduce, Trust - why the differences? [C. Titus Brown]
    https://2016-oslo-repeatability.readthedocs.org/en/latest/repeatability-discussion.html
14. "an experiment is reproducible until another laboratory tries to repeat it" - Alexander Kohn
    Repeatability: "sameness" - same result, 1 lab, 1 experiment
    Reproducibility: "similarity" - similar result, >1 lab, >1 experiment
    Validate. Verify.
15. Method Reproducibility: the provision of enough detail about study procedures and data so the same procedures could, in theory or in actuality, be exactly repeated.
    Result Reproducibility (aka replicability): obtaining the same results from the conduct of an independent study whose procedures are as closely matched to the original experiment as possible.
    "What does research reproducibility mean?" Steven N. Goodman, Daniele Fanelli, John P. A. Ioannidis, Science Translational Medicine 8 (341), 341ps12. doi:10.1126/scitranslmed.aaf5027
    http://stm.sciencemag.org/content/scitransmed/8/341/341ps12.full.pdf
16. Productivity. Track differences. Validate. Verify.
17. Personal & Lab Productivity, Public Good Reproducibility
    Reviewers want additional work; the statistician wants more runs; the analysis needs to be repeated; a post-doc leaves, a student arrives; new/revised datasets; updated/new versions of algorithms/codes; the sample was contaminated; better kit - longer simulations; new partners, new projects.
18. "Datascope" Lab Analogy
    Methods: techniques, algorithms, spec. of the steps, models
    Materials: datasets, parameters, algorithm seeds
    Instruments: codes, services, scripts, underlying libraries, workflows, ref datasets
    Laboratory: sw and hw infrastructure, systems software, integrative platforms - the computational environment
19. "Datascope" Lab Analogy: Form vs Function
    Methods: techniques, algorithms, spec. of the steps, models
    Materials: datasets, parameters, algorithm seeds
    Instruments: codes, services, scripts, underlying libraries, workflows, ref datasets
    Laboratory: sw and hw infrastructure, systems software, integrative platforms - the computational environment
20. "Datascope" Practicalities
    Methods: techniques, algorithms, spec. of the steps, models
    Materials: datasets, parameters, algorithm seeds
    Instruments: codes, services, scripts, underlying libraries, workflows, ref datasets
    Laboratory: sw and hw infrastructure, systems software, integrative platforms - the computational environment
    Living dependencies: science, methods and datasets evolve; questions stay, answers change; breakage; labs decay; services and techniques come and go; new instruments, updated datasets, services, codes, hardware
    One-offs, streams, stochastics, sensitivities, scale; non-portable data; black boxes; supercomputer access; non-portable software; licensing restrictions; unreliable resources; complexity
21. T1 → T2: evolving ref datasets, new simulation codes
    Environment: Archived vs Active; Contained vs Distributed; Regimented vs Free-for-all
    Who owns the dependencies?
    Dependencies -> Manage. Black boxes -> Expose. Dynamics -> Fixity. Reliability.
22. Replicate harder than Reproduce? Repeating the experiment or the set-up?
    Container Conundrum: results will vary.
    Replicability Window: all experiments become less replicable over time. Prepare to repair.
23. Levels of Computational Reproducibility [Freire, 2014]
    Coverage: how much of an experiment is reproducible - original experiment, similar experiment, different experiment. Portability.
    Depth: how much of an experiment is available - Figures + Data; Binaries + Data; Source Code / Workflow + Data; Binaries + Data + Dependencies; Source Code / Workflow + Data + Dependencies; Virtual Machine (Binaries + Data + Dependencies); Virtual Machine (Source Code / Workflow + Data + Dependencies)
    Minimum: data and source code available under terms that permit inspection and execution.
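Moving an experiment up the depth ladder from "source code + data" to "source code + data + dependencies" amounts to recording the execution environment alongside the other artefacts. The sketch below is an illustrative assumption, not from the slides: it snapshots the interpreter, operating system and installed package versions into a file that can be shipped with the code and data.

```python
# Hypothetical sketch of capturing the "dependencies" layer of the depth
# ladder: snapshot the execution environment so it can ship with code and data.
import json
import platform
import sys
from importlib import metadata

def snapshot_environment(path: str = "environment.json") -> dict:
    """Record interpreter, OS and installed package versions alongside an analysis."""
    env = {
        "python": sys.version,
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "packages": sorted(
            f"{d.metadata['Name']}=={d.version}" for d in metadata.distributions()
        ),
    }
    with open(path, "w") as f:
        json.dump(env, f, indent=2)
    return env

if __name__ == "__main__":
    snapshot_environment()
```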
24. Measuring Information Gain from Reproducibility
    Dimensions: research goal; method/algorithm; implementation/code; platform/execution environment; data parameters; input data; actors
    For each dimension: no change / change / don't care → information gain
    https://linkingresearch.wordpress.com/2016/02/21/dagstuhl-seminar-report-reproducibility-of-data-oriented-experiments-in-e-scienc/
    http://www.dagstuhl.de/16041
25. How? Preserve by Reporting, Reproduce by Reading
    Archived Record. Description Zoo: standards, common metadata.
26. How? Preserve by Maintaining, Repairing, Containing; Reproduce by Running, Emulating, Reconstructing
    Active Instrument. Byte level. Buildability Zoo.
27. Reproducibility Dimensions: provenance; portability, preservation; robustness, versioning; access; description standards; common APIs; licensing, identifiers; standards, common metadata; change, variation, sensitivity, discrepancy handling; packaging, containers; dependencies; steps. FAIR. RACE.
28. Research Object: a standards-based metadata framework for logically and physically bundling resources with context. http://researchobject.org
    Bigger on the inside than the outside: external referencing.
29. Research Object: a standards-based metadata framework for logically and physically bundling resources with context. http://researchobject.org
    Manifest - Construction:
    • Aggregates: link things together
    • Annotations: about things & their relationships
    Manifest - Description:
    • Identification: locate things regardless of where
    • Provenance: where it came from
    • Versioning: its evolution
    • Dependencies: what else is needed
    • Checklists: what should be there
    Container - packaging content & links: Zip files, BagIt, Docker images
    Catalogues & Commons Platforms: FAIRDOM
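The manifest-plus-container structure can be sketched in a few lines. The example below is a deliberately simplified, hypothetical bundle: a zip whose JSON manifest aggregates local and external resources and annotates them. The actual Research Object specifications (RO model, RO Bundle, BagIt profiles) are richer and standards-based, and all file names here are made up.

```python
# Hypothetical, simplified sketch of the manifest + container idea:
# a zip file whose manifest aggregates resources and annotates them.
# The real Research Object specifications (researchobject.org) define
# richer, standards-based manifests; this only illustrates the shape.
import json
import zipfile
from datetime import datetime, timezone

files = ["analysis.py", "input_data.csv", "results/figure1.png"]  # made-up names

manifest = {
    "id": "example-research-object",
    "createdOn": datetime.now(timezone.utc).isoformat(),
    "createdBy": {"name": "A. Researcher"},
    "aggregates": [{"uri": f} for f in files] + [
        # external referencing: large or third-party resources stay outside the zip
        {"uri": "https://example.org/reference-dataset"}
    ],
    "annotations": [
        {"about": "analysis.py",
         "content": "Script that produced results/figure1.png"},
        {"about": "input_data.csv",
         "content": "Derived from the external reference dataset"},
    ],
}

with zipfile.ZipFile("example.bundle.zip", "w") as bundle:
    bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
    for f in files:
        try:
            bundle.write(f)            # pack local resources into the container
        except FileNotFoundError:
            pass                       # purely illustrative; files may not exist
```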
30. Systems Biology Commons
    • Link data, models and SOPs
    • Standards
    • Span resources
    • Snapshot + DOIs
    • Bundle and export
    • Logical bundles
31. Workflow Research Objects: exchange, portability and maintenance
    application/vnd.wf4ever.robundle+zip
    Belhajjame et al (2015) Using a suite of ontologies for preserving workflow-centric research objects, J Web Semantics, doi:10.1016/j.websem.2015.01.003
    *https://2016-oslo-repeatability.readthedocs.org/en/latest/overview-and-agenda.html
32. Asthma Research e-Lab: dataset building and releasing
    Standardised packing of Systems Biology models
    European Space Agency RO Library
    Large dataset management for life science workflows
    LHC ATLAS experiments
    Notre Dame, U Rostock
    Encyclopedia of DNA Elements
    PeptideAtlas
33. Words matter.
    Reproducibility is not an end. It's a means to an end.
    Beware reproducibility zealots. 50 Shades of Reproducibility. Form vs function.
    A conundrum: big co-operative data-driven science makes reproducibility desirable, but also means dependency and change are to be expected.
    A lab analogy for computational science.
34. Bonus Slides
