Reproducibility: 10 Simple Rules
And more!
Sandve, Geir Kjetil, et al. "Ten simple rules for reproducible computational research." PLoS Computational Biology 9.10 (2013): e1003285.
Rule 1: For Every Result, Keep Track of How It Was Produced
http://xkcd.com/
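One way to keep track of how a result was produced is to write a small provenance record next to every output file. The sketch below is only an illustration, not a method from the deck. It assumes the hypothetical helper names `sha256_of` and `write_provenance` and a `<result>.provenance.json` sidecar naming convention. The record holds the command line, the Python version, a SHA-256 hash of each input file, the parameters, and a UTC timestamp.

```python
import hashlib
import json
import sys
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_provenance(result_path, input_paths, params):
    """Write <result>.provenance.json next to result_path and return its path.

    params must be JSON-serializable (numbers, strings, lists, dicts).
    """
    record = {
        "result": str(result_path),
        "command": sys.argv,  # how the producing script was invoked
        "python": sys.version,
        "inputs": {str(path): sha256_of(path) for path in input_paths},
        "parameters": params,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = Path(str(result_path) + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar
```

A script would call something like `write_provenance("summary.txt", ["raw.csv"], {"threshold": 0.5})` (hypothetical file names) right after writing its result. The input hashes make it possible to tell later whether an input file has changed since the result was produced.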
Rule 2: Avoid Manual Data Manipulation Steps
• “Stop clicking, start typing” – Matt Frost, Charlottesville, VA
• Use scripts for even small changes
• Split commonly used code off into functions/classes, and put these into libraries
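The following sketch shows a hand edit replaced by a script. The function `clean_rows` is hypothetical. It drops blank rows from a CSV and renames columns, and it assumes a well-formed CSV with a header row. In practice a function like this would live in a small shared module, such as a hypothetical `cleaning.py`, so scripts import it instead of copying it.

```python
import csv
import io


def clean_rows(lines, rename=None):
    """Return the rows of a CSV as dicts, with blank rows dropped.

    lines is any iterable of CSV text lines, such as an open file or io.StringIO.
    rename maps old column names to new ones. Surrounding whitespace is
    stripped from every value.
    """
    rename = rename or {}
    cleaned = []
    for row in csv.DictReader(lines):
        values = [(value or "").strip() for value in row.values()]
        if not any(values):
            continue  # a row of empty fields: skip it instead of deleting it by hand
        cleaned.append(
            {rename.get(key, key): (value or "").strip() for key, value in row.items()}
        )
    return cleaned


# Usage: the same edit can be re-run whenever the raw file is updated.
sample = io.StringIO("Sample ID,value\nA, 1\n,\nB,2\n")
print(clean_rows(sample, rename={"Sample ID": "sample_id"}))
```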
Rule 3: Archive the Exact Versions of All External Programs Used
• Level 0: Note names and versions of all packages
• Level 1: Use a package management system (packrat, anaconda/conda)
• Boss Level: Save an image of the entire system
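In a Python project, the standard library can handle Level 0 by listing every installed distribution and its version. The sketch below writes that list as JSON, together with the interpreter and platform strings. The function names are hypothetical. This complements the Level 1 tools rather than replacing them: `pip freeze` and `conda env export` produce similar lists in formats those tools can reinstall from.

```python
import json
import platform
import sys
from importlib import metadata


def installed_versions():
    """Map each installed distribution's name to its version string."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing or broken metadata
            versions[name] = dist.version
    return dict(sorted(versions.items(), key=lambda item: item[0].lower()))


def environment_report():
    """Return a JSON string describing the interpreter and installed packages."""
    report = {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": installed_versions(),
    }
    return json.dumps(report, indent=2)


if __name__ == "__main__":
    # e.g. python env_report.py > environment.json, stored alongside the results
    print(environment_report())
```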
Rule 4: Version Control All Custom Scripts
http://www.slideshare.net/sjcockell/reproducibility-the-myths-and-truths-of-pipeline-bioinformatics
• Also, version control workflows (what are good workflow management systems, guys?)
• Use the commit message to write something useful to your future self (“pwew pwew pwew” is not useful)
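Version control pays off most when each result can be tied to the exact commit that produced it. The sketch below assumes `git` is installed and on the PATH. Its hypothetical helpers ask git for the current commit and for whether the working tree has uncommitted changes. When git or a repository is not available, they return None instead of failing.

```python
import subprocess


def _git(args, repo_dir):
    """Run a git command in repo_dir and return its stdout, or None on failure."""
    try:
        completed = subprocess.run(
            ["git", *args],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # OSError covers git not being installed and repo_dir not existing.
        return None
    return completed.stdout


def current_commit(repo_dir="."):
    """Return the full hash of the checked-out commit, or None outside a repository."""
    out = _git(["rev-parse", "HEAD"], repo_dir)
    return out.strip() if out is not None else None


def has_uncommitted_changes(repo_dir="."):
    """Return True if files differ from HEAD (untracked files included), None if unknown."""
    out = _git(["status", "--porcelain"], repo_dir)
    return None if out is None else out.strip() != ""
```

A script could store `current_commit()` and `has_uncommitted_changes()` with its output. A value of True for the second warns that the commit alone does not describe the code that actually ran.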
Rule 5: Record All Intermediate Results, When Possible in Standardized Formats
• “Explicit is better than implicit” – Tim Peters, The Zen of Python
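The sketch below saves each intermediate step explicitly and assumes the intermediate data are tables of simple values. Its hypothetical helpers write every step to a plain CSV file with a numbered, descriptive name, then read it back. CSV is used here only as one example of a widely readable format. It stores everything as text, so column types have to be restored (or recorded elsewhere) when the file is loaded.

```python
import csv
from pathlib import Path


def save_intermediate(out_dir, step, name, rows, fieldnames):
    """Write one pipeline step's output as CSV and return the file path.

    Files are named <step>_<name>.csv (e.g. 02_filtered.csv) so the order of
    the pipeline is visible from a directory listing.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{step:02d}_{name}.csv"
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


def load_intermediate(path):
    """Read a CSV written by save_intermediate back as a list of dicts (all values are strings)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```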
Rule 6: For Analyses That Include Randomness, Note Underlying Random Seeds
• This goes for all parameters that may change
• Separate code from configuration, e.g. use config files (another gift to your future self!)
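The next sketch illustrates both points and assumes a JSON config, for example a hypothetical `config.json`. Parameters come from the config rather than the code. When no seed is given, one is generated and written into the config. The analysis uses its own `random.Random` instance seeded from that value. The result carries the full config, so the seed is stored with the output. The function names are hypothetical.

```python
import json
import random


def load_config(text):
    """Parse JSON config text and make sure it contains a seed and a sample size."""
    config = json.loads(text)
    config.setdefault("n_samples", 5)
    if "seed" not in config:
        # No seed supplied: pick one, but keep it so the run can be repeated.
        config["seed"] = random.SystemRandom().randrange(2**32)
    return config


def run_analysis(data, config):
    """Draw a bootstrap sample (with replacement) from data.

    A dedicated Random instance keeps the seed local to this analysis instead
    of relying on the global state of the random module.
    """
    rng = random.Random(config["seed"])
    sample = [rng.choice(data) for _ in range(config["n_samples"])]
    return {"config": config, "sample": sample}


# Usage with a config file instead of a literal string:
#   with open("config.json") as fh:
#       config = load_config(fh.read())
result = run_analysis(list(range(100)), load_config('{"seed": 42, "n_samples": 3}'))
print(json.dumps(result))
```

Python's documentation only guarantees that `random()` itself reproduces the same sequence across Python versions for a given seed, so also noting the Python version (Rule 3) matters.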
Rule 7: Always Store Raw Data behind Plots
• (and the plot-generating code, too)
• Make raw data read-only
• Separate folders for raw and pre-processed data
https://inspguilfoyle.wordpress.com/2014/02/19/straight-lines/
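The sketch below implements two of these habits and assumes a layout with hypothetical `data/raw` and `data/processed` folders. One helper clears the write-permission bits on every file under the raw-data folder. Another saves the exact numbers behind a plot as a CSV next to the figure. Permission bits guard against accidents, not against deliberate changes: the owner or an administrator can restore them, and on Windows `chmod` only toggles the read-only flag.

```python
import csv
import stat
from pathlib import Path

# Mask that clears write permission for user, group and others.
NO_WRITE = ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)


def make_read_only(directory):
    """Clear the write bits on every file below directory; return the files changed."""
    changed = []
    for path in Path(directory).rglob("*"):
        if path.is_file():
            path.chmod(stat.S_IMODE(path.stat().st_mode) & NO_WRITE)
            changed.append(path)
    return changed


def save_plot_data(path, xs, ys, x_label="x", y_label="y"):
    """Write the exact (x, y) values a plot is drawn from as a two-column CSV."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([x_label, y_label])
        writer.writerows(zip(xs, ys))
    return path
```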
Rule 8: Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
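The sketch below shows one possible layout and assumes per-sample numeric results. The hypothetical function writes a short `summary.json` with one entry per sample. Each entry points to a detail file holding every individual value, so a reader starts at the summary and drills down only where needed. The function assumes every sample has at least one value.

```python
import json
from pathlib import Path


def write_hierarchical(out_dir, per_sample):
    """Write a summary file plus one detail file per sample; return the summary.

    Layout (illustrative):
        out_dir/summary.json           n, mean and detail path per sample
        out_dir/details/<sample>.json  every individual value
    """
    out_dir = Path(out_dir)
    details = out_dir / "details"
    details.mkdir(parents=True, exist_ok=True)
    summary = {}
    for sample, values in per_sample.items():
        detail_path = details / f"{sample}.json"
        detail_path.write_text(json.dumps({"sample": sample, "values": values}))
        summary[sample] = {
            "n": len(values),
            "mean": sum(values) / len(values),
            "detail": detail_path.relative_to(out_dir).as_posix(),
        }
    (out_dir / "summary.json").write_text(json.dumps(summary, indent=2))
    return summary
```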
Rule 9: Connect Textual Statements to Underlying Results
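The sketch below connects a sentence to its result and assumes results are stored as a flat JSON object. The hypothetical helper reads the number from the results file when the text is generated. It then appends the file and key the number came from, so the prose cannot drift away from the data. Literate-programming tools such as R Markdown (knitr) or Jupyter notebooks apply the same idea to whole documents.

```python
import json
from pathlib import Path


def statement_from_results(results_path, key, template):
    """Fill a sentence template with results[key] and cite where it came from.

    template uses str.format syntax with a field called value,
    e.g. "Accuracy was {value:.1%}."
    """
    results_path = Path(results_path)
    results = json.loads(results_path.read_text())
    sentence = template.format(value=results[key])
    return f"{sentence} [source: {results_path.name}#{key}]"
```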
Rule 10: Provide Public Access to Scripts, Runs, and Results
• GitHub
• Synapse
• Open Science Framework
• ReadTheDocs
• RunMyCode
• ???
Documentation
• Is it clear where to begin? (e.g., can someone picking a project up see where to start running it?)
• Can you determine which file(s) was/were used as input in a process that produced a derived file?
• Who do I cite? (code, data, etc.)
• Is there documentation about every result?
• Have you noted the exact version of every external application used in the process?
• For analyses that include randomness, have you noted the underlying random seed(s)?
• Have you specified the license under which you're distributing your content, data, and code?
• Have you noted the license(s) for other people's content, data, and code used in your analysis?
http://ropensci.github.io/reproducibility-guide/sections/checklist/
Organization
• Which is the most recent data file/code?
• Which folders can I safely delete?
• Do you keep older files/code or delete them?
• Can you find a file for a particular replicate of your research project?
• Have you stored the raw data behind each plot? Is your analysis output organized hierarchically (allowing others to find more detailed output underneath a summary)?
• Do you run backups on all files associated with your analysis?
• How many times has a particular file been generated in the past?
• Why was the same file generated multiple times?
• Where did a file that I didn't generate come from?
http://ropensci.github.io/reproducibility-guide/sections/checklist/
Automation
• Are there lots of manual data manipulation steps?
• Are all custom scripts under version control?
• Is your writing (content) under version control?
http://ropensci.github.io/reproducibility-guide/sections/checklist/
Publication
• Have you archived the exact version of every external application used in your process(es)?
• Did you include a reproducibility statement or declaration at the end of your paper(s)?
• Are textual statements connected/linked to the supporting results or data?
• Did you archive preprints of resulting papers in a public repository?
• Did you release the underlying code at the time of publishing a paper?
• Are you providing public access to your scripts, runs, and results?
http://ropensci.github.io/reproducibility-guide/sections/checklist/
Best Practices for Scientific Computing
• Write programs for people, not computers.
• Let the computer do the work.
• Make incremental changes.
• DRY: Don’t repeat yourself (or others).
• Plan for mistakes. (“Defensive Programming”)
• Use pair programming.
• Document design and purpose, not mechanics.
Wilson, Greg, et al. "Best practices for scientific computing." PLoS Biology 12.1 (2014): e1001745.
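To make “plan for mistakes” concrete, here is a small defensive-programming sketch built around a hypothetical function. Invalid input raises a `ValueError` with a clear message, and an `assert` documents an internal invariant. Assertions are skipped when Python runs with `-O`. That is why the sketch uses them only to check the function's own logic, never to validate input.

```python
def normalize(values):
    """Scale non-negative numbers so they sum to 1.

    Input problems raise ValueError with a message that says what is wrong,
    so bad data fails here instead of silently corrupting later results.
    """
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(values)
    if total == 0:
        raise ValueError("values must not all be zero")
    result = [v / total for v in values]
    # Internal invariant: for valid input the result sums to 1 up to rounding.
    assert abs(sum(result) - 1.0) < 1e-9
    return result
```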
Suggested Training Topics
• version control and use of online repositories
• modern programming practice including unit testing and regression testing
• maintaining “notebooks” or “research compendia”
• recording the provenance of final results relative to code and/or data
• numerical / floating point reproducibility and nondeterminism
• reproducibility on parallel systems
• dealing with large datasets
• dealing with complicated software stacks and use of virtual machines
• documentation and literate programming
• IP and licensing issues, proper citation and attribution
http://icerm.brown.edu/tw12-5-rcem/
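The floating-point topic is easy to demonstrate. Floating-point addition is not associative, so changing the order of a sum (as parallel reductions often do) can change the last digits of the result. The sketch below sums the same three numbers in two orders and compares both with `math.fsum`, which returns a correctly rounded sum. This is why regression tests on numerical output usually compare values with a tolerance, for example `math.isclose`, rather than checking exact equality.

```python
import math


def naive_sum(values):
    """Add values strictly left to right, as a simple loop would."""
    total = 0.0
    for value in values:
        total += value
    return total


values = [0.1, 0.2, 0.3]
forward = naive_sum(values)             # (0.1 + 0.2) + 0.3
backward = naive_sum(reversed(values))  # (0.3 + 0.2) + 0.1
correctly_rounded = math.fsum(values)

print(repr(forward))                              # 0.6000000000000001
print(repr(backward))                             # 0.6
print(forward == backward)                        # False
print(math.isclose(forward, correctly_rounded))   # True
```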
Resources
• http://projecttemplate.net/ - Project automation (R)
• http://www.nature.com/news/2010/101013/full/467753a.html - Publish your computer code: it is good enough
• http://www.carlboettiger.info/ - Open lab notebook
• http://wiki.stodden.net/ICERM_Reproducibility_in_Computational_and_Experimental_Mathematics:_Readings_and_References
• http://rrcns.readthedocs.org/ - Best practices tutorial
• http://www.bioinformaticszen.com/

Seismic Method Estimate velocity from seismic data.pptx
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 

Reproducibility: 10 Simple Rules

  • 1. Reproducibility: 10 Simple Rules And more! Sandve, Geir Kjetil, et al. "Ten simple rules for reproducible computational research." PLoS Computational Biology 9.10 (2013): e1003285.
  • 2. Rule 1: For Every Result, Keep Track of How It Was Produced http://xkcd.com/
  • 3. Rule 2: Avoid Manual Data Manipulation Steps • “Stop clicking, start typing” – Matt Frost, Charlottesville, VA • Use scripts for even small changes • Split commonly used code off into functions/classes, and put these into libraries
  • 4. Rule 3: Archive the Exact Versions of All External Programs Used • Level 0: Note the names and versions of all packages • Level 1: Use a package management system (packrat, anaconda/conda) • Boss Level: Save an image of the entire system
  • 5. Rule 4: Version Control All Custom Scripts http://www.slideshare.net/sjcockell/reproducibility-the-myths-and-truths-of-pipeline-bioinformatics • Also, version control workflows (what are good workflow management systems, guys?) • Use commit messages to write something useful to your future self (“pwew pwew pwew” is not useful)
  • 6. Rule 5: Record All Intermediate Results, When Possible in Standardized Formats • “Explicit is better than implicit” – Tim Peters, The Zen of Python
  • 7. Rule 6: For Analyses That Include Randomness, Note Underlying Random Seeds • This goes for all parameters that may change • Separate code from configuration, e.g. use config files (another gift to your future self!)
  • 8. Rule 7: Always Store Raw Data behind Plots • (and the plot-generating code, too) • Make raw data read-only • Keep raw and pre-processed data in separate folders https://inspguilfoyle.wordpress.com/2014/02/19/straight-lines/
  • 9. Rule 8: Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
  • 10. Rule 9: Connect Textual Statements to Underlying Results
  • 11. Rule 10: Provide Public Access to Scripts, Runs, and Results • GitHub • Synapse • Open Science Framework • ReadTheDocs • RunMyCode • ???
  • 12. Documentation • Is it clear where to begin? (e.g., can someone picking up a project see where to start running it?) • Can you determine which file(s) was/were used as input in a process that produced a derived file? • Who do I cite? (code, data, etc.) • Is there documentation about every result? • Have you noted the exact version of every external application used in the process? • For analyses that include randomness, have you noted the underlying random seed(s)? • Have you specified the license under which you're distributing your content, data, and code? • Have you noted the license(s) for other people's content, data, and code used in your analysis? http://ropensci.github.io/reproducibility-guide/sections/checklist/
  • 13. Organization • Which is the most recent data file/code? • Which folders can I safely delete? • Do you keep older files/code or delete them? • Can you find a file for a particular replicate of your research project? • Have you stored the raw data behind each plot? • Is your analysis output done hierarchically? (allowing others to find more detailed output underneath a summary) • Do you run backups on all files associated with your analysis? • How many times has a particular file been generated in the past? • Why was the same file generated multiple times? • Where did a file that I didn't generate come from? http://ropensci.github.io/reproducibility-guide/sections/checklist/
  • 14. Automation Are there lots of manual data manipulation steps are there? Are all custom scripts under version control? Is your writing (content) under version control? http://ropensci.github.io/reproducibility-guide/sections/checklist/
  • 15. Publication Have you archived the exact version of every external application used in your process(es)? Did you include a reproducibility statement or declaration at the end of your paper(s)? Are textual statements connected/linked to the supporting results or data? Did you archived preprints of resulting papers in a public repository? Did you release the underlying code at the time of publishing a paper? Are you providing public access to your scripts, runs, and results? http://ropensci.github.io/reproducibility-guide/sections/checklist/
  • 16. Best Practices for Scientific Computing • Write programs for people, not computers. • Let the computer do the work. • Make incremental changes. • DRY: Don’t repeat yourself (or others). • Plan for mistakes (“defensive programming”). • Use pair programming. Wilson, Greg, et al. "Best practices for scientific computing." PLoS Biology 12.1 (2014): e1001745.
  • 17. Document design and purpose, not mechanics. Wilson, Greg, et al. "Best practices for scientific computing." PLoS Biology 12.1 (2014): e1001745.
  • 18. Suggested Training Topics • version control and use of online repositories • modern programming practice including unit testing and regression testing • maintaining “notebooks” or “research compendia” • recording the provenance of final results relative to code and/or data • numerical / floating point reproducibility and nondeterminism • reproducibility on parallel systems • dealing with large datasets • dealing with complicated software stacks and use of virtual machines • documentation and literate programming • IP and licensing issues, proper citation and attribution http://icerm.brown.edu/tw12-5-rcem/
  • 19. Resources • http://projecttemplate.net/ - Project automation (R) • http://www.nature.com/news/2010/101013/full/467753a.html - Publish your computer code: it is good enough • http://www.carlboettiger.info/ - Open lab notebook • http://wiki.stodden.net/ICERM_Reproducibility_in_Computational_and_Experimental_Mathematics:_Readings_and_References • http://rrcns.readthedocs.org/ - Best practices tutorial • http://www.bioinformaticszen.com/
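Rule 3's Level 0 (note the names and versions of all packages) and the checklist question about the exact version of every external application can be partly automated. The following is a minimal Python sketch, assuming Python 3.8 or later for `importlib.metadata`; the function name `environment_snapshot` and the package list are illustrative, not part of any tool mentioned in the slides.

```python
import json
import platform
from importlib import metadata


def environment_snapshot(packages):
    """Record the interpreter, OS, and installed version of each named package.

    A package that is not installed is recorded as None instead of raising,
    so the snapshot also documents what was missing.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Placeholder package list: replace with the packages your analysis imports.
    snapshot = environment_snapshot(["numpy", "pandas"])
    print(json.dumps(snapshot, indent=2, sort_keys=True))
```

Running it as, for example, `python snapshot_env.py > environment.json` (a hypothetical file name) and committing the output next to the results gives a plain-text record of the environment. It only describes the environment; the Level 1 and Boss Level options are what let someone actually rebuild it.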
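Rule 6 asks for the random seed to be noted and suggests keeping parameters in configuration rather than in code. A minimal Python sketch under those assumptions: the configuration is JSON with hypothetical `seed` and `n_resamples` keys, and a bootstrap of the mean stands in for whatever randomized analysis you actually run.

```python
import json
import random


def load_config(text):
    """Parse a JSON configuration, refusing one that does not record a seed."""
    config = json.loads(text)
    if "seed" not in config:
        raise ValueError("config must record the random seed explicitly")
    return config


def bootstrap_means(data, config):
    """Return the mean of each of config["n_resamples"] bootstrap resamples.

    A private random.Random seeded from the config makes the resamples
    repeatable without relying on, or disturbing, the global random state.
    """
    rng = random.Random(config["seed"])
    n = len(data)
    means = []
    for _ in range(config["n_resamples"]):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    return means


if __name__ == "__main__":
    # Normally this text would come from a version-controlled file such as
    # config.json (hypothetical name) rather than a string in the code.
    config = load_config('{"seed": 42, "n_resamples": 5}')
    data = [2.1, 3.4, 1.9, 4.2, 3.3]
    print(bootstrap_means(data, config))
```

Using a private `random.Random(seed)` instead of seeding the global generator keeps unrelated code that draws random numbers from silently shifting the sequence. A seed may only reproduce the same numbers under the same library version (Python's own `random` documentation guarantees this only for `random()` itself), so Rule 3 still matters.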
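Rule 7 suggests making raw data read-only and keeping raw and processed data in separate folders. One way to do the first in Python is to clear the write bits on every file under the raw-data folder; the `data/raw` path below is an assumed layout, not one prescribed by the slides, and `make_read_only` is a hypothetical helper.

```python
import stat
from pathlib import Path

# Write permission bits for owner, group, and others.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(root):
    """Clear the write permission bits on every file under `root`.

    Returns the files it processed. Directories are left writable so that
    new raw files can still be added deliberately.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            mode = path.stat().st_mode
            path.chmod(stat.S_IMODE(mode) & ~WRITE_BITS)
            changed.append(path)
    return changed


if __name__ == "__main__":
    raw = Path("data/raw")  # assumed layout; processed data lives elsewhere
    if raw.is_dir():
        for path in make_read_only(raw):
            print(f"read-only: {path}")
```

This guards against accidental overwrites, not deliberate changes: the owner can restore write permission at any time. On Windows, `chmod` only toggles the file's read-only attribute, which has the same practical effect here.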
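Several checklist questions (which input files produced a derived file, where a file came from, how many times it was generated) come down to recording provenance, as in Rule 1. A minimal Python sketch: hash each input and the output with SHA-256 and write a small JSON "sidecar" next to the output. The `.prov.json` naming and the record's field names are a hypothetical convention, not a standard format.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_provenance(output, inputs, script, params):
    """Write <output>.prov.json describing what produced `output`.

    Records content hashes of the output and every input, the script that
    ran, its parameters, and a UTC timestamp. Returns the sidecar path.
    """
    output = Path(output)
    record = {
        "output": {"path": str(output), "sha256": sha256_of(output)},
        "inputs": [{"path": str(p), "sha256": sha256_of(p)} for p in inputs],
        "script": str(script),
        "parameters": params,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = output.with_name(output.name + ".prov.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar
```

If a regenerated output's hash changes while its input hashes do not, that often points to a change in code, parameters, or environment rather than data. Adding the version-control commit of the script (Rule 4) to `params` would be a natural extension.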
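The training topic on numerical / floating-point reproducibility is easy to underestimate. Floating-point addition is not associative, so the same numbers summed in a different order (as can happen when a parallel reduction splits the work differently between runs) can give different results. A small Python illustration; `naive_sum` is a hypothetical helper written as an explicit loop because, since Python 3.12, the built-in `sum()` uses compensated summation for floats and would hide the effect.

```python
import math


def naive_sum(values):
    """Add values strictly left to right, as a simple loop (or one thread) would."""
    total = 0.0
    for v in values:
        total += v
    return total


a = [1e16, 1.0, -1e16]
b = [1e16, -1e16, 1.0]  # same numbers, different order

# Doubles near 1e16 are spaced 2 apart, so 1e16 + 1.0 rounds back to 1e16
# and the 1.0 is lost when it is added before the cancellation.
print(naive_sum(a))  # 0.0
print(naive_sum(b))  # 1.0

# math.fsum returns the correctly rounded sum, which does not depend on order.
print(math.fsum(a), math.fsum(b))  # 1.0 1.0
```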
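The "Plan for mistakes" practice from Wilson et al. and the training topic on unit and regression testing can be combined in a few lines. A minimal Python sketch with a hypothetical `normalize` function: it checks its inputs and a post-condition, and a `unittest` test pins a hand-checked result so that a later change which alters it gets caught.

```python
import math
import unittest


def normalize(weights):
    """Scale non-negative weights so they sum to 1.

    Inputs are checked up front so bad data fails loudly here instead of
    producing silently wrong numbers downstream.
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    if any(w < 0 or math.isnan(w) for w in weights):
        raise ValueError("weights must be non-negative numbers")
    total = math.fsum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    result = [w / total for w in weights]
    # Post-condition: the output should sum to 1 up to rounding.
    assert math.isclose(math.fsum(result), 1.0)
    return result


class NormalizeTests(unittest.TestCase):
    def test_known_values(self):
        # Regression test: pins a result that was checked by hand.
        self.assertEqual(normalize([1, 1, 2]), [0.25, 0.25, 0.5])

    def test_rejects_negative(self):
        with self.assertRaises(ValueError):
            normalize([1, -1])
```

Saved in a file whose name starts with `test` (for example `test_normalize.py`), these tests are discovered and run by `python -m unittest` from that directory.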