2. Reproducibility
Reproducibility is the practice of distributing all data, software source code, and tools required to reproduce the results discussed in a research publication.
https://www.ctspedia.org/do/view/CTSpedia/ReproducibleResearchStandards
3. Replication vs. Reproducibility
• Replication: The confirmation of results and conclusions from one study
obtained independently in another is considered the scientific gold standard.
• “Again, and Again, and Again …” BR Jasny et al. Science, 2011. 334(6060), p. 1225. DOI: 10.1126/science.334.6060.1225
• Some studies can’t be replicated: too big, too costly, too time-consuming, a one-time event, or reliant on rare samples
• Reproducibility: minimum standard for assessing the value of scientific claims,
particularly when full independent replication of a study is not feasible
• “Reproducible Research in Computational Science.” RD Peng. Science, 2011. 334(6060), pp. 1226-1227. DOI: 10.1126/science.1213847
6. Requires new expertise and infrastructure
Figure (research workflow and supporting tools):
• Research steps: form hypothesis, design experiment, collect data, clean data, analyze data, write manuscript, publish research
• Data steps: plan for data storage, curate data, share data
• Reproducible research tools: data management plans, version control, literate statistical computing
7. DMPTool
• Developed by the California Digital Library to help researchers write
data management plans
• https://dmptool.org/user_sessions/institution
• Select University of Colorado Anschutz Medical Campus
8. Create an account* or sign in
*We’re working with OIT to allow us to log in with CU passport credentials. Stay tuned.
10. Data management exercise
• Create a DMPTool account
• Pick a template and create a DMP
• Take 5 minutes to click through the template and think about how
these questions relate to your research
11. Version control
Version control is a system that records changes to a file or set of files
over time so that you can recall specific versions later.
https://git-scm.com/doc
13. Local version control system
Figure 1-1. Local version control.
https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
But what if you need to collaborate?
• Keeps files in one place
• No copies
• Keeps track of changes
• Like Apple’s Time Machine
16. What is Git?
• Distributed version control system developed by the Linux community
• A stream of snapshots
Figure 1-5. Storing data as snapshots of the project over time.
https://git-scm.com/book/en/v2/Getting-Started-Git-Basics
17. 3 states of repository files
• Modified – the file is altered but not committed
• Staged – the file is altered and marked to go to the next commit
• Committed – the file’s changes are safely stored in your local database
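The three states can be observed with `git status`. The following is a minimal sketch, assuming Git is installed; the scratch directory, the file name `notes.txt`, and the identity values are hypothetical and exist only for the demonstration.

```shell
set -e
# Work in a throwaway repository so no real project is touched (hypothetical path).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"               # notes.txt is now committed

echo "second draft" >> notes.txt
state_modified=$(git status --short)       # file is modified but not staged

git add notes.txt
state_staged=$(git status --short)         # file is staged for the next commit

git commit -q -m "Revise notes"
state_committed=$(git status --short)      # empty: nothing left to commit

printf '%s\n' "$state_modified" "$state_staged"
```

In the short status format, the position of the `M` distinguishes the two uncommitted states: the second column marks an unstaged modification and the first column marks a staged one.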
18. 3 Sections of your directory
Figure 1-6. Working directory, staging area, and Git directory.
https://git-scm.com/book/en/v2/Getting-Started-Git-Basics
• Working directory: modified files
• Staging area: staged files
• Git directory: committed files
19. Important git commands
• Init (Initialize) – start a git repository
• Add – add files to the git repository (for the initial add and for staging); for
files that are already tracked, staging can be skipped with the -a option of commit
• Commit – safely store the files in your git repository
• Clone – make a copy of someone else’s git repository
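The four commands above can be tried end to end in a scratch directory. This is a sketch under stated assumptions: Git is installed, and the names `project`, `project-copy`, and `data.csv` and the identity values are hypothetical. The clone here copies a local repository; cloning someone else's repository works the same way with their URL in place of the path.

```shell
set -e
work=$(mktemp -d)                      # throwaway location (hypothetical)
cd "$work"

git init -q project                    # Init: start a repository in ./project
cd project
git config user.name "Example User"    # placeholder identity for the demo
git config user.email "user@example.com"

echo "x,y" > data.csv
git add data.csv                       # Add: start tracking the file and stage it
git commit -q -m "Add data file"       # Commit: store the staged snapshot

echo "1,2" >> data.csv
git commit -q -a -m "Add first row"    # -a stages tracked, modified files for you

cd "$work"
git clone -q project project-copy      # Clone: copy the repository and its history
commit_count=$(git -C project-copy rev-list --count HEAD)
echo "commits in clone: $commit_count"
```

Note that `-a` only picks up files Git already tracks; a brand-new file still needs an explicit `git add`.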
20. File statuses and how they change
Figure 2-1. The lifecycle of the status of your files.
https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository
27. Cloning/Branching/Forking
• Cloning: make a local copy of a repository online or elsewhere
• Branching: creating a separate stream to test new features, so you
don’t affect the “trunk”; branches depend on the trunk
• Collaboration
• Forking: making a separate copy of a repository that is not dependent on the original
• Using others’ work as a starting point; preserving for yourself things that the
owner might delete
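Branching and cloning can be sketched on the command line as follows; forking is done on the hosting service rather than with a Git command, so it is not shown. This assumes Git is installed, and the names `trunk-repo`, `new-feature`, `my-copy`, and `analysis.txt` are hypothetical.

```shell
set -e
work=$(mktemp -d)                          # throwaway location (hypothetical)
cd "$work"
git init -q trunk-repo
cd trunk-repo
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "stable" > analysis.txt
git add analysis.txt
git commit -q -m "Initial analysis"
trunk=$(git symbolic-ref --short HEAD)     # name of the default branch, whatever it is

git checkout -q -b new-feature             # Branching: a separate stream of work
echo "experimental" >> analysis.txt
git commit -q -a -m "Try a new idea"

git checkout -q "$trunk"                   # back on the trunk
trunk_lines=$(grep -c '' analysis.txt)     # trunk still has only the original line
feature_lines=$(git show new-feature:analysis.txt | grep -c '')

git clone -q . "$work/my-copy"             # Cloning: a full local copy of the repository
echo "trunk: $trunk_lines line(s), new-feature: $feature_lines line(s)"
```

The trunk keeps its original content until the branch is merged, which is what makes a branch a safe place to test changes.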
33. Exercise
• Go to the repository you cloned earlier
• Create a text file with your name on it
• Add it to the name folder
• Submit a pull request
• Look at what happens to the visual representation
34. Literate (statistical) programming
• Resulting report is a stream of text (human readable) and code
(machine readable)
• Alternate text and code
• Sweave
• R markdown
35. R Markdown
• Open
• Write
• Embed
• Render
https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
36. Install knitr and markdown packages
• Tools > install packages
• Enter the package name (will autocomplete)
• knitr
• markdown
• OR install.packages("knitr")
• If it fails, try again
38. Write: useful syntax
• Plain text
• *italics* -> italics
• **bold** -> bold
• # Header -> Header (each additional # makes the header smaller)
• Can also draw:
• Insert pictures
• Ordered and unordered list
• Tables
39. Embed code
• Inline – Use variables in the human readable text
• `r 2 + 2`
• Code chunks - Include working code that generates output
• ```{r}
• #Code goes here
• ```
• Display Options –
40. Render
• Won’t render unless the code runs with no errors
• So a rendered report shows that the code runs from start to finish, which supports reproducibility
• Render using the knit function
• Output Formats
• Knit HTML
• Knit PDF – requires LaTeX
• Knit Word
41. Exercise
• Edit the markdown document using the cheat sheet to see what you
can do
• Try to knit it after creating a typo in the code
• Insert other pictures from the web
• Try to make a table
• Make some bulleted lists
• Insert a block quote
• Make the graph prettier
• Play around!
Editor's notes
What issues do you see with the feasibility of this process?
These services span the research data lifecycle
Plan what you’re going to do with your data before you generate it
Curate and manage during collection
Temporary storage
Prepare for long term storage
Sharing optional (for now)
Expertise and infrastructure