Reproducible computational
research in R
An introduction by Samuel Bosch (October 2015)
http://samuelbosch.com
A short presentation with pointers on getting started with reproducible computational research in R. Some of the topics include git, R package development, document generation with R markdown, saving plots, saving tables and using packrat.


Topics
– Introduction
– Version control (Git)
– Reproducible analysis in R
  • Writing packages
  • R Markdown
  • Saving plots
  • Saving data
  • Packrat

Reproducible (computational) research
1. For Every Result, Keep Track of How It Was Produced
   – Steps, commands, clicks
2. Avoid Manual Data Manipulation Steps
3. Archive the Exact Versions of All External Programs Used
   – Packrat (Reproducible package management for R)
4. Version Control All Custom Scripts
5. Record All Intermediate Results, When Possible in Standardized Formats
6. For Analyses That Include Randomness, Note Underlying Random Seeds
   – set.seed(42)
7. Always Store Raw Data behind Plots
8. Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
9. Connect Textual Statements to Underlying Results
10. Provide Public Access to Scripts, Runs, and Results
Source: Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible Computational Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285

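Rule 6 can be illustrated with a minimal sketch in base R: resetting the seed before a random draw makes the draw repeatable. The variable names and the use of `rnorm` are only illustrative choices, not part of the original slides.

```r
# Draw the same "random" numbers twice by resetting the seed.
set.seed(42)
first_draw <- rnorm(5)

set.seed(42)
second_draw <- rnorm(5)

# Both draws are identical because they start from the same seed.
identical(first_draw, second_draw)
```

Noting the seed in the script (or in the generated report) is what allows someone else to reproduce the same numbers later.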
Version control
• Word review on steroids
• When working alone: it’s a database of all the versions of your files
• When collaborating: it’s a database of all the versions from all collaborators, with one master version into which all changes can be merged
• When there are no conflicts, merging can be done automatically
• Multiple programs/protocols: Git, Mercurial, SVN, …
• By default not suited to versioning large files (> 50 MB), but there is a Git Large File Storage extension
• Works best with text files (code, markdown, csv, …)

Git
• Popularized by http://github.com but supported by different providers (http://github.ugent.be, http://bitbucket.org)
• Programs for Git on Windows:
  – Standard Git GUI + command line (git-scm.com)
  – GitHub Desktop for Windows
  – Atlassian SourceTree

Git workflow (1 user)
Workflow:
1. Create a repository on your preferred provider
   If you want a private repository then use bitbucket.org or apply for the student developer pack (https://education.github.com/)
2. Clone the repository to your computer
       git clone https://github.com/samuelbosch/sdmpredictors.git
3. Make changes
4. View changes (optional)
       git status
5. Submit changes
       git add <files>
       git commit -am "<commit message>"
       git push

Git extras to explore
• Excluding files from Git with .gitignore
• Contributing to open source
  – Forking
  – Pull requests

DEMO
• New project on https://github.ugent.be/
• Clone
• Add file
• Status
• Commit
• Edit file
• Commit
• Push

R general
• Use RStudio (https://www.rstudio.com/products/rstudio/download/) and explore it
  – Projects
  – Keyboard shortcuts
  – Git integration
  – Package development
  – R Markdown
• R Short Reference Card: https://cran.r-project.org/doc/contrib/Short-refcard.pdf
• Style guide: http://adv-r.had.co.nz/Style.html

R package development
• R packages by Hadley Wickham (http://r-pkgs.had.co.nz/)
• Advantages:
  – Can be shared easily
  – One package with your data and your code
  – Documentation (if you write it)
  – Ease of testing

R packages: Getting started
• install.packages("devtools")
• RStudio -> New Project -> New Directory -> R Package
• Build and Reload Package: 'Ctrl + Shift + B'
• Check Package: 'Ctrl + Shift + E'
• Test Package: 'Ctrl + Shift + T'
• Build documentation: 'Ctrl + Shift + D'

R packages: testing
• Test if your functions return the expected results
• Gives confidence in the correctness of your code, especially when changing things
• http://r-pkgs.had.co.nz/tests.html

    # set up tests/testthat once per package
    # (in current releases this helper lives in usethis: usethis::use_testthat())
    devtools::use_testthat()

    # contents of a test file, e.g. in tests/testthat/
    library(testthat)
    library(stringr)
    context("String length")
    test_that("str_length is number of characters", {
      expect_equal(str_length("a"), 1)
      expect_equal(str_length("ab"), 2)
      expect_equal(str_length("abc"), 3)
    })

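The same pattern applies to your own functions. The sketch below is only an illustration: `celsius_to_kelvin` is a hypothetical helper that is not part of the slides, and the test is guarded so the snippet still runs when testthat is not installed.

```r
# Hypothetical helper that would normally live in the R/ directory of a package.
celsius_to_kelvin <- function(temp_c) {
  if (!is.numeric(temp_c)) stop("temp_c must be numeric")
  temp_c + 273.15
}

# Would normally live in a file under tests/testthat/.
# Guarded so the sketch still runs when testthat is not installed.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("celsius_to_kelvin converts and validates input", {
    testthat::expect_equal(celsius_to_kelvin(0), 273.15)
    testthat::expect_equal(celsius_to_kelvin(c(-273.15, 100)), c(0, 373.15))
    testthat::expect_error(celsius_to_kelvin("warm"), "numeric")
  })
}
```

Testing the error case as well as the normal case is what gives confidence when the function is changed later.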
R Markdown
• Easy creation of dynamic documents
  – Mix of R and markdown
  – Output to Word, HTML or PDF
  – Integrates nicely with version control as markdown is a text format (easy to diff)
• RStudio: New File -> R Markdown
• Powered by knitr (alternative to Sweave)

R Markdown: example

    ---
    title: "Numbers and their values"
    output:
      word_document:
        fig_caption: yes
    ---

    ```{r, echo=FALSE, warning=FALSE, message=FALSE}
    # R code block that won’t appear in the output document
    three <- 1+2
    ```

    # Chapter 1: On the value of 1 and 2

    It is a well known fact that 1 and 2 = `r three`, you can calculate this also inline `r 1+2`.

    Or show the entire calculation:

    ```{r}
    1+2
    ```

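Besides the Knit button in RStudio, a document like the one above can be rendered from R code. The following is a minimal sketch under a few assumptions: the file name and the one-line document body are made up for illustration, HTML output is used instead of Word, and rendering only runs when the rmarkdown package and pandoc are available.

```r
# Write a tiny R Markdown document to a temporary file.
rmd_file <- file.path(tempdir(), "numbers.Rmd")
writeLines(c(
  "---",
  "title: \"Numbers and their values\"",
  "output: html_document",
  "---",
  "",
  "The sum of 1 and 2 is `r 1 + 2`."
), rmd_file)

# Render only when the required tools are installed.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_file, quiet = TRUE)
}
```

Rendering from a script makes report generation one more reproducible step instead of a manual click.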
Markdown basics
Headers
    # Heading level 1
    ## Heading level 2
    ###### Heading level 6
*italic* and _this is also italic_
**bold** and __this is also bold__
*, + or - for (unordered) list items (bullets)
1., 2., … for ordered lists
This is an [example link](http://example.com/).
Image here: ![alt text](/path/to/img.jpg)
BibTeX references: [@RCoreTeam2014; @Wand2014], which need a link to a BibTeX file in the header:
    bibliography: bibliography.bib
More at: http://daringfireball.net/projects/markdown/basics
Also used elsewhere (GitHub, Stack Overflow, …), but sometimes in a dialect

Caching intermediate results
Official way: http://yihui.name/knitr/demo/cache/
Hand rolled (more explicit, but it doesn’t clean up previous versions and uses a hard-coded cache directory):

    library(digest)

    make_or_load <- function(change_path, file_prefix, make_fn, force_make = FALSE) {
      # cache key: modification time of the input file plus a hash of the function source
      changeid <- as.integer(file.info(change_path)$mtime)
      # collapse the printed function into one string so the whole source is hashed
      fn_source <- paste(capture.output(make_fn), collapse = "\n")
      fn_md5 <- digest(fn_source, algo = "md5", serialize = FALSE)
      path <- paste0("D:/temp/", file_prefix, changeid, "_", fn_md5, ".RData")
      if (!file.exists(path) || force_make) {
        result <- make_fn()
        save(result, file = path)
      } else {
        result <- get(load(path))
      }
      return(result)
    }

    df <- make_or_load(wb, "invasives_df_area_", function() { set_area(df) })

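The core idea of the hand-rolled cache can be sketched with base R only. This is a simplified variant, not the author's function: it drops the modification-time and function-hash parts of the cache key, and the names `cache_or_compute`, `slow_square` and the cache file name are hypothetical.

```r
# Cache the result of make_fn() in an .rds file; recompute only when
# the cache file is missing or force_make is TRUE.
cache_or_compute <- function(cache_file, make_fn, force_make = FALSE) {
  if (!force_make && file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  result <- make_fn()
  saveRDS(result, cache_file)
  result
}

# Usage: the counter shows that the function body only runs once.
n_calls <- 0
slow_square <- function() {
  n_calls <<- n_calls + 1
  (1:10)^2
}
cache_file <- file.path(tempdir(), "squares_cache.rds")
unlink(cache_file)  # start from a clean state
first <- cache_or_compute(cache_file, slow_square)
second <- cache_or_compute(cache_file, slow_square)
```

Without a key derived from the inputs, this variant will return stale results when the underlying data changes, which is the problem the modification time and hash in the original function address.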
Saving plots

    save_plot <- function(filename, plotfn, outdir = "D:/temp/", ...) {
      height <- 498
      width <- 662
      # capture.output() hides printed output such as the return value of dev.off()
      invisible(capture.output(tryCatch({
        plotfn(...)  # show the plot on the current device first
        jpeg(filename = paste0(outdir, filename, ".jpeg"), width = width, height = height,
             pointsize = 12, quality = 100)
        par(mar = c(2.2, 4.1, 1, 1) + 0.1)  # tighter margins; par() only affects the open device
        plotfn(...)
        dev.off()
        svg(filename = paste0(outdir, filename, ".svg"), width = 14, height = 7,
            pointsize = 12, onefile = TRUE)  # a new device starts with the default margins
        plotfn(...)
        dev.off()
      }, error = function(e) {
        message(conditionMessage(e))  # message() is not swallowed by capture.output()
      }, finally = {
        while (dev.cur() > 2) dev.off()  # close devices left open after an error
      })))
    }

    set.seed(42)
    save_plot("plothist", hist, x = sample(c(1:5, 3:4), 100, replace = TRUE),
              xlab = "Random", ylab = "Density", freq = FALSE, breaks = 1:5)

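A smaller variant of the same idea is sketched below, assuming a single output file and using `on.exit()` to guarantee that the device is closed. The function name `save_plot_to`, the `device` parameter and the choice of PDF output are illustrative assumptions, not part of the original slides.

```r
# Write a plot to a file using any graphics device function (pdf, png, ...).
save_plot_to <- function(path, plotfn, device = grDevices::pdf, ...) {
  device(path)
  # Close the device even if plotfn() fails, so no device is left open.
  on.exit(grDevices::dev.off(), add = TRUE)
  plotfn(...)
  invisible(path)
}

set.seed(42)
out_file <- file.path(tempdir(), "plothist.pdf")
save_plot_to(out_file, hist, x = rnorm(100), main = "Random values")
```

Passing the plotting function and its arguments, as in the original `save_plot`, keeps the plotting code in one place while the wrapper handles the device bookkeeping.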
Saving tables
• As HTML
    library(stargazer)
    stargazer(data, type = "html", summary = FALSE, out = outputpath, out.header = TRUE)
• As csv
    write.csv2(data, file = outputpath)
    data <- read.csv2(outputpath)
• As RData
    save(data, file = outputpath)
    load(outputpath)  # restores the object under its saved name (data); load() itself returns only the name

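The csv and RData round trips can be sketched with base R as follows. The example data frame and file names are made up for illustration, and `saveRDS`/`readRDS` is shown as an additional option that the slides do not mention.

```r
data <- data.frame(species = c("a", "b"), area = c(1.5, 2.25))

# CSV round trip (semicolon separator, decimal comma); row.names = FALSE
# avoids an extra index column when the file is read back.
csv_path <- file.path(tempdir(), "data.csv")
write.csv2(data, file = csv_path, row.names = FALSE)
data_csv <- read.csv2(csv_path)

# RData round trip: load() restores objects under their saved names
# and returns those names, not the data itself.
rdata_path <- file.path(tempdir(), "data.RData")
save(data, file = rdata_path)
restored <- new.env()
loaded_names <- load(rdata_path, envir = restored)

# RDS stores a single object and returns it directly.
rds_path <- file.path(tempdir(), "data.rds")
saveRDS(data, rds_path)
data_rds <- readRDS(rds_path)
```

Loading into a separate environment, as done here, avoids silently overwriting objects of the same name in the workspace.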
Packrat
Use packrat to make your R projects more:
• Isolated: Installing a new or updated package for one project won’t break your other projects, and vice versa. That’s because packrat gives each project its own private package library.
• Portable: Easily transport your projects from one computer to another, even across different platforms. Packrat makes it easy to install the packages your project depends on.
• Reproducible: Packrat records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.

Packrat
RStudio: Packrat support can be enabled when creating a project, or later in the project settings
Manually:

    install.packages("packrat")
    # initialize packrat in a project directory
    packrat::init("D:/temp/demo_packrat")
    # install a package
    install.packages("raster")
    # save the changes in Packrat (by default auto-snapshot)
    packrat::snapshot()
    # view list of packages that might be missing or that can be removed
    packrat::status()

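Independently of packrat, base R can at least report which versions were used for an analysis. The sketch below is not a replacement for packrat (it records versions but cannot reinstall them); the output file name is an assumption made for illustration.

```r
# Record the R version and the versions of all attached packages.
info_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), info_file)

# Version of a single package, here the base 'stats' package.
stats_version <- as.character(packageVersion("stats"))
```

Storing such a file next to the results gives readers a starting point for recreating the environment, even when the project itself is not managed by packrat.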
DEMO
• Package development (new, existing)
• R Markdown (new, existing)
• Packrat (new and existing project)
  – packrat::init()

Learning More
• https://software-carpentry.org/
  Lessons on using the (Linux) shell, Git, Mercurial, Databases & SQL, Python, R, Matlab and automation with Make
• R packages by Hadley Wickham
• Advanced R by Hadley Wickham
