Reproducible Project Workflow in R (with ProjectTemplate)

ProjectTemplate is an R package that makes it easy to create and run projects so that you can spend less time thinking about organization and more time analyzing data.

  1. Reproducible Project Workflow in R (with ProjectTemplate). Caitlin Hudon, Sr. Data Analyst @ web.com
  2. Why reproducible workflow matters
  3. Things I care about (in structuring projects):
     + Easy to find things (reports, data, etc.)
     + Easy to share
     + Ability to update with new data
     + Reproducibility
     + Being able to quickly verify results
     + The sanity of my future self
  4. Enter: ProjectTemplate
  5. Technical benefits of ProjectTemplate:
     + Easy to create new projects
     + Organizes and standardizes your projects
     + Automatically loads the data and R packages you need
     + Automatically runs data munging scripts
  6. Non-technical benefits:
     + Automates the thoughtless parts of your project (so you can spend your energy on the important stuff)
     + Makes it easier for someone new to your project to understand the steps, data, and preparation involved in running it
  7. Create a project:
     library('ProjectTemplate')
     create.project('new_project')
  8. A new ProjectTemplate project: munge folder, data folder, and config folder (→ global.dcf file)
  9. Load a project:
     library('ProjectTemplate')
     setwd('~/projects/new_project')
     load.project()
  10. Advice:
     + Number the files in the munge folder so they run in order
     + Avoid manually updating raw data files (do this in code!)
     + Edit the global.dcf file (in the config folder) to list the libraries you use, load them automatically, and adjust the "strings as factors" setting
     + There are lots of other ways to make analysis reproducible; ProjectTemplate is one part of a larger ecosystem
  11. Cheat sheet:
     + ProjectTemplate home page (don't miss the tutorial!)
     + Great discussion on best practices for managing analysis projects
     + The talk that got me interested in ProjectTemplate (thanks, Hilary Parker!)
  12. Questions? Ideas? caitlinmhudon@gmail.com, @beeonaposy
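The slides say load.project() automatically loads your data and runs your munge scripts. To make that concrete, here is a rough base-R sketch of the same idea. It is not ProjectTemplate's implementation: the function load_project_sketch(), the demo_project folder, scores.csv, and 01-double.R are all hypothetical, and the sketch only handles .csv files in data/ and .R scripts in munge/. The real load.project() does considerably more, such as reading config/global.dcf and loading the libraries listed there.

```r
# Hypothetical, simplified stand-in for what load.project() automates.
# NOT ProjectTemplate's code: only .csv data files and .R munge scripts.
load_project_sketch <- function(root = ".", envir = globalenv()) {
  # 1. Read every CSV in data/ into a variable named after the file
  data_files <- list.files(file.path(root, "data"), pattern = "\\.csv$",
                           full.names = TRUE)
  for (f in data_files) {
    name <- make.names(tools::file_path_sans_ext(basename(f)))
    assign(name, utils::read.csv(f), envir = envir)
  }
  # 2. Run every munge script in sorted (C-locale) file-name order
  munge_files <- sort(list.files(file.path(root, "munge"),
                                 pattern = "\\.[Rr]$", full.names = TRUE),
                      method = "radix")
  for (f in munge_files) sys.source(f, envir = envir)
  invisible(envir)
}

# Build a tiny throwaway project in a temporary directory
root <- file.path(tempdir(), "demo_project")
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "munge"), showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(root, "data", "scores.csv"),
          row.names = FALSE)
writeLines("scores$x2 <- scores$x * 2",
           file.path(root, "munge", "01-double.R"))

# "Load" the project into a fresh environment and inspect the result
env <- load_project_sketch(root, envir = new.env())
print(env$scores$x2)
```

Loading into a separate environment keeps the demo from cluttering the global workspace; the real load.project() works in the global environment, which is why the sketch defaults to globalenv().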
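The advice to number munge files works because the scripts run in the sorted order of their file names. A quick way to see why zero-padding the numbers matters is to sort a few hypothetical file names. This uses sort(..., method = "radix"), which sorts character vectors in the C locale; ProjectTemplate's own ordering may follow your system locale, but the "10 before 2" problem shows up either way with unpadded numbers.

```r
# Hypothetical munge script names, without and with zero-padding
unpadded <- c("2-clean.R", "10-summarise.R", "1-load.R")
padded   <- c("02-clean.R", "10-summarise.R", "01-load.R")

# Unpadded: "10-summarise.R" sorts before "2-clean.R"
print(sort(unpadded, method = "radix"))

# Zero-padded: the file names sort in the intended numeric order
print(sort(padded, method = "radix"))
```

Two-digit prefixes (01, 02, ...) are usually enough for a project's munge folder.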
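"Avoid manually updating raw data files (do this in code!)" means that corrections live in a munge script, so they are re-applied every time the raw data is reloaded. A minimal sketch of such a script follows. The file name munge/02-fix-raw-data.R, the survey data frame, and its values are hypothetical; in a real project survey would be loaded from data/ by load.project() rather than built inline.

```r
# munge/02-fix-raw-data.R (hypothetical file name)
# `survey` stands in for a data frame loaded from data/; it is built
# inline here only so the example runs on its own.
survey <- data.frame(
  state = c("NY", "ny", "Calif.", "CA"),
  score = c(3, 5, NA, 4),
  stringsAsFactors = FALSE
)

# Normalise inconsistent state codes instead of hand-editing the CSV
survey$state <- toupper(survey$state)
survey$state[survey$state == "CALIF."] <- "CA"

# Drop rows with a missing score
survey <- survey[!is.na(survey$score), ]

print(survey)
```

Because the raw file is never touched, anyone rerunning the project with new data gets the same cleaning steps applied automatically.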
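The advice slide also suggests editing config/global.dcf. A minimal fragment covering the three settings it mentions might look like the one below. This assumes the key names used by recent ProjectTemplate releases (load_libraries, libraries, as_factors); the package names are placeholders, and keys have changed between versions, so compare against the global.dcf that create.project() generated for you before copying anything.

```
load_libraries: TRUE
libraries: dplyr, ggplot2
as_factors: FALSE
```

Here load_libraries turns on automatic loading, libraries lists the packages load.project() should attach, and as_factors controls whether strings are converted to factors when data is loaded.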