In this talk, I take the preparation of the tutorial program at Data Science Conference 2014 as an example and share some experience with R, Git, GitHub, and CI (Jenkins). The average rating of our tutorial exceeded 4.2 (on a scale of 1 to 5). Some speakers and assistants agreed that the companion package "DSC2014Tutorial" improved both the preparation and the teaching. Therefore, I would like to share how to set up such a working environment.
1. R, Git, GitHub, and CI
Taiwan R User Group
Wush Wu
2014-09-20
2. DSC 2014
● 2014 was the first year of DSC (Data Science Conference) in Taiwan.
● We (Taiwan R User Group) organized the R Tutorial Program at DSC.
● More than 100 students joined us during DSC 2014.
● The average rating was above 4.2 (on a scale of 1 to 5).
3. Goal of Tutorial
● Systematically introduce the steps of analysis with R
– Basic
– Data Manipulation (Extract, Transform, and Load)
– Analysis
– Visualization
● Based on the latest tools of R
● Reproducibility of examples
● Integration of materials
● *Well-designed exercises
4. About Me
● PhD Candidate in NTU EE
● Current research field:
– Online Advertising
– Large Scale Predictive Modeling
● Organizer of Taiwan R User Group
● Organizer of Tutorial Program in DSC 2014
5. Outline
● Share the experience of organizing the tutorial program with 16 people, using:
– Git, my favorite version control tool
– GitHub, a platform for cooperation
– Jenkins, an automation system
● I will show how to combine these tools with an R package
6. Why R Package
● The examples and exercises have many dependencies
● An R package is the recommended way to share your code
● Wrap all materials in one R package, DSC2014Tutorial, so that students only need to download once.
– All slides are included.
– Customized R API
– All data
– *Installation of dependent packages
– Solves portability issues (Windows, Mac, and Ubuntu)
● The package is easily managed with git and released on GitHub
7. The structure of an R package
Dependencies
● DESCRIPTION
Package: DSC2014Tutorial
Type: Package
Title: Materials of Tutorial Program on DSC 2014
Version: 1.2
Date: 2014-08-03
Author: Taiwan R User Group
Maintainer: Wush Wu <wush978@gmail.com>
Description: This package contains the required materials of R Tutorial
    DSC2014
License: GPL (>= 3)
Depends:
    R (>= 3.1.0)
Imports:
    tools,
    ...
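As a minimal sketch of how these dependency fields can be inspected from R, the snippet below writes a reduced DESCRIPTION to a temporary file and reads it back with `read.dcf()`. The temporary path and the variable names are illustrative assumptions, and the `Imports` entries elided with "..." on the slide are simply left out rather than guessed.

```r
# Write a reduced copy of the DESCRIPTION fields to a temporary file.
# Only fields shown on the slide are used; the elided Imports are omitted.
desc_path <- file.path(tempdir(), "DESCRIPTION")
fields <- data.frame(
  Package = "DSC2014Tutorial",
  Version = "1.2",
  Depends = "R (>= 3.1.0)",
  Imports = "tools",
  stringsAsFactors = FALSE
)
write.dcf(fields, file = desc_path)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(desc_path)

# Split the comma-separated Imports field into individual package names.
imports <- trimws(strsplit(desc[1, "Imports"], ",")[[1]])
print(imports)
```

Packages listed under `Imports` are installed together with the package, which is one way a single download can pull in what the examples and exercises need.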
8. The structure of an R package
Data
● data
data(salary, package = 'DSC2014Tutorial')
9. The structure of an R package
cross-platform
● configure.ac / configure
10. The structure of an R package
slides and external source
system.file('Basic', package = 'DSC2014Tutorial')
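A small sketch of how a lookup like this behaves: `system.file()` returns an empty string when the file or package is not found, so a wrapper can fail loudly instead. The helper name `find_material` is hypothetical, and the base package `tools` is used only so that the example runs without DSC2014Tutorial installed.

```r
# Hypothetical helper: locate a file or directory shipped inside an
# installed package, and stop with a clear message if it is missing.
find_material <- function(name, package) {
  path <- system.file(name, package = package)
  if (!nzchar(path)) {
    stop(sprintf("'%s' not found in package '%s'", name, package))
  }
  path
}

# Works for any installed package; 'tools' ships with R.
desc_file <- find_material("DESCRIPTION", package = "tools")
print(basename(desc_file))
```

With the tutorial package installed, `find_material("Basic", package = "DSC2014Tutorial")` would return the directory holding the slides referred to above.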
11. Git, Version Control
● Some speakers were new to git
● We used the following features:
– Self version control: add, commit
– Repository: remote, push, pull, and merge
– Cooperation: submodule
● Git plays a fundamental role in our workflow
12. Why Git?
● Speed is king
● Local commits rock
● Github
● My favorite
13. Github
● The most popular platform for managing git repositories
● Provides many convenient features
– Organization accounts
– Designed for cooperation
– Simple integration with many popular CI tools
– Static website (sufficient for an R repository)
14. Release R Package on Github
● The R package is released as:
– a git repository
– an R repository
15. Github and R Repository
● How to set up an R repository on GitHub:
1. Create a new git repository named "R"
2. Add the content of the R repository to the git repository in the gh-pages branch
3. Push and wait
4. The R repository is located at http://<account>.github.io/R
● Users can install the binary of DSC2014Tutorial directly via
install.packages("DSC2014Tutorial", repos = "http://TaiwanRUserGroup.github.io/R")
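To illustrate what "the content of an R repository" means in terms of layout, the sketch below uses `contrib.url()` to show where `install.packages()` looks under a repository root. The variable names are illustrative; the binary paths depend on the running R version and platform, so they are printed rather than asserted.

```r
# Repository root as given on the slide.
repo <- "http://TaiwanRUserGroup.github.io/R"

# Source packages are always looked up under <repo>/src/contrib.
src_url <- contrib.url(repo, type = "source")
print(src_url)

# Binary packages live in platform- and version-specific subdirectories.
print(contrib.url(repo, type = "win.binary"))
```

The gh-pages branch therefore needs to contain directories matching these paths, each holding the package files and a PACKAGES index.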
16. Cooperation
● I cannot build all the tutorial slides myself
– There are 7 slide decks built by different groups of speakers
● Each slide deck should be managed by its author
– Each slide deck is a standalone git repository
– No branching here because not all speakers are familiar with git
● Use git submodule to embed these slides into the R package
● We need a modern workflow to control the quality
17. Workflow 1
1. Each speaker creates the slides and initializes a git repository
2. Speakers commit their changes to the git repository
3. Open a pull request
4. Review the slides and test them on different platforms
5. Merge the changes into DSC2014Tutorial
22. Slide Review
● The speakers review each other's slides
● Comments are posted as Issues on the GitHub page of the repository
● The speaker should resolve the posted issues
24. Challenge
● After the first rehearsal at the Taiwan R User Group, we noticed a serious encoding issue
– The default Chinese encoding differs across platforms
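A minimal sketch of how such an encoding mismatch can be observed in R. The slide does not say which encodings were involved; Big5 versus UTF-8 is an assumption used only for illustration, and the Big5 conversion is guarded because not every iconv build supports it.

```r
# Two Chinese characters written with Unicode escapes, so the script
# itself does not depend on the encoding of the source file.
x <- "\u4e2d\u6587"

Encoding(x)                # declared encoding of the string
nchar(x, type = "chars")   # 2 characters
nchar(x, type = "bytes")   # 6 bytes when stored as UTF-8

# Assumption: Big5 stands in for a non-UTF-8 default Chinese encoding.
big5 <- tryCatch(
  iconv(x, from = "UTF-8", to = "BIG5"),
  error = function(e) NA_character_
)
if (!is.na(big5)) {
  # The same text occupies a different number of bytes in Big5.
  print(nchar(big5, type = "bytes"))
  # Converting back with the correct source encoding recovers the text.
  roundtrip <- iconv(big5, from = "BIG5", to = "UTF-8")
  print(identical(roundtrip, x))
}
```

Reading bytes under the wrong assumed encoding is what produces garbled text, so declaring the encoding explicitly when reading or converting files is one way to make material behave the same on every platform.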
25. Challenge
● We could resolve this specific issue
● But the slides keep evolving, so new bugs might occur
● We need to test the slides, but there are 7 slide decks and we want to test them on Windows, Ubuntu, and Mac*
26. Why CI
● CI automates the following things
– Testing
– Integration
– Deployment
● CI makes my life better
● CI also introduces some problems. Let's discuss them later.
27. Test R Package
● R CMD check --no-codoc --no-manual --no-vignettes --no-build-vignettes
28. Deploy R Package
● git push
● Commit to the R repository
tools::write_PACKAGES(type = c("source", "mac.binary", "win.binary"))
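The indexing step can be sketched end to end on a throwaway repository. Everything below (the `demoPkg` placeholder package, the temporary directories, the variable names) is hypothetical and only meant to show what `tools::write_PACKAGES()` produces for one repository type; a real deployment would run it once per type in the matching directory of the gh-pages checkout.

```r
# Throwaway repository layout under tempdir(); all names are hypothetical.
repo_dir <- file.path(tempdir(), "R", "src", "contrib")
dir.create(repo_dir, recursive = TRUE, showWarnings = FALSE)

# A placeholder source "package" that contains only a DESCRIPTION file.
work <- file.path(tempdir(), "work")
dir.create(file.path(work, "demoPkg"), recursive = TRUE, showWarnings = FALSE)
write.dcf(
  data.frame(
    Package = "demoPkg", Version = "0.1", Title = "Demo",
    Description = "Placeholder package.", License = "GPL (>= 3)",
    stringsAsFactors = FALSE
  ),
  file = file.path(work, "demoPkg", "DESCRIPTION")
)

# Bundle it as demoPkg_0.1.tar.gz using R's internal tar implementation.
old <- setwd(work)
utils::tar(file.path(repo_dir, "demoPkg_0.1.tar.gz"), files = "demoPkg",
           compression = "gzip", tar = "internal")
setwd(old)

# Generate the PACKAGES index files that install.packages() reads.
tools::write_PACKAGES(repo_dir, type = "source")
index <- read.dcf(file.path(repo_dir, "PACKAGES"))
print(index[, c("Package", "Version")])
```

After the index is regenerated, committing and pushing the repository directory publishes the new package version.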