A relatively short introduction to R, as presented at the Belgian Software Craftsmanship meetup group.
The goal of this presentation is to give you an introduction to:
• The style of the language
• Its ecosystem
• How common things like data manipulation and visualization work
• How to use it for machine learning
• Web development and report generation in R
• Integrating R in your system
License:
Introduction To R by Samuel Bosch
To the extent possible under law, the person who associated CC0 with Introduction To R has waived all copyright and related or neighboring rights
to Introduction To R.
http://creativecommons.org/publicdomain/zero/1.0/
2. What is R
R is a language and environment for statistical computing and graphics. It
is a GNU project which is similar to the S language.
• Created in 1993, license: GNU GPL, current version 3.2.3
• Interpreted
• C-like syntax
• Functional programming language semantics (Lisp, APL)
• Object oriented (3 different OO systems)
• Garbage collector
• Mostly call-by-value
• Lexical scope
• Function closure
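A few of the properties listed above (call-by-value, lexical scope, closures, first-class functions) can be seen in a short sketch. The names `set_first` and `make_counter` are made up for this illustration and do not come from the original slides.

```r
# Call-by-value (in effect): changing an argument inside a function
# does not change the caller's copy.
set_first <- function(v) {
  v[1] <- 99
  v
}
x <- c(1, 2, 3)
y <- set_first(x)
x  # unchanged: 1 2 3
y  # modified copy: 99 2 3

# Lexical scope and closures: make_counter() returns a function that
# keeps access to the environment in which it was created.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # <<- assigns in the enclosing environment
    count
  }
}
counter <- make_counter()
counter()
counter()  # the second call returns 2

# Functions are ordinary values and can be passed to other functions
squares <- sapply(1:3, function(i) i^2)
```

Each call to `make_counter()` creates a fresh environment, so two counters created this way do not share state.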
4. Usage
CRAN Task Views: https://cran.r-project.org/web/views/
• Statistics (frequentist and Bayesian)
• Machine learning and data mining
• Science (mathematics, chemistry, physics, medical, ecology, genetics, economy, history, …)
• Finance
• Natural Language Processing
• Data visualization
• Analyzing spatial, spatio-temporal data and time series
• …
5. R Markdown
This is an R Markdown presentation. Markdown is a simple formatting
syntax for authoring HTML, PDF, and MS Word documents. For more
details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that
includes both content as well as the output of any embedded R code
chunks within the document.
6. Competitors/colleagues
• SAS, SPSS, Stata, Mathematica and other statistical software
• Python + NumPy + Pandas + matplotlib + …
• Matlab/Octave
• Julia
• K/J and other APL-like languages
• Java (Weka), Clojure, .NET (F#), …
7. Calling R
• Command line
• SAS, SPSS, Stata, Statistica, JMP
• Java, C++, F#
• Python, Perl, Ruby, Julia
• PostgreSQL: PL/R
8. Ecosystem
IDE: RStudio or one of the alternatives (plugins for Eclipse, Visual Studio, Atom, Sublime Text, Vim, …)
Packages: CRAN (6700+ packages), Bioconductor, R-Forge, GitHub
Learning more and getting help:
• Built-in documentation (?, help(), F1) and package vignettes
• Official manuals: https://cran.r-project.org/manuals.html
• Short reference card: https://cran.r-project.org/doc/contrib/Short-refcard.pdf
• (Free) books: Advanced R and R Packages by Hadley Wickham
• Courses on edX and Coursera
• Stack Overflow and Cross Validated (for statistical questions)
mail@samuelbosch.com
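The built-in documentation can also be reached from code. This is a small sketch that uses only functions shipped with base R; the choice of `mean`, `read.csv` and the `grid` package as examples is arbitrary. At an interactive prompt you would normally just type `?mean` and let R display the page.

```r
# Look up the help page for a function; ?mean is shorthand for this call.
# Assigning the result avoids opening the page in a script.
h <- help("mean")

# Find functions whose name matches a regular expression
read_functions <- apropos("^read\\.csv")

# List the vignettes that ship with a package (grid is part of base R)
v <- vignette(package = "grid")
```

Printing `h` or `v` at the console opens the help page or shows the vignette listing.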
11. Vectors
An ordered sequence of elements of the same type
a <- c(1, 2, 5.3, 6, -2, 4) # numeric vector
a[c(2, 4)] # 2nd and 4th element
## [1] 2 6
names(a) <- c("c", "d", "e", "f", "g", "h")
a
##    c    d    e    f    g    h
##  1.0  2.0  5.3  6.0 -2.0  4.0
15. Data Types: numeric vectors
Default type for numbers
class(c(1, 2.3))
## [1] "numeric"
c(is.integer(1), is.numeric(1))
## [1] FALSE TRUE
c(seq(from = 1, to = 5, by = 2), rep(c(6,7), times = c(2,3)))
## [1] 1 3 5 6 6 7 7 7
16. Data Types: integer vectors
as.integer(c(1,2.3,"4.5","bla"))
## Warning: NAs introduced by coercion
## [1] 1 2 4 NA
as.integer(c(TRUE,FALSE))
## [1] 1 0
17. Factors
Used to encode a vector as a factor ('category'/'enumerated type')
f <- factor(c(1,1,2,2,3,3,2,1), levels=c(1,2,3), labels=c("a", "b", "c"))
f
## [1] a a b b c c b a
## Levels: a b c
table(f)
## f
## a b c
## 3 3 2
24. Arrays
One, two or more dimensions
a <- array(data = t(1:24), dim = c(2,3,4))
a[1,,]
##      [,1] [,2] [,3] [,4]
## [1,]    1    7   13   19
## [2,]    3    9   15   21
## [3,]    5   11   17   23
a[1,1,1]
## [1] 1
25. Data frames
A data frame combines columns of the same length but possibly different
data types
d <- data.frame(number=1:2, bool=c(TRUE, FALSE), string=c("y", "z"))
d$number
## [1] 1 2
d[1,c(2,3)]
##   bool string
## 1 TRUE      y
27. dplyr
Lots of operators for manipulating local and database data (SQLite,
MySQL and PostgreSQL).
Basic verbs:
• select
• filter
• arrange (= sort)
• mutate
• summarise
Other goodies:
• piping (chaining)
• database access as lazy as possible
• BigQuery support (Google)
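The basic verbs and piping can be combined as in the sketch below. It assumes the dplyr package is installed and uses the `mtcars` data set that ships with R; the variable names `light_cars` and `mpg_by_cyl` and the unit conversion are only for illustration.

```r
library(dplyr)

# Chain the verbs with the pipe operator %>%
light_cars <- mtcars %>%
  filter(cyl == 4) %>%                    # keep rows with 4 cylinders
  select(mpg, cyl, wt) %>%                # keep three columns
  mutate(wt_kg = wt * 1000 * 0.4536) %>%  # wt is recorded in 1000 lbs
  arrange(desc(mpg))                      # sort, highest mpg first

# summarise collapses each group to a single row
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())
```

The same pipeline style works on a database-backed table, in which case dplyr defers the actual query until the result is needed.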
42. Objects
Recommended reading: http://adv-r.had.co.nz/OO-essentials.html
• S3: generic function OO, very casual system, e.g. drawRect(canvas, "blue")
• S4: similar to S3 but more rigid, has multiple dispatch
• Reference classes: message-passing OO (like Java, C++, etc.), objects are mutable
• Base classes: defined in C
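The three user-level OO systems can be compared side by side in a short sketch. The classes and generics below (`area`, `Person`, `greet`, `Account`) are hypothetical names chosen for this illustration, not part of the original slides.

```r
library(methods)  # S4 and reference classes live in the methods package

# S3: a generic function dispatches on the class attribute
area <- function(shape, ...) UseMethod("area")
area.circle <- function(shape, ...) pi * shape$r^2
area.square <- function(shape, ...) shape$side^2

circle <- structure(list(r = 1), class = "circle")
square <- structure(list(side = 2), class = "square")
area(circle)
area(square)

# S4: formal class definitions with typed slots
setClass("Person", representation(name = "character", age = "numeric"))
setGeneric("greet", function(obj) standardGeneric("greet"))
setMethod("greet", "Person", function(obj) paste("Hello,", obj@name))
p <- new("Person", name = "Ada", age = 36)
greet(p)

# Reference classes: methods belong to the object, which is mutable
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    deposit = function(amount) {
      balance <<- balance + amount  # modifies the object in place
      invisible(.self)
    }
  )
)
acc <- Account$new(balance = 100)
acc$deposit(50)
acc$balance
```

Note that `acc$deposit(50)` changes `acc` itself, whereas S3 and S4 objects follow the usual copy-on-modify semantics.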
46. Package development
devtools + roxygen2 + testthat
Advantages:
• testing
• documentation
• versioning
• distribution
Disadvantage:
• more work
Get started with the book http://r-pkgs.had.co.nz/ by Hadley Wickham
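As a rough idea of what the devtools + roxygen2 + testthat workflow looks like in a package source file, here is a sketch. The function `rescale01` and the test file name are invented for this example; the test only runs if testthat is installed.

```r
#' Rescale a numeric vector to the range [0, 1]
#'
#' @param x A numeric vector.
#' @return A numeric vector of the same length as \code{x}.
#' @export
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# A unit test as it might appear in tests/testthat/test-rescale01.R
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("rescale01 maps onto [0, 1]", {
    testthat::expect_equal(rescale01(c(1, 2, 3)), c(0, 0.5, 1))
  })
}
```

Inside a package, `devtools::document()` turns the `#'` comments into help files and `devtools::test()` runs the tests.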
50. Web
• Shiny: http://shiny.rstudio.com/
  - interactive web pages
  - no need for JavaScript (at least not for simple things)
  - reactive programming
  - typically a ui.R and a server.R
  - example: http://shiny.rstudio.com/gallery/movie-explorer.html
  - DEMO
• OpenCPU: https://www.opencpu.org/
  - HTTP API for data analysis in R
• Rserve: https://rforge.net/Rserve/doc.html
  - Binary R server
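A minimal Shiny app might look like the sketch below. It assumes the shiny package is installed and, for brevity, puts the UI and server parts in one script instead of separate ui.R and server.R files; the input name `n` and output name `hist` are arbitrary.

```r
library(shiny)

# ui: what the page looks like (would typically live in ui.R)
ui <- fluidPage(
  titlePanel("Histogram of random values"),
  sliderInput("n", "Number of observations", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# server: how outputs react to inputs (would typically live in server.R)
server <- function(input, output) {
  # renderPlot re-runs automatically whenever input$n changes
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = NULL, xlab = "value")
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # uncomment to start the app in a browser
```

No JavaScript is written by hand here: moving the slider triggers the reactive `renderPlot` expression on the R side.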