Adam Ralph from the Irish Centre for High End Computing presented this Introduction to Basic R during the Big Data Workshop hosted by the Social Sciences Computing Hub at the Whitaker Institute on the 14th of November 2013.
3. Introduction, What and Why?
R was created by Ross Ihaka and Robert Gentleman (R&R) with the
first stable version (1.0) released in early 2000.
R is a leading tool for statistics, data manipulation, modeling, and
graphics creation.
Platform-independent, free, and open-source programming language.
Revolution Analytics provide commercial support.
Integration with other languages such as C/C++, Java, Python.
Access to various data sources e.g. Excel, SAS, SPSS, Minitab, etc.
Large, active, and growing community of users.
Existing platforms for communication, e.g. the useR! conference and local
R user groups.
BasicR 3
5. On-line Material
Download the R program or source code from CRAN (Comprehensive
R Archive Network) at http://cran.r-project.org/.
Latest version when these slides were written: 2.15.0 (2012-03-30); R is
generally updated about every 6 months.
R comes with a certain amount of capability built-in. Optional
packages provide additional functionality for R.
6. Functions
The functionality of R is provided by functions.
Each function does a specific task or operation. There may be more
than one function to perform the same operation.
Functions take a set of inputs (arguments), which gives them flexibility,
and their results can be saved to an object.
Thus, when using R, several steps (using different functions) may be
needed to achieve a certain goal.
LinearRegressionModel <- lm(weight ~ group)
plot(LinearRegressionModel)
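The two lines above do not say where weight and group come from. As a minimal sketch, assume they are the columns of the PlantGrowth data frame that ships with R (an assumption, not stated on the slide). The result of one function is saved and then passed to others:

```r
# Assumption: weight and group are columns of the built-in PlantGrowth data.
data(PlantGrowth)

# Step 1: fit the model and save the result in an object.
model <- lm(weight ~ group, data = PlantGrowth)

# Step 2: pass the saved object to other functions.
print(coef(model))      # estimated coefficients
print(summary(model))   # fuller report: standard errors, R-squared, ...

# plot(model) would draw the diagnostic plots, as on the slide.
```

Saving the fitted model in an object is what makes the multi-step workflow possible: each later function works on the same stored result.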
7. Packages
Installing packages extends the functionality of a local installation of
R.
Packages are self-contained but may have dependencies.
Each package provides a set of functions and documentation related
to a particular task.
The package approach means that you can tailor R to suit your needs
and keep the computing resources it uses to a minimum.
It also allows you to define your own workflows.
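A rough sketch of the usual package workflow follows. The helper name use_package is hypothetical, introduced only for illustration; install.packages(), requireNamespace() and library() are standard R functions.

```r
# Packages with priority "base" ship with every R installation.
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Hypothetical helper: attach a package, installing it from CRAN first
# if it is not already available locally.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs internet access and a CRAN mirror
  }
  library(pkg, character.only = TRUE)
}

# "datasets" is already installed, so nothing is downloaded here.
use_package("datasets")
```

Installation is done once per machine, while library() has to be called in every new session that uses the package.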
8. Help in R
Built in Help
Help with a single function, e.g. “plot”
help(plot)
?plot
Help on a specific topic, e.g. “regression”
help.search("regression")
??regression
Search for functions whose names contain, for example, “acf”
apropos("acf")
[1] "acf" "acf2AR" "ARMAacf" "pacf"
9. Help in R
On-line Help
Top level help web page for R.
help.start()
R-help mailing list, see http://www.R-project.org/mail.html for
an overview of the mailing lists.
Several lists are available
◮ r-announce@r-project.org, R releases.
◮ r-packages@r-project.org, R package updates.
◮ r-help@r-project.org, main users forum.
◮ r-devel@r-project.org, developers forum.
Archived online
◮ https://stat.ethz.ch/pipermail/r-help/
◮ http://finzi.psych.upenn.edu/
◮ http://tolstoy.newcastle.edu.au/R/
10. Inbuilt Datasets
airquality
A number of example datasets ship with R by default, in the
datasets package.
Display information on the specified package.
?datasets
# or
library(help="datasets")
Load the airquality dataset using
data(airquality)
Display information on the specified data set
?airquality
Summarize the columns of the data frame.
summary(airquality)
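Beyond summary(), a few other base functions are commonly used for a first look at a data frame. The following is a short sketch using only the airquality data named above; the variable name mean_ozone is illustrative.

```r
data(airquality)

dim(airquality)     # number of rows and columns
head(airquality)    # first six rows
str(airquality)     # column names and types

# Count the missing values in each column.
colSums(is.na(airquality))

# The Ozone column contains missing values, so na.rm = TRUE is needed;
# without it, mean() returns NA.
mean_ozone <- mean(airquality$Ozone, na.rm = TRUE)
print(mean_ozone)
```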
11. R-Forge
R-Forge: another possible place to look for packages.
◮ It provides tools and a platform for developers to collaborate.
◮ A place to maintain current and historical versions of files, functioning as
a version control system.
Caveat: The R-Forge site contains projects that are in progress, so
please be sure to read the disclaimers and documentation before use.
12. RStudio
A free IDE (Integrated Development Environment) for R.
Limited syntax highlighting scheme.
Isn’t compatible with some API tools, e.g. the Google Visualization API.
13. Visualization
R has the typical 2D statistical plots:
1. Histograms
2. Bar charts
3. Pie charts
4. Scatter plots
5. Box plots
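Each plot type in the list above has a corresponding base R function. The sketch below draws one of each from the built-in airquality data; the choice of columns and the variable names are illustrative, not taken from the slides.

```r
data(airquality)

# 1. Histogram of daily temperature
temp_hist <- hist(airquality$Temp, main = "Temperature", xlab = "Temp")

# 2. Bar chart of the number of recorded days per month
month_counts <- table(airquality$Month)
barplot(month_counts, xlab = "Month", ylab = "Days recorded")

# 3. Pie chart of the same counts
pie(month_counts)

# 4. Scatter plot of ozone against wind speed
plot(airquality$Wind, airquality$Ozone, xlab = "Wind", ylab = "Ozone")

# 5. Box plots of temperature, one box per month
boxplot(Temp ~ Month, data = airquality)
```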
14. Graphical Devices
Typically, when plotting, a separate window pops up.
A plot can be saved to a file using the “Save as” menu item.
Plots can also be saved directly to a file, as PDF, WMF (Windows
Metafile), PNG or JPG.
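Saving directly to a file means opening a file device, plotting, and closing the device again. Below is a minimal sketch with the pdf() device; the output file name is an arbitrary choice for illustration.

```r
# Write to a temporary directory so the example leaves no stray files.
out_file <- file.path(tempdir(), "temperature-histogram.pdf")

pdf(out_file, width = 7, height = 5)   # open the PDF device (size in inches)
hist(airquality$Temp)                  # the plot goes to the file, not a window
dev.off()                              # close the device to finish the file

print(file.exists(out_file))
```

The png() and jpeg() devices follow the same open, plot, close pattern; win.metafile() for WMF is available on Windows only.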
15. 2D Plots
[Figure: three panels. Histogram of Forbes2000$marketvalue (x-axis: Forbes2000$marketvalue, y-axis: Frequency); histogram of log(Forbes2000$marketvalue) (x-axis: log(Forbes2000$marketvalue), y-axis: Frequency); a plot by company category (visible labels: Aerospace & defense, Food markets, Media, Trading companies).]
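The slide does not show the code behind these plots. The sketch below makes several assumptions: that Forbes2000 comes from the add-on HSAUR package (it is not part of base R), that the third panel is a bar chart of counts per category, and that the helper name plot_market_value is hypothetical. The helper works on any data frame with marketvalue and category columns.

```r
# Hypothetical helper: draw the three panels for a data frame that has
# numeric `marketvalue` and categorical `category` columns.
plot_market_value <- function(companies) {
  old_par <- par(mfrow = c(3, 1))   # stack three panels vertically
  on.exit(par(old_par))             # restore the previous layout on exit

  hist(companies$marketvalue, main = "Market value")
  hist(log(companies$marketvalue), main = "Log of market value")

  category_counts <- table(companies$category)
  barplot(category_counts, las = 2)  # las = 2 turns the labels sideways
  invisible(category_counts)
}

# Assumption: Forbes2000 is provided by the HSAUR package, if installed.
if (requireNamespace("HSAUR", quietly = TRUE)) {
  data("Forbes2000", package = "HSAUR")
  plot_market_value(Forbes2000)
}
```

Taking the logarithm spreads out a strongly right-skewed variable, which is why the second histogram on the slide looks far more symmetric than the first.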