The statistical language R is a free, open-source tool that is changing the way many market researchers approach data analysis and visualization. In this webinar, Ray Poynter explains what R is and why you might want to use it, and provides some examples using R that you can try yourself.
1. A simple introduction to R for market researchers
October 2019
Webinar Friday 4 October
Live broadcast 10am New York (3pm London)
Ray Poynter
Chief Research Officer, Potentiate
2. What is R?
• An open-source, free statistical language
• The core language is expanded by an enormous collection of libraries
• Available for Windows, Mac and UNIX
• Find out about R (and download it) from https://www.r-project.org/
• Learning R
• Books
• Articles
• Videos
• E-learning, e.g. DataCamp, Udemy & Coursera
3. What is RStudio?
• There are other choices, but nearly everybody I know is using RStudio to work with R
• It is an IDE (Integrated Development Environment)
• Editor
• A tidy place to run R, see the variables, and keep things organized
• There are open-source and commercial options (free and not-free)
• Find out more and download it from https://rstudio.com/
5. Commands and R – Hello World
> print("Hello World")
[1] "Hello World"
>
> print("Hello World", quote=FALSE)
[1] Hello World
>
> myText <- "Hello World"
> print(myText)
[1] "Hello World"
>
> myText
[1] "Hello World"
>
6. Commands and R – Variables
> a <- 2
> b <- 4
> print(a * b)
[1] 8
>
> c <- a * b
> c
[1] 8
>
> c <- "Hello World"
> c
[1] "Hello World"
>
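As the last example shows, a variable in R takes its type from whatever value it currently holds. A minimal sketch of checking this with class(); the variable names here are invented for illustration and are not from the webinar:

```r
# A variable's type follows the value assigned to it.
result <- 2 * 4
first_class <- class(result)     # a number

result <- "Hello World"
second_class <- class(result)    # now a character string

print(first_class)
print(second_class)

# Naming a variable `c` works, but it shares its name with the built-in
# c() function used to build vectors, so another name is easier to read.
```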
7. Commands and R – Vectors
> x <- c(1,2,3,4)
> x
[1] 1 2 3 4
> y <- 2 * x
> y
[1] 2 4 6 8
> z <- c("One","Two","Three")
> z
[1] "One" "Two" "Three"
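Vectors can also be indexed and passed to functions as a whole. A short sketch using base R only, re-using the x vector from above (the name `mixed` is an assumption added for illustration):

```r
x <- c(1, 2, 3, 4)

second <- x[2]        # indexing starts at 1, so this is the second element
big    <- x[x > 2]    # keep only the elements greater than 2
total  <- sum(x)      # functions work on the whole vector at once
avg    <- mean(x)

# A vector holds one type only: mixing numbers and text
# converts every element to text.
mixed <- c(1, "Two", 3)
class(mixed)
```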
8. Commands and R – Data sets
R has lots of built-in data sets, for example mtcars.
> str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg :Class 'labelled' num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
.. .. LABEL: Miles/(US) gallon
$ cyl :Class 'labelled' num 6 6 4 6 8 6 8 4 4 6 ...
.. .. LABEL: Number of cylinders
And 9 more variables
9. mtcars
Use help (or ?) to understand an included data set
> ?mtcars
mtcars {datasets} R Documentation
Motor Trend Car Road Tests
Description
The data was extracted from the 1974 Motor Trend US magazine, and
comprises fuel consumption and 10 aspects of automobile design and
performance for 32 automobiles (1973–74 models).
Usage
mtcars
Format
A data frame with 32 observations on 11 (numeric) variables.
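Besides str() and the help page, a few base R functions give a quick feel for the size and contents of a data set. A minimal sketch (the variable names are illustrative):

```r
# How big is the data set, and what are the variables called?
size      <- dim(mtcars)     # rows, then columns
var_names <- names(mtcars)   # the column names

print(size)
print(var_names)

# data() with no arguments lists the data sets that ship with R.
```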
10. Commands and R – Data Frames
Data Frame – a compound structure in which each row is one item (here, a car)
and the columns can hold different sorts of data.
> head(mtcars,4)
                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4      21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710     22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
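In mtcars every column is numeric, but each column of a data frame is a vector with its own type. A small sketch with made-up survey data (the data frame `survey` and its contents are invented purely for illustration):

```r
# Hypothetical survey responses, not real data.
survey <- data.frame(
  respondent = c("A", "B", "C"),      # text
  age        = c(34, 51, 27),         # numbers
  customer   = c(TRUE, FALSE, TRUE),  # yes/no values
  stringsAsFactors = FALSE
)

# Report the type of each column.
col_types <- sapply(survey, class)
print(col_types)
```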
11. Commands and R – Frames and Vectors
We can address vectors from inside a data frame using $
> summary(mtcars$mpg)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  10.40   15.43   19.20   20.09   22.80   33.90
>
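The $ operator can also be combined with a condition to pick out rows. A minimal sketch using base R only (the name `four_cyl` is an assumption for illustration):

```r
# The mean of one column, matching the Mean shown by summary().
overall_mean <- mean(mtcars$mpg)

# Keep only the cars with four cylinders, then summarise them.
four_cyl      <- mtcars[mtcars$cyl == 4, ]
four_cyl_mean <- mean(four_cyl$mpg)

print(nrow(four_cyl))    # number of four-cylinder cars
print(overall_mean)
print(four_cyl_mean)
```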
13. Commands and R – Libraries
The real power of R comes from the installed libraries
> install.packages("ggplot2")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/ggplot2_3.2.1.tgz'
Content type 'application/x-gzip' length 3973186 bytes (3.8 MB)
==================================================
downloaded 3.8 MB
The downloaded binary packages are in
/var/folders/wp/n9tjrcps0990gpznfmpqff2h0000gn/T//RtmpXDhlrV/downloaded_packages
> library(ggplot2)
>
> ggplot(mtcars, aes(x=hp, y=mpg)) +
    geom_point(aes(shape=factor(cyl), colour=factor(cyl))) +
    xlab("Performance (horse power)") +
    ylab("Fuel consumption (mpg)") +
    ggtitle("More cylinders are associated with fewer miles per gallon") +
    scale_shape_discrete(name="Cylinders") +
    scale_colour_discrete(name="Cylinders")
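The pattern named in the chart title can also be checked numerically. A rough sketch using base R only, so it runs without ggplot2 installed (the variable names are illustrative):

```r
# Mean miles per gallon for each number of cylinders.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)   # mean mpg falls from 4 to 6 to 8 cylinders

# Correlation between horsepower and mpg; a negative value
# means more powerful cars tend to do fewer miles per gallon.
hp_mpg_cor <- cor(mtcars$hp, mtcars$mpg)
print(hp_mpg_cor)
```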
15. Scripts and R
Scripts are the best way to use R
• You create a record of what you did
• You can tweak the code
• You can audit the code
• You can re-use the code for other projects
• Comment your code to make it readable
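As an illustration of these points, here is what a very short commented script might look like. It is only a sketch: the file name and the steps are invented, not taken from the webinar.

```r
# mpg_summary.R  (hypothetical file name)
# Purpose: summarise fuel economy in the built-in mtcars data set.

# 1. Load the data
cars <- mtcars

# 2. Summarise miles per gallon
mpg_summary <- summary(cars$mpg)

# 3. Report the result
print(mpg_summary)
```

Because every step is written down, the same script can be re-run, audited, or adapted to another data set by changing step 1.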
16. Scripts and R
From the script, you can:
1. Run the whole script
2. Select and run a section
3. Run a single line
The code and the results appear in the Console.
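A whole script can also be run from the Console with source(). The sketch below writes a tiny script to a temporary file so that it is self-contained; in practice the path would point at your own .R file.

```r
# Create a small script in a temporary file (for illustration only).
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "a <- 2",
  "b <- 4",
  "product <- a * b",
  "print(product)"
), script_path)

# Run the whole script; echo = TRUE shows each line of code
# in the Console alongside its result.
source(script_path, echo = TRUE)
```

Outside RStudio, the Rscript command followed by the file name runs a script from a terminal.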
19. The Iris Data Set
> data(iris)
> ?iris
Edgar Anderson's Iris Data
Description
This famous (Fisher's or Anderson's) iris data set gives the measurements in
centimeters of the variables sepal length and width and petal length and width,
respectively, for 50 flowers from each of 3 species of iris. The species are Iris
setosa, versicolor, and virginica.
iris is a data frame with 150 cases (rows) and 5 variables (columns)
named Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.
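The figures in the help text can be confirmed directly. A minimal sketch using base R only (the variable names are illustrative):

```r
data(iris)

# 150 rows and 5 columns, as described in the help page.
iris_size <- dim(iris)

# How many flowers of each species?
species_counts <- table(iris$Species)
print(species_counts)

# Mean petal length for each species.
petal_means <- aggregate(Petal.Length ~ Species, data = iris, FUN = mean)
print(petal_means)
```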
28. Overview
• Free and open-source
• Massive collection of libraries
• Stats
• Text analytics
• AI tools
• Graphics
• Relatively steep learning curve
• More about finding the story than telling the story
• Lots of resources for learning about R
• Books, videos, courses, papers, etc.
29. Q & A
Ray Poynter
Chief Research Officer
Potentiate