Graphing with R & ggplot2
       Week 1: Introduction to R


 Chicago Data Visualization Group
           Workshop
         October 8, 2012
Agenda
•   Introductions
•   Survey: http://j.mp/ggplot2-2012.
•   Overview
•   Workshop




                     Data Objects: Factors   2
Workshop Overview
Introduction to R (Week 1)
We will familiarize ourselves with the R environment through a gentle introduction to its basic functions.
After installing R, we will import and inspect data sets while becoming familiar with R terminology. By
the end of the class, we will produce basic descriptive statistics and plots of the data.
Introduction to ggplot2 (Week 2)
We will begin to use the ggplot2 package to create basic, but handsome, univariate, bivariate, and
time-series graphs. We will introduce the functions and terminology used in ggplot2. We will also explain the
fundamentals of proper data visualization techniques and how they relate to the ggplot2 defaults.
Grammar of Graphics (Week 3)
We will continue to show more advanced features of ggplot2, including how it relates to Leland
Wilkinson's Grammar of Graphics. We will show how to plot more than 2 variables in a single graph
using colors, shapes, and sizes. We will also discuss how human ability to perceive different shapes and
colors should drive the choices we make in data visualization.
Plots for Publications (Week 4)
After learning how to make plots, we will learn how to customize graphs with custom colors, labels, and
themes. We will emphasize how to create a customized look to be included in publications, including
adding labels to diagrams to help readers.



Installing R
1.   Go to http://cran.r-project.org/
2.   Download the installation for your OS.
3.   Follow instructions for installation.
4.   Start R from menu.




The R Console
• R can execute scripts or respond interactively at the console.
   > 5+4                                           # addition
   [1] 9
   > 2^10                                          # exponentiation
   [1] 1024
   > a=2; b=10                                     # two commands on same line
   > a^b
   [1] 1024
   > ls()                                          # list of existing objects
   > x = rnorm(100); y = rnorm(100)                # draw 100 random normal values each
   > length(x)                                     # get the length of object
   [1] 100
   > mean(x)                                       # calculate the mean
   [1] 0.07163738
   > sd(x)                                         # calculate the std.dev
   [1] 1.086229
   > plot(x, y)                                    # create a simple plot
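
Because rnorm() draws random values, the mean and standard deviation printed above will differ from run to run. A minimal sketch, assuming you want repeatable results, fixes the random seed first. The seed value 42 is an arbitrary choice and not part of the workshop material.

```r
# Fix the seed so the same "random" numbers are drawn each time.
# (<- is the conventional assignment operator; = as used on the slides also works.)
set.seed(42)            # 42 is an arbitrary, illustrative seed
x1 <- rnorm(100)

set.seed(42)            # resetting the seed replays the same draws
x2 <- rnorm(100)

identical(x1, x2)       # TRUE: both vectors hold exactly the same values
length(x1)              # 100
```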


The R Working Directory
• R works in the context of a directory. This is usually the directory from
  where R was started.
• Get and Set your working directory.
   >   getwd() # Determine your working directory
   >   setwd("your directory") # set to your data directory location
   >   getwd() # check that you are in the correct directory
   >   dir() # list file names in the current directory
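
The steps above can be tried safely with a scratch directory. This is only a sketch: tempdir() stands in for "your directory", and the file name example.txt is made up for illustration.

```r
# Remember the current directory, switch to a scratch directory, then switch back.
old_dir <- getwd()                   # where R is working right now
scratch <- tempdir()                 # a per-session temporary directory
setwd(scratch)                       # change the working directory
writeLines("hello", "example.txt")   # create a small file in the new directory
found <- "example.txt" %in% dir()    # dir() lists files in the working directory
setwd(old_dir)                       # restore the original directory
found                                # TRUE: the file was listed by dir()
```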




Getting Help
• At the console
     >   help.start()             # general help page
     >   help(functionname)       # help on function
     >   ?functionname            # shortcut for help(functionname)
     >   ??"search string"        # find all references to search string
     >   example(topic)           # see an example of topic
     >   demo()                   # see demos




Data Types & Data Objects
• Data objects
      –   Vector: a set of elements of the same type.
      –   Matrix: a set of elements in rows & columns of the same type.
      –   Data frame: rows & columns; each column can hold a different type.
      –   Lists & Arrays*

• Data types (aka mode) of data objects
      –   Numeric: 3.14 and 3, 4, 5, …
      –   Character: "abc", "cat", "dog", …
      –   Logical: TRUE, FALSE, NA
      –   Complex and Raw*

* Out of scope for this presentation
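
As a rough illustration of the objects and types listed above, the sketch below builds one of each and asks R what it is. The variable names and values are invented for the example.

```r
v <- c(1.5, 2, 3.25)                               # vector: every element is numeric
m <- matrix(1:6, nrow = 2)                         # matrix: 2 rows x 3 columns of integers
d <- data.frame(id = 1:2, pet = c("cat", "dog"))   # data frame: columns of different types

class(v)        # "numeric"
class(m)        # "matrix" "array" in R 4.0 or later (just "matrix" in older versions)
class(d)        # "data.frame"
mode(v)         # "numeric"
mode("abc")     # "character"
mode(TRUE)      # "logical"
```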


Data Types: Numeric
• Decimal values are numeric in R.
   > x = 10.5        # assign a decimal value
   > x               # print the value of x
   [1] 10.5
   > class(x)        # print the class name of x
   [1] "numeric"

• Whole numbers entered without a decimal point are also saved as numeric.
   > k = 1
   > k               # print the value of k
   [1] 1
   > class(k)        # print the class name of k
   [1] "numeric"

• The fact that k is not an integer can be confirmed with
  is.integer().
   > is.integer(k)   # is k an integer?
   [1] FALSE
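
One way to see the difference, offered as a small sketch: R stores a plain 1 as a double, while the L suffix requests an integer. The variable j is introduced here only for comparison.

```r
k <- 1          # stored as a double-precision numeric
j <- 1L         # the L suffix asks for an integer
is.integer(k)   # FALSE
is.integer(j)   # TRUE
class(j)        # "integer"
typeof(k)       # "double"
k == j          # TRUE: the values compare equal even though the types differ
```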


Data Types: Integer
• Create an integer with as.integer().
   > y = as.integer(3)
   > y              # print the value of y
   [1] 3
   > class(y)       # print the class name of y
   [1] "integer"
   > is.integer(y) # is y an integer?
   [1] TRUE

• Coerce a numeric value into an integer with the
  same as.integer() function; the fractional part is dropped.
   > as.integer(3.14)     # coerce a numeric value
   [1] 3

• Parse a string for decimal values in much the same way.
   > as.integer("5.27")   # coerce a decimal string
   [1] 5
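
A short sketch of how as.integer() behaves at the edges: it truncates toward zero rather than rounding, and text that is not a number becomes NA. The input values are arbitrary examples.

```r
as.integer(3.99)     # 3: the fractional part is dropped, not rounded
as.integer(-3.99)    # -3: truncation is toward zero
round(3.99)          # 4: use round() when rounding is what you want
as.integer(TRUE)     # 1: logical values coerce to 0 or 1

# Text that does not look like a number gives NA (with a warning, silenced here).
bad <- suppressWarnings(as.integer("abc"))
is.na(bad)           # TRUE
```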



Data Type: Logical
• A logical value is often created via comparison between
  variables.
   > x = 1; y = 2   # sample values
   > z = x > y      # is x larger than y?
   > z              # print the logical value
   [1] FALSE
   > class(z)       # print the class name of z
   [1] "logical"

• Standard logical operations: "&", "|", and "!".
   > u = TRUE; v = FALSE
   > u & v          # u AND v
   [1] FALSE
   > u | v          # u OR v
   [1] TRUE
   > !u             # negation of u
   [1] FALSE
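
Comparisons and logical operators also work element by element on whole vectors. The sketch below uses made-up test scores and an assumed pass mark of 60.

```r
scores <- c(52, 67, 48, 90, 75)   # hypothetical test scores
passed <- scores >= 60            # element-wise comparison gives a logical vector
passed                            # FALSE TRUE FALSE TRUE TRUE
sum(passed)                       # 3: TRUE counts as 1, FALSE as 0
any(passed)                       # TRUE: at least one score passed
all(passed)                       # FALSE: not every score passed
scores[passed & scores < 80]      # 67 75: combine conditions with &
```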



Data Type: Character
• A character data type is used to represent string values in R.
    > x = as.character(3.14)
    > x              # print the character string
    [1] "3.14"
    > class(x)       # print the class name of x
    [1] "character"
•   Multiple character values can be concatenated with paste() or
    sprintf(). Both expressions below produce the same result.
    > name = "Joe"; amount = 100
    > paste(name, "has", amount, "dollars")
    > sprintf("%s has %d dollars", name, amount)
    [1] "Joe has 100 dollars"
•   To extract a substring, use substr().
    > substr("Mary has a little lamb.", start=3, stop=12)
    [1] "ry has a l"
•   Replace the first match of a pattern with sub().
    > sub("little", "big", "Mary has a little lamb.")
    [1] "Mary has a big lamb."
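
A few related string helpers, sketched on a slightly longer sentence invented for the example. Note that sub() changes only the first match while gsub() changes all of them.

```r
s <- "Mary has a little lamb, a little lamb."
nchar(s)                         # number of characters in the string
sub("little", "big", s)          # replaces only the first "little"
gsub("little", "big", s)         # replaces every "little"
toupper(substr(s, 1, 4))         # "MARY"
paste0("x", 1:3)                 # "x1" "x2" "x3": paste0() joins without a separator
paste("a", "b", sep = "-")       # "a-b"
```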

Data Objects: Vectors
• A vector is a sequence of data elements of the same basic type.
• Here is a numeric vector created by the concatenation function c().
   > c(2, 3, 5)
   [1] 2 3 5
• Vectors can be combined with c() as well.
   > n = c(2, 3, 5)
   > s = c("aa", "bb", "cc", "dd", "ee")
   > c(n, s)
   [1] "2"  "3"  "5"  "aa" "bb" "cc" "dd" "ee"
    – Notice the numeric values are coerced into character strings.
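
The coercion noted above follows a fixed order: logical, then integer, then numeric, then character. A small sketch with arbitrary values shows each step.

```r
class(c(TRUE, FALSE))        # "logical"
class(c(TRUE, 2L))           # "integer": TRUE becomes 1L
class(c(2L, 3.5))            # "numeric": integers become doubles
class(c(3.5, "aa"))          # "character": numbers become strings
c(TRUE, 2, "aa")             # "TRUE" "2" "aa"
```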




Vectors: Sequences & Patterns
• Sequences & patterns in vectors can be created with seq() & rep()
   > seq(1,9,by=2)
   [1] 1 3 5 7 9
   > seq(8,20,length=6)
   [1] 8.0 10.4 12.8 15.2 17.6 20.0
   > rep(1:3,6)
   [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
   > rep(1:3,rep(6,3))
   [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3
• Ranges can be created with the : operator
   > 1:5
   [1] 1 2 3 4 5
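
Some closely related forms, shown as a sketch: the each and times arguments of rep() name the two repetition styles used above, and seq_len() and seq_along() are convenient for generating index vectors.

```r
rep(c("a", "b"), each = 2)       # "a" "a" "b" "b"
rep(c("a", "b"), times = 2)      # "a" "b" "a" "b"
seq(0, 1, length.out = 5)        # 0.00 0.25 0.50 0.75 1.00
seq_len(4)                       # 1 2 3 4
seq_along(c("x", "y", "z"))      # 1 2 3
10:7                             # 10 9 8 7: ranges can count down
```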




Vectors: Accessing elements
• Access vector values with the [i] operator.
   > s = c("aa", "bb", "cc", "dd", "ee")
   > s[3]
   [1] "cc"
• Other methods of accessing vector values by index
   > s[-3]                        # Negative index removes value.
   [1] "aa" "bb" "dd" "ee"
   > s[c(2, 3)]                   # Numeric index
   [1] "bb" "cc"
   > s[c(2, 3, 3)]                # Duplicate indexes
   [1] "bb" "cc" "cc"
   > s[c(2, 1, 3)]                # Out of order indexes
   [1] "bb" "aa" "cc"
   > s[2:4]                       # Range index
   [1] "bb" "cc" "dd"
   > s[c(FALSE, TRUE, FALSE, TRUE, FALSE)] # Logical index
   [1] "bb" "dd"
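
In practice the logical index is usually computed from a condition rather than typed by hand. The sketch below uses an invented numeric vector w and also shows indexing by name.

```r
w <- c(12, 40, 7, 25)            # hypothetical values
w[w > 10]                        # 12 40 25: keep elements where the condition is TRUE
which(w > 10)                    # 1 2 4: positions of those elements
w[length(w)]                     # 25: the last element

names(w) <- c("a", "b", "c", "d")   # give each element a name
w["c"]                           # the element named "c" (value 7)
unname(w[c("a", "d")])           # 12 25: several names at once, names stripped
```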



Vectors: Arithmetic
• Arithmetic operations of vectors are performed member-by-member.
• For example, suppose we have two vectors a and b.
   > a = c(1, 3, 5, 7); b = c(1, 2, 4, 8)
• Examples of arithmetic operations.
   > 5 * a      # Multiplication of a by 5.
   [1] 5 15 25 35
   > a + b      # Addition of a & b vectors
   [1] 2 5 9 15
   > a * b      # Multiplication of a & b
   [1] 1 6 20 56
   # Division and subtraction are also possible.
• If two vectors are of unequal length, the shorter one will be recycled in
  order to match the longer vector.
   > u = c(10, 20, 30); v = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
   > u + v
   [1] 11 22 33 14 25 36 17 28 39
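
Recycling also applies when the longer length is not a multiple of the shorter one, but R then issues a warning. A sketch with small made-up vectors:

```r
a <- c(1, 2, 3, 4)
a + c(10, 20)        # 11 22 13 24: the length-2 vector is used twice
a * 2                # 2 4 6 8: a single number is recycled for every element

# 4 is not a multiple of 3, so R warns that the
# "longer object length is not a multiple of shorter object length".
# The warning is silenced here only to keep the example output clean.
res <- suppressWarnings(a + c(10, 20, 30))
res                  # 11 22 33 14
```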


Vectors: Subsetting and Summary
• Let's suppose we've collected some data from an experiment
   > x=c(7.5,8.2,3.1,5.6,8.2,9.3,6.5,7.0,9.3,1.2,14.5,6.2)
• Some simple summary statistics of these data can be produced:
   > mean(x)
   [1] 7.216667
   > summary(x)
      Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     1.200   6.050   7.250   7.217   8.475  14.500
• Suppose we want summaries of two extracts of this data.
   > summary(x[1:6])
      Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     3.100   6.075   7.850   6.983   8.200   9.300
   > summary(x[7:12])
      Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     1.200   6.275   6.750   7.450   8.725  14.500
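
Extracts can also be chosen by a condition instead of by position. As a sketch using the same x, the code below keeps only the values above the mean and pulls out a few other summaries.

```r
x <- c(7.5, 8.2, 3.1, 5.6, 8.2, 9.3, 6.5, 7.0, 9.3, 1.2, 14.5, 6.2)
above <- x[x > mean(x)]            # values larger than the mean (7.216667)
above                              # 7.5 8.2 8.2 9.3 9.3 14.5
length(above)                      # 6
range(x)                           # 1.2 14.5
median(x)                          # 7.25
sort(x, decreasing = TRUE)[1:3]    # 14.5 9.3 9.3: the three largest values
```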




Exercises
1. Define
   > x = c(4,2,6)
   > y = c(1,0,-1)
   Decide what the result will be of the following:
   (a)   length(x)
   (b)   sum(x)
   (c)   sum(x^2)
   (d)   x+y
   (e)   x*y
   (f)   x-2
   (g)   x^2
2. Determine these sequences.
   (a) 7:11
   (b) seq(2,9)
   (c) seq(4,10,by = 2)
   (d) seq(3,30,length = 10)
   (e) seq(6,-4,by = -2)
3. Determine these patterns.
   (a)   rep(2,4)
   (b)   rep(c(1,2),4)
   (c)   rep(c(1,2),c(4,4))
   (d)   rep(1:4,4)
   (e)   rep(1:4,rep(3,4))
4. Use rep() to define the following vectors.
   (a) 6,6,6,6,6,6
   (b) 5,8,5,8,5,8,5,8
   (c) 5,5,5,5,8,8,8,8


Exercises
5. If x = c(5,9,2,3,4,6,7,0,8,12,2,9), determine the following.
     (a) x[2]
     (b) x[2:4]
     (c) x[c(2,3,6)]
     (d) x[c(1:5,10:12)]
     (e) x[-(10:12)]

Exercises taken from R – A Self Learn Tutorial from the National Center for Ecological Analysis and Synthesis




Data Objects: Matrices
• A matrix is a collection of data elements of the same type arranged in a
  two-dimensional rectangular layout.
• Matrices can be created in R in a variety of ways. Perhaps the simplest is
  to create the columns and then glue them together with the command
  cbind.
   > x=c(5,7,9)
   > y=c(6,3,4)
   > z=cbind(x,y)
   > z
        x y
   [1,] 5 6
   [2,] 7 3
   [3,] 9 4
   > dim(z) # Get the dimensions of the matrix
   [1] 3 2
• Execute the expression rbind(x,y) and observe the result.

Matrices: Creating
• Use matrix() to create matrices.
   > matrix(c(2, 4, 3, 1, 5, 7),nrow=2,ncol=3,byrow = TRUE)
        [,1] [,2] [,3]
   [1,]    2    4    3
   [2,]    1    5    7
• Removing, or changing, the values of nrow, ncol, & byrow will affect
  the shape of the matrix. Experiment with this to see how the shape
  changes.
• dim() can also be used to create a matrix from a vector.
   > a = c(5,10,15,20,25,30)
   > dim(a) = c(2,3) # Reshape "a" by assigning it dimensions
   > a
        [,1] [,2] [,3]
   [1,]    5   15   25
   [2,]   10   20   30
• Transpose matrices with t().
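
To make the effect of byrow and t() concrete, here is a small sketch that reuses the six values from the matrix() example. The names by_col and by_row are invented for the illustration.

```r
vals <- c(2, 4, 3, 1, 5, 7)
by_col <- matrix(vals, nrow = 2)                 # default: fill column by column
by_row <- matrix(vals, nrow = 2, byrow = TRUE)   # fill row by row

by_col[1, ]      # 2 3 5
by_row[1, ]      # 2 4 3
dim(t(by_row))   # 3 2: transposing swaps rows and columns
t(by_row)[, 1]   # 2 4 3: the first row became the first column
```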
Matrices: Accessing Elements
• An element from our matrix "a" can be accessed with [].
   > a[2,3] # access the element in the second row, third column
   [1] 30
   > a[2,] # access the entire second row
   [1] 10 20 30
   > a[,3] # access the entire third column
   [1] 25 30
   > a[ ,c(1,3)] # access the first and third column
        [,1] [,2]
   [1,]    5   25
   [2,]   10   30
• Rows and columns can be named.
   > dimnames(a) = list(c("row 1","row 2"),c("col 1","col 2","col 3"))
   > a[ ,c("col 1","col 3")]
         col 1 col 3
   row 1     5    25
   row 2    10    30
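
Note that a single row or column comes back as a plain vector. As a sketch, the drop = FALSE argument keeps the result a matrix, and a logical condition can be used as an index too.

```r
a <- matrix(c(5, 10, 15, 20, 25, 30), nrow = 2)   # same values as the matrix a above

row2 <- a[2, ]                  # plain vector: the matrix structure is dropped
is.matrix(row2)                 # FALSE
row2m <- a[2, , drop = FALSE]   # keep the result as a 1 x 3 matrix
dim(row2m)                      # 1 3
a[a > 15]                       # 20 25 30: elements that satisfy the condition
```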

Matrices: Arithmetic
> ## The Matrices
> z = matrix(c(5,7,9,6,3,4),nrow=3,byrow=T)
> z # a 3 x 2 matrix
     [,1] [,2]
[1,]    5    7
[2,]    9    6
[3,]    3    4
> y = matrix(c(1,3,0,9,5,-1),nrow=3,byrow=T)
> y # a 3 x 2 matrix
     [,1] [,2]
[1,]    1    3
[2,]    0    9
[3,]    5   -1
> x = matrix(c(3,4,-2,6),nrow=2,byrow=T)
> x # a 2 x 2 matrix
     [,1] [,2]
[1,]    3    4
[2,]   -2    6

> ## Matrix Math
> y + z
     [,1] [,2]
[1,]    6   10
[2,]    9   15
[3,]    8    3
> y * z
     [,1] [,2]
[1,]    5   21
[2,]    0   54
[3,]   15   -4
> y%*%x
     [,1] [,2]
[1,]   -3   22
[2,]  -18   54
[3,]   17   14
• Link to matrix multiplication tutorial.
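
The * operator multiplies element by element, while %*% is matrix multiplication and needs the column count of the left matrix to equal the row count of the right one. A sketch of that shape check, reusing y and x from this slide:

```r
y <- matrix(c(1, 3, 0, 9, 5, -1), nrow = 3, byrow = TRUE)   # 3 x 2
x <- matrix(c(3, 4, -2, 6), nrow = 2, byrow = TRUE)         # 2 x 2

ncol(y) == nrow(x)        # TRUE, so y %*% x is defined
dim(y %*% x)              # 3 2

# x %*% y is not defined: x has 2 columns but y has 3 rows.
ok <- tryCatch({ x %*% y; TRUE }, error = function(e) FALSE)
ok                        # FALSE: R reports "non-conformable arguments"

dim(t(y) %*% y)           # 2 2: transposing first makes the shapes agree
```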

Data Objects: Data Frames
• A data frame is the most common data object in R and is used for storing
  data tables. It is a list of vectors of equal length.
   > n = c(2,3,5); s = c("aa","bb","cc"); b = c(TRUE,FALSE,TRUE)
   > data.frame(n,s,b) # output the data frame
     n  s     b
   1 2 aa  TRUE
   2 3 bb FALSE
   3 5 cc  TRUE
• Data frames can contain columns of different types.
• Import data using read.table() and read.csv(). Both return data
  frames.
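
A brief sketch of inspecting the data frame built above. The stringsAsFactors = FALSE argument is an addition here: it keeps s as character in every R version (older versions converted strings to factors by default).

```r
n <- c(2, 3, 5); s <- c("aa", "bb", "cc"); b <- c(TRUE, FALSE, TRUE)
df <- data.frame(n, s, b, stringsAsFactors = FALSE)

nrow(df)            # 3
ncol(df)            # 3
names(df)           # "n" "s" "b"
sapply(df, class)   # n: "numeric", s: "character", b: "logical"
df$n * 2            # 4 6 10: a column behaves like an ordinary vector
```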




                               Data Objects: Factors                         25
Data Frames: Importing data
• Get the following data file and save it to your working directory.
    – http://www.ats.ucla.edu/stat/R/notes/hs0.csv
• Import with read.table()
   > schdat = read.table("hs0.csv", header=T, sep=",")
   > names(schdat)
   [1] "gender" "id"       "race"    "ses"     "schtyp" "prgtype" "read"
   "write"   "math"    "science" "socst"
   > head(schdat)
     gender  id race ses schtyp  prgtype read write math science socst
   1      0  70    4   1      1  general   57    52   41      47    57
   2      1 121    4   2      1   vocati   68    59   53      63    61
   3      0  86    4   3      1  general   44    33   54      58    31
   4      0 141    4   3      1   vocati   63    44   47      53    56
   5      0 172    4   2      1 academic   47    52   57      53    61
   6      0 113    4   2      1 academic   44    52   51      63    61
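
If the download is not available, the import step can still be practiced on a file you create yourself. This is only a sketch: the toy data frame and the temporary file stand in for hs0.csv and are not the workshop data.

```r
# Write a tiny CSV to a temporary file, then read it back two ways.
toy <- data.frame(id = c(1, 2, 3), read = c(50, 60, 70), write = c(55, 45, 65))
path <- tempfile(fileext = ".csv")         # hypothetical stand-in for "hs0.csv"
write.csv(toy, path, row.names = FALSE)

a <- read.table(path, header = TRUE, sep = ",")   # the call used on the slide
b <- read.csv(path)                               # read.csv() presets header and sep for CSV files

names(a)      # "id" "read" "write"
nrow(b)       # 3
```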




Data Frames: Subsetting
• Retrieving a column vector using [[]] or $ or [,]
    – These return the same vector: schdat[[3]]; schdat$race; schdat[,"race"]
    > str(schdat$race) # get the structure of schdat$race
      int [1:200] 4 4 4 4 4 4 3 1 4 3 ...
• Retrieving a data frame column slice using []
    – These return the same data frame: schdat[3]; schdat["race"]
    > str(schdat["race"]) # get the structure of schdat["race"]
     'data.frame':   200 obs. of 1 variable:
     $ race: int 4 4 4 4 4 4 3 1 4 3 ...
• Retrieving a data frame row slice using []
    – A row index before the comma returns that row as a data frame: schdat[5,]
    > schdat[5,]
      gender  id race ses schtyp  prgtype read write math science socst
    5      0 172    4   2      1 academic   47    52   57      53    61
    – Run str(schdat[5,]) to see the structure of this data frame.
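
The same distinctions can be checked on a small data frame without downloading anything. In this sketch, df is a made-up stand-in for schdat with only two columns.

```r
df <- data.frame(race = c(4L, 3L, 1L), ses = c(1L, 2L, 3L))   # toy stand-in for schdat

v1 <- df[["race"]]; v2 <- df$race; v3 <- df[, "race"]
identical(v1, v2) && identical(v2, v3)   # TRUE: all three give the same vector

is.data.frame(df["race"])                # TRUE: single brackets keep the data frame
is.data.frame(df[2, ])                   # TRUE: a row slice is a one-row data frame
dim(df[2, ])                             # 1 2
df[df$ses >= 2, "race"]                  # 3 1: rows can also be chosen by a condition
```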


Data Frames: Subsetting
• We can use subset() to slice both columns and rows. Let's extract
  only the read, write, math, and science scores for rows whose prgtype is "academic".
   > schdat.academic = subset(schdat, prgtype=="academic",
   + select=c("read","write","math","science"))
   > head(schdat.academic)
      read write math science
   5    47    52   57      53
   6    44    52   51      63
   8    34    46   45      39
   10   57    55   52      50
   12   57    65   51      63
   13   73    60   71      61
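
subset() is a convenience wrapper; the same slice can be written with bracket indexing. A sketch on a made-up four-row data frame (toy is not the workshop data):

```r
toy <- data.frame(
  prgtype = c("academic", "general", "academic", "vocati"),
  read    = c(47, 57, 44, 68),
  write   = c(52, 52, 52, 59)
)

a <- subset(toy, prgtype == "academic", select = c("read", "write"))
b <- toy[toy$prgtype == "academic", c("read", "write")]   # logical row index, column names

nrow(a)            # 2
rownames(a)        # "1" "3": the original row numbers are kept
a$read             # 47 44
```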




Data Frames: Exploring data
• Let’s subset the read, write, math, and science scores for analysis
   > read.sci = schdat[ , c("read","write","math","science")]
   > summary(read.sci) # summary statistics for each column
         read           write            math          science
    Min.   :28.00   Min.   :31.00   Min.   :33.00   Min.   :26.00
    1st Qu.:44.00   1st Qu.:45.75   1st Qu.:45.00   1st Qu.:44.00
    Median :50.00   Median :54.00   Median :52.00   Median :53.00
    Mean   :52.23   Mean   :52.77   Mean   :52.65   Mean   :51.66
    3rd Qu.:60.00   3rd Qu.:60.00   3rd Qu.:59.00   3rd Qu.:58.00
    Max.   :76.00   Max.   :67.00   Max.   :75.00   Max.   :74.00
                                                    NA's   :5
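
The NA's entry under science means some scores are missing, and most statistics return NA unless told to skip them. A sketch with invented scores:

```r
science <- c(47, 63, NA, 53, 58)    # made-up scores with one missing value
mean(science)                       # NA: a missing value makes the result unknown
mean(science, na.rm = TRUE)         # 55.25: drop NAs before averaging
sum(is.na(science))                 # 1: count the missing values
science[!is.na(science)]            # 47 63 53 58
```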




Data Frames: Further Analysis
• Let’s look at additional statistics in the school data set
    > attach(schdat) # allow access to data by variable name only.
    > options(digits=2) # set significant digits
    > m = tapply(write,prgtype,mean) # tapply() applies mean() to write within each prgtype group
    > v = tapply(write,prgtype,var)
    > med = tapply(write,prgtype,median)
    > n = tapply(write,prgtype,length)
    > s = tapply(write,prgtype,sd) # named s so the sd() function is not masked
    > cbind(mean=m,var=v,std.dev=s,median=med,n=n)
             mean var std.dev median   n
    academic   56  63     7.9     59 105
    general    51  88     9.4     54  45
    vocati     47  87     9.3     46  50
    > options(digits=7) # restore the default
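
Group summaries can also be computed without attach(), by naming the data frame explicitly. This sketch uses a made-up six-row data frame; with() and aggregate() are shown as alternatives, not as the workshop's method.

```r
toy <- data.frame(
  prgtype = c("academic", "general", "academic", "vocati", "general", "academic"),
  write   = c(52, 59, 65, 44, 33, 60)
)

m <- tapply(toy$write, toy$prgtype, mean)          # mean of write within each group
m                                                  # academic 59, general 46, vocati 44
n <- with(toy, tapply(write, prgtype, length))     # with() looks names up inside toy
n                                                  # academic 3, general 2, vocati 1
agg <- aggregate(write ~ prgtype, data = toy, FUN = mean)   # same means as a data frame
agg
```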




Data Frames: Graphing
• Exploring data through graphs. Execute the following and examine the
  output.
   >   hist(write)
   >   library(lattice) # load trellis graphics
   >   histogram(~write, schdat, type="count") # lattice histogram of write
   >   histogram(~write | gender, schdat, type="count") # one panel per gender
   >   hist(write, breaks=15) # change the number of bins to 15
   >   boxplot(write) # boxplot function in the graphics package
   >   bwplot(ses~ write| gender, schdat) # boxplot by gender
   >   barplot(table(ses, gender), legend=c("low", "medium", "high"))
   >   barplot(table(ses, gender), beside=T, legend=c("low", "medium",
   +   "high"), ylim=c(0, 50))
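
Plots normally appear in a graphics window, but they can also be written to a file. A minimal sketch, using the built-in mtcars data so that it runs on its own; the temporary file and the choice of pdf() are illustrative assumptions.

```r
path <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(path)                            # open a PDF graphics device
hist(mtcars$mpg, breaks = 10, main = "Miles per gallon", xlab = "mpg")
boxplot(mpg ~ cyl, data = mtcars, xlab = "cylinders", ylab = "mpg")
invisible(dev.off())                 # close the device so the file is written
file.exists(path)                    # TRUE once the plots have been saved
```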




Data Frames: Frequency & Correlation
• Calculating Frequency Tables
   >   table(ses) # One way
   >   tab1=table(gender, ses) # Two way crosstab
   >   prop.table(tab1,1) # row proportions
   >   prop.table(tab1,2) # column proportions
   >   rowSums(tab1) # row frequencies
   >   colSums(tab1) # column frequencies
• Correlations & scatter plot
   >   # correlation of a pair of variables
   >   cor(write, math)
   >   cor(write, science) # NA, because science has missing values
   >   cor(write, science, use="complete.obs") # use only rows with no missing values
   >   # correlation matrix
   >   cor(read.sci, use="complete.obs")
   >   plot(math, write) # scatter plot
   >   plot(read.sci) # scatter plot matrix
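
The same calls can be tried on a few made-up values to see what each one returns. In this sketch, gender, ses, u, and v are invented vectors, not columns of schdat.

```r
gender <- c(0, 1, 0, 0, 1, 1)
ses    <- c(1, 2, 3, 1, 2, 2)
tab <- table(gender, ses)        # two-way crosstab
tab["1", "2"]                    # 3: gender 1 with ses 2
rowSums(tab)                     # 3 observations for each gender
rowSums(prop.table(tab, 1))      # row proportions sum to 1 in every row

u <- c(1, 2, 3, 4, NA); v <- c(2, 4, 6, 8, 10)
cor(u, v)                        # NA: one value of u is missing
cor(u, v, use = "complete.obs")  # 1: the complete pairs lie exactly on a line
```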



Data Objects: Factors
• A factor stores a categorical variable. It is built from a numeric or character
  vector and has an associated set of levels: the finite set of values the variable can take.
• Let's create a numeric vector of school types. 0 = private, 1 = public.
   > sch.type = sample(0:1, 6, replace=T) # random draw, so your values may differ
   > sch.type
   [1] 0 1 0 1 0 0
   > is.factor(sch.type)
   [1] FALSE
   > factor(sch.type) # return factor of sch.type
   [1] 0 1 0 1 0 0
   Levels: 0 1
   > sch.type.f = factor(sch.type,labels=c("private","public"))
   > sch.type.f
   [1] private public private public private private
   Levels: private public




Data Objects: Factors
• Let’s create a character vector of socioeconomic status.
   > ses = c("low","high","high","middle","low","low")
   > ses
   [1] "low"    "high"   "high"   "middle" "low"    "low"
   > is.factor(ses)
   [1] FALSE
   > factor(ses)
   [1] low    high   high   middle low    low
   Levels: high low middle
   > ses.f = factor(ses, levels=c("low","middle","high"))
   > ses.f
   [1] low    high   high   middle low    low
   Levels: low middle high
• levels() also returns the levels of a factor.
   > levels(ses.f)
   [1] "low"    "middle" "high"
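
Internally a factor stores integer codes that point into its levels. The sketch below shows those codes and, as an addition to the slide, an ordered factor (ses.o is a name invented here) whose levels can be compared.

```r
ses <- c("low", "high", "high", "middle", "low", "low")
ses.f <- factor(ses, levels = c("low", "middle", "high"))

as.integer(ses.f)        # 1 3 3 2 1 1: position of each value in levels(ses.f)
nlevels(ses.f)           # 3
table(ses.f)             # low 3, middle 1, high 2

# ordered = TRUE tells R that the levels have a meaningful order.
ses.o <- factor(ses, levels = c("low", "middle", "high"), ordered = TRUE)
ses.o[1] < ses.o[2]      # TRUE: "low" ranks below "high"
```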



Data Objects: Factors
• Comparing a table without factors to a table with factors.
   > table(ses,sch.type) # without factors
           sch.type
   ses      0 1
     high   1 1
     low    3 0
     middle 0 1

   > table(ses.f,sch.type.f) # with factors
           sch.type.f
   ses.f    private public
     low          3      0
     middle       0      1
     high         1      1




Exercises
1. Use the data set mtcars included in R.
      Sort the data set by weight, heaviest first.
      Which car gets the best mileage (largest mpg)?
      Which gets the worst?
      The cars in rows c(1:3, 8:14, 18:21, 26:28, 30:32) were imported into the United States. Compare the
      variable mpg for imported and domestic cars using a boxplot. Is there a difference?
      Make a scatterplot of weight, wt, versus miles per gallon, mpg. Label the points according to the number
      of cylinders, cyl. Describe any trends.
2. The data set cfb (http://wiener.math.csi.cuny.edu/UsingR/Data/cfb.R) contains
   consumer finance data for 1,000 consumers.
      Create a data frame consisting of just those consumers with positive INCOME and negative NETWORTH.
      What is its size?
3. Use the data set ewr (http://wiener.math.csi.cuny.edu/UsingR/Data/ewr.R). We
   extract just the values for the times with df = ewr[,3:10]. The mean of each column
   is found by using colMeans(df).
      How would you find the mean of each row? Why might this be interesting?
Exercises from Using R for Introductory Statistics by John Verzani.


Resources
The following resources were used for the workshop materials.
They are also excellent R references for your continued learning.
UCLA’s Resources for R
R-Bloggers
R-Tutor
Quick R
Using R for Introductory Statistics by John Verzani.




                           Data Objects: Factors                37

More Related Content

What's hot

R programming intro with examples
R programming intro with examplesR programming intro with examples
R programming intro with examplesDennis
 
Why async and functional programming in PHP7 suck and how to get overr it?
Why async and functional programming in PHP7 suck and how to get overr it?Why async and functional programming in PHP7 suck and how to get overr it?
Why async and functional programming in PHP7 suck and how to get overr it?Lucas Witold Adamus
 
2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factorskrishna singh
 
Data analysis with R
Data analysis with RData analysis with R
Data analysis with RShareThis
 
Scala - en bedre og mere effektiv Java?
Scala - en bedre og mere effektiv Java?Scala - en bedre og mere effektiv Java?
Scala - en bedre og mere effektiv Java?Jesper Kamstrup Linnet
 
Python3 cheatsheet
Python3 cheatsheetPython3 cheatsheet
Python3 cheatsheetGil Cohen
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programmingAlberto Labarga
 
Monads and Monoids by Oleksiy Dyagilev
Monads and Monoids by Oleksiy DyagilevMonads and Monoids by Oleksiy Dyagilev
Monads and Monoids by Oleksiy DyagilevJavaDayUA
 
18. Dictionaries, Hash-Tables and Set
18. Dictionaries, Hash-Tables and Set18. Dictionaries, Hash-Tables and Set
18. Dictionaries, Hash-Tables and SetIntro C# Book
 
Introduction to R Programming
Introduction to R ProgrammingIntroduction to R Programming
Introduction to R Programmingizahn
 
Python dictionary : past, present, future
Python dictionary: past, present, futurePython dictionary: past, present, future
Python dictionary : past, present, futuredelimitry
 
Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet Dr. Volkan OBAN
 

What's hot (20)

Spsl v unit - final
Spsl v unit - finalSpsl v unit - final
Spsl v unit - final
 
R lecture oga
R lecture ogaR lecture oga
R lecture oga
 
R programming intro with examples
R programming intro with examplesR programming intro with examples
R programming intro with examples
 
Python lecture 05
Python lecture 05Python lecture 05
Python lecture 05
 
R programming language
R programming languageR programming language
R programming language
 
Why async and functional programming in PHP7 suck and how to get overr it?
Why async and functional programming in PHP7 suck and how to get overr it?Why async and functional programming in PHP7 suck and how to get overr it?
Why async and functional programming in PHP7 suck and how to get overr it?
 
2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors2. R-basics, Vectors, Arrays, Matrices, Factors
2. R-basics, Vectors, Arrays, Matrices, Factors
 
Data analysis with R
Data analysis with RData analysis with R
Data analysis with R
 
Scala - en bedre og mere effektiv Java?
Scala - en bedre og mere effektiv Java?Scala - en bedre og mere effektiv Java?
Scala - en bedre og mere effektiv Java?
 
Dictionary in python
Dictionary in pythonDictionary in python
Dictionary in python
 
Python Puzzlers
Python PuzzlersPython Puzzlers
Python Puzzlers
 
Python3 cheatsheet
Python3 cheatsheetPython3 cheatsheet
Python3 cheatsheet
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programming
 
Monads and Monoids by Oleksiy Dyagilev
Monads and Monoids by Oleksiy DyagilevMonads and Monoids by Oleksiy Dyagilev
Monads and Monoids by Oleksiy Dyagilev
 
R programming
R programmingR programming
R programming
 
18. Dictionaries, Hash-Tables and Set
18. Dictionaries, Hash-Tables and Set18. Dictionaries, Hash-Tables and Set
18. Dictionaries, Hash-Tables and Set
 
Introduction to R Programming
Introduction to R ProgrammingIntroduction to R Programming
Introduction to R Programming
 
Programming in R
Programming in RProgramming in R
Programming in R
 
Python dictionary : past, present, future
Python dictionary: past, present, futurePython dictionary: past, present, future
Python dictionary : past, present, future
 
Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet
 

Similar to Ggplot2 v3

Similar to Ggplot2 v3 (20)

Data Types of R.pptx
Data Types of R.pptxData Types of R.pptx
Data Types of R.pptx
 
R tutorial for a windows environment
R tutorial for a windows environmentR tutorial for a windows environment
R tutorial for a windows environment
 
[1062BPY12001] Data analysis with R / week 2
[1062BPY12001] Data analysis with R / week 2[1062BPY12001] Data analysis with R / week 2
[1062BPY12001] Data analysis with R / week 2
 
R for Pirates. ESCCONF October 27, 2011
R for Pirates. ESCCONF October 27, 2011R for Pirates. ESCCONF October 27, 2011
R for Pirates. ESCCONF October 27, 2011
 
An overview of Python 2.7
An overview of Python 2.7An overview of Python 2.7
An overview of Python 2.7
 
A tour of Python
A tour of PythonA tour of Python
A tour of Python
 
Introduction2R
Introduction2RIntroduction2R
Introduction2R
 
Python Day1
Python Day1Python Day1
Python Day1
 
R language introduction
R language introductionR language introduction
R language introduction
 
R training2
R training2R training2
R training2
 
R basics
R basicsR basics
R basics
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
 
Programming with R in Big Data Analytics
Programming with R in Big Data AnalyticsProgramming with R in Big Data Analytics
Programming with R in Big Data Analytics
 
R_CheatSheet.pdf
R_CheatSheet.pdfR_CheatSheet.pdf
R_CheatSheet.pdf
 
Day 1b R structures objects.pptx
Day 1b   R structures   objects.pptxDay 1b   R structures   objects.pptx
Day 1b R structures objects.pptx
 
Big Data Mining in Indian Economic Survey 2017
Big Data Mining in Indian Economic Survey 2017Big Data Mining in Indian Economic Survey 2017
Big Data Mining in Indian Economic Survey 2017
 
BA lab1.pptx
BA lab1.pptxBA lab1.pptx
BA lab1.pptx
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
A quick introduction to R
A quick introduction to RA quick introduction to R
A quick introduction to R
 
R Programming Intro
R Programming IntroR Programming Intro
R Programming Intro
 

Ggplot2 v3

  • 1. Graphing with R & ggplot2 Week 1: Introduction to R Chicago Data Visualization Group Workshop October 8, 2012
  • 2. Agenda • Introductions • Survey: http://j.mp/ggplot2-2012. • Overview • Workshop Data Objects: Factors 2
  • 3. Workshop Overview Introduction to R (Week 1) We will familiarized ourselves with the R environment with a gentle introduction to the basic functions. After installing R, we will import and inspect data sets while becoming familiar with R terminology. By the end of the class, we will conduct basic descriptions and plots of the data. Introduction to ggplot2 (Week 2) We will begin to use the ggplot2 package to create basic, but handsome, univariate, bivariate, and time- series graphs. We will introduce the functions and terminology used in ggplot2. We will also explain the fundamentals of proper data visualization techniques and how it relates to the ggplot2 defaults. Grammar of Graphics (Week 3) We will continue to show more advanced features of ggplot2, including how it relates to Leland Wilkinson's Grammar of Graphics. We will show how to plot more than 2 variables in a single graph using colors, shapes, and sizes. We will also discuss how human ability to perceive different shapes and colors should drive the choices we make in data visualization. Plots for Publications (Week 4) After learning how to make plots, we will learn how to customize graphs with custom colors, labels, and themes. We will emphasize how to create a customized look to be included in publications, including addings labels in diagrams to help readers. Data Objects: Factors 3
  • 4. Installing R
    1. Go to http://cran.r-project.org/
    2. Download the installation for your OS.
    3. Follow instructions for installation.
    4. Start R from menu.
  • 5. The R Console
    • R can execute scripts or respond interactively at the console.
      > 5+4 # addition
      [1] 9
      > 2^10 # exponentiation
      [1] 1024
      > a=2; b=10 # two commands on same line
      > a^b
      [1] 1024
      > ls() # list of existing objects
      > x = rnorm(100); y = rnorm(100) # generate random distributions
      > length(x) # get the length of object
      [1] 100
      > mean(x) # calculate the mean
      [1] 0.07163738
      > sd(x) # calculate the std.dev
      [1] 1.086229
      > plot(x, y) # create a simple plot
  • 6. The R Working Directory
    • R works in the context of a directory. This is usually the directory from which R was started.
    • Get and set your working directory.
      > getwd() # determine your working directory
      > setwd("your directory") # set to your data directory location
      > getwd() # check that you are in the correct directory
      > dir() # list file names in the current directory
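The round trip on this slide can be tried without touching your own files. This is a minimal sketch, assuming that `tempdir()` stands in for "your directory" and that `example.csv` is a throwaway file name. Neither is part of the workshop materials.

```r
# Remember the starting directory so it can be restored afterwards.
old_dir <- getwd()

# tempdir() stands in for "your directory" (an assumption for this sketch).
setwd(tempdir())

# Create a small throwaway file, then confirm that dir() lists it.
writeLines(c("a,b", "1,2"), "example.csv")
files <- dir()
found <- "example.csv" %in% files
print(found)

# Return to the original working directory.
setwd(old_dir)
```

Saving the old directory first keeps the sketch from leaving the session in an unexpected location.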
  • 7. Getting Help
    • At the console
      > help.start() # general help page
      > help(functionname) # help on function
      > ?functionname # shorthand for help(functionname)
      > ??"search string" # find all references to search string
      > example(topic) # see an example of topic
      > demo() # see demos
  • 8. Data Types & Data Objects
    • Data objects
      – Vector: a set of elements of the same type.
      – Matrix: a set of elements in rows & columns of the same type.
      – Data frame: rows & columns of elements of different types.
      – Lists & Arrays*
    • Data types (aka mode) of data objects
      – Numeric: 3.14 and 3, 4, 5, …
      – Character: "abc", "cat", "dog", …
      – Logical: TRUE, FALSE, NA
      – Complex and Raw*
    * Out of scope for this presentation
  • 9. Data Types: Numeric
    • Decimal values are numeric in R.
      > x = 10.5 # assign a decimal value
      > x # print the value of x
      [1] 10.5
      > class(x) # print the class name of x
      [1] "numeric"
    • Whole-number values are also stored as numeric by default.
      > k = 1
      > k # print the value of k
      [1] 1
      > class(k) # print the class name of k
      [1] "numeric"
    • The fact that k is not an integer can be confirmed with is.integer().
      > is.integer(k) # is k an integer?
      [1] FALSE
  • 10. Data Types: Integer
    • Create an integer with as.integer().
      > y = as.integer(3)
      > y # print the value of y
      [1] 3
      > class(y) # print the class name of y
      [1] "integer"
      > is.integer(y) # is y an integer?
      [1] TRUE
    • Coerce a numeric value into an integer with the same as.integer(). The fractional part is dropped.
      > as.integer(3.14) # coerce a numeric value
      [1] 3
    • Parse a string for decimal values in much the same way.
      > as.integer("5.27") # coerce a decimal string
      [1] 5
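A few edge cases of integer coercion are worth seeing next to the slide's examples. This is a small sketch using only base R. The `L` suffix and the `round()` step are additions that the slide does not cover.

```r
# The L suffix creates an integer literal directly.
y <- 3L
print(class(y))        # "integer"

# as.integer() truncates toward zero; it does not round.
up <- as.integer(3.99)
down <- as.integer(-3.99)
print(c(up, down))     # 3 -3

# Round first when the nearest integer is wanted.
nearest <- as.integer(round(3.99))
print(nearest)         # 4

# Strings that cannot be parsed become NA (R also emits a warning).
bad <- suppressWarnings(as.integer("abc"))
print(is.na(bad))      # TRUE
```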
  • 11. Data Type: Logical
    • A logical value is often created via comparison between variables.
      > x = 1; y = 2 # sample values
      > z = x > y # is x larger than y?
      > z # print the logical value
      [1] FALSE
      > class(z) # print the class name of z
      [1] "logical"
    • Standard logical operations: "&", "|", and "!".
      > u = TRUE; v = FALSE
      > u & v # u AND v
      [1] FALSE
      > u | v # u OR v
      [1] TRUE
      > !u # negation of u
      [1] FALSE
  • 12. Data Type: Character
    • A character data type is used to represent string values in R.
      > x = as.character(3.14)
      > x # print the character string
      [1] "3.14"
      > class(x) # print the class name of x
      [1] "character"
    • Multiple character values can be concatenated with paste() or sprintf().
      > name = "Joe"; amount = 100
      > paste(name, "has", amount, "dollars") # both expressions produce
      > sprintf("%s has %d dollars", name, amount) # the same result
      [1] "Joe has 100 dollars"
    • To extract a substring, apply substr().
      > substr("Mary has a little lamb.", start=3, stop=12)
      [1] "ry has a l"
    • Replace the first match of a string with sub().
      > sub("little", "big", "Mary has a little lamb.")
      [1] "Mary has a big lamb."
  • 13. Data Objects: Vectors
    • A vector is a sequence of data elements of the same basic type.
    • Here is a numeric vector created by the concatenation function c().
      > c(2, 3, 5)
      [1] 2 3 5
    • Vectors can be combined via the function c().
      > n = c(2, 3, 5)
      > s = c("aa", "bb", "cc", "dd", "ee")
      > c(n, s)
      [1] "2" "3" "5" "aa" "bb" "cc" "dd" "ee"
      – Notice the numeric values are coerced into character strings.
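The coercion noted on this slide follows a fixed order in base R: logical gives way to numeric, and both give way to character. This is a short sketch with made-up vectors named `v1`, `v2`, and `v3`.

```r
# Logical mixed with numeric: TRUE/FALSE become 1/0.
v1 <- c(TRUE, 2, FALSE)
print(class(v1))   # "numeric"
print(v1)          # 1 2 0

# Anything mixed with character becomes character.
v2 <- c(1, "a", TRUE)
print(class(v2))   # "character"
print(v2)          # "1" "a" "TRUE"

# Converting back only works for strings that look like numbers;
# the rest become NA (R also emits a warning).
v3 <- suppressWarnings(as.numeric(c("2", "3", "aa")))
print(v3)          # 2 3 NA
```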
  • 14. Vectors: Sequences & Patterns
    • Sequences & patterns in vectors can be created with seq() & rep().
      > seq(1,9,by=2)
      [1] 1 3 5 7 9
      > seq(8,20,length=6)
      [1] 8.0 10.4 12.8 15.2 17.6 20.0
      > rep(1:3,6)
      [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
      > rep(1:3,rep(6,3))
      [1] 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3
    • Ranges can be created with the : operator.
      > 1:5
      [1] 1 2 3 4 5
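The same patterns can be written with named arguments, which some readers find easier to scan. This is a sketch using base R only. The `each`, `times`, and `length.out` spellings and `seq_len()` are not used on the slide.

```r
# each = repeats every element in place, like rep(1:3, rep(6, 3)).
a <- rep(1:3, each = 6)
b <- rep(1:3, rep(6, 3))
print(identical(a, b))   # TRUE

# times = repeats the whole vector, like rep(1:3, 6).
print(identical(rep(1:3, times = 6), rep(1:3, 6)))   # TRUE

# length.out is the full name of the argument abbreviated as "length".
s <- seq(8, 20, length.out = 6)
print(s)

# seq_len(n) builds 1..n and returns an empty vector when n is 0,
# whereas 1:0 counts down and returns 1 0.
empty <- seq_len(0)
print(length(empty))     # 0
print(1:0)               # 1 0
```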
  • 15. Vectors: Accessing elements
    • Access vector values with the [i] operator.
      > s = c("aa", "bb", "cc", "dd", "ee")
      > s[3]
      [1] "cc"
    • Other methods of accessing vector values by index
      > s[-3] # Negative index removes value.
      [1] "aa" "bb" "dd" "ee"
      > s[c(2, 3)] # Numeric index
      [1] "bb" "cc"
      > s[c(2, 3, 3)] # Duplicate indexes
      [1] "bb" "cc" "cc"
      > s[c(2, 1, 3)] # Out of order indexes
      [1] "bb" "aa" "cc"
      > s[2:4] # Range index
      [1] "bb" "cc" "dd"
      > s[c(FALSE, TRUE, FALSE, TRUE, FALSE)] # Logical index
      [1] "bb" "dd"
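In practice the logical index is usually produced by a comparison instead of being typed out. This is a minimal sketch. The vector `ages` and its names are hypothetical values chosen for illustration.

```r
x <- c(7.5, 8.2, 3.1, 5.6)

# A comparison returns a logical vector of the same length as x.
keep <- x > 5
print(keep)          # TRUE TRUE FALSE TRUE

# Using it as an index keeps only the TRUE positions.
print(x[keep])       # 7.5 8.2 5.6

# which() converts the logical vector into numeric positions.
pos <- which(keep)
print(pos)           # 1 2 4

# Elements of a named vector can also be accessed by name.
ages <- c(joe = 31, ann = 45)   # hypothetical data
print(ages["ann"])
```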
  • 16. Vectors: Arithmetic
    • Arithmetic operations on vectors are performed member-by-member.
    • For example, suppose we have two vectors a and b.
      > a = c(1, 3, 5, 7); b = c(1, 2, 4, 8)
    • Examples of arithmetic operations. Division and subtraction are also possible.
      > 5 * a # Multiplication of a by 5.
      [1] 5 15 25 35
      > a + b # Addition of a & b vectors
      [1] 2 5 9 15
      > a * b # Multiplication of a & b
      [1] 1 6 20 56
    • If two vectors are of unequal length, the shorter one will be recycled in order to match the longer vector.
      > u = c(10, 20, 30); v = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
      > u + v
      [1] 11 22 33 14 25 36 17 28 39
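Recycling is silent in the slide's example because 9 is a multiple of 3. When the longer length is not a multiple of the shorter one, R still recycles but also warns. This is a small sketch of that case. The vector `w` is an illustrative addition.

```r
u <- c(10, 20, 30)
w <- c(1, 2, 3, 4)   # length 4 is not a multiple of length 3

# Capture the warning text instead of letting it print.
msg <- tryCatch(
  {
    u + w
    "no warning"
  },
  warning = function(cond) conditionMessage(cond)
)
print(msg)

# The sum is still computed; u is recycled as 10, 20, 30, 10.
total <- suppressWarnings(u + w)
print(total)   # 11 22 33 14
```

Treating that warning as a sign of a length mismatch is usually safer than relying on the recycled result.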
  • 17. Vectors: Subsetting and Summary
    • Let's suppose we've collected some data from an experiment.
      > x=c(7.5,8.2,3.1,5.6,8.2,9.3,6.5,7.0,9.3,1.2,14.5,6.2)
    • Some simple summary statistics of these data can be produced:
      > mean(x)
      [1] 7.216667
      > summary(x)
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
        1.200   6.050   7.250   7.217   8.475  14.500
    • Suppose we want summaries of two extracts of this data.
      > summary(x[1:6])
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
        3.100   6.075   7.850   6.983   8.200   9.300
      > summary(x[7:12])
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
        1.200   6.275   6.750   7.450   8.725  14.500
  • 18. Exercises
    1. Define
       > x = c(4,2,6)
       > y = c(1,0,-1)
       Decide what the result will be of the following:
       (a) length(x) (b) sum(x) (c) sum(x^2) (d) x+y (e) x*y (f) x-2 (g) x^2
    2. Determine these sequences.
       (a) 7:11 (b) seq(2,9) (c) seq(4,10,by = 2) (d) seq(3,30,length = 10) (e) seq(6,-4,by = -2)
    3. Determine these patterns.
       (a) rep(2,4) (b) rep(c(1,2),4) (c) rep(c(1,2),c(4,4)) (d) rep(1:4,4) (e) rep(1:4,rep(3,4))
    4. Use rep() to define the following vectors.
       (a) 6,6,6,6,6,6 (b) 5,8,5,8,5,8,5,8 (c) 5,5,5,5,8,8,8,8
  • 19. Exercises
    5. If x = c(5,9,2,3,4,6,7,0,8,12,2,9), determine the following.
       (a) x[2] (b) x[2:4] (c) x[c(2,3,6)] (d) x[c(1:5,10:12)] (e) x[-(10:12)]
    Exercises taken from R – A Self Learn Tutorial from the National Center for Ecological Analysis and Synthesis
  • 20. Data Objects: Matrices
    • A matrix is a collection of data elements of the same type arranged in a two-dimensional rectangular layout.
    • Matrices can be created in R in a variety of ways. Perhaps the simplest is to create the columns and then glue them together with the command cbind.
      > x=c(5,7,9)
      > y=c(6,3,4)
      > z=cbind(x,y)
      > z
           x y
      [1,] 5 6
      [2,] 7 3
      [3,] 9 4
      > dim(z) # Get the dimensions of the matrix
      [1] 3 2
    • Execute the expression rbind(x,y) and observe the result.
  • 21. Matrices: Creating
    • Use matrix() to create matrices.
      > matrix(c(2, 4, 3, 1, 5, 7),nrow=2,ncol=3,byrow = TRUE)
           [,1] [,2] [,3]
      [1,]    2    4    3
      [2,]    1    5    7
    • Removing or changing the values of nrow, ncol, & byrow will affect the shape of the matrix. Experiment with this to see how the shape changes.
    • dim() can also be used to create a matrix from a vector.
      > a = c(5,10,15,20,25,30)
      > dim(a) = c(2,3) # Reshape "a" by assigning it dimensions
      > a
           [,1] [,2] [,3]
      [1,]    5   15   25
      [2,]   10   20   30
    • Transpose matrices with t().
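The effect of byrow is easiest to see by building the same values both ways. This is a brief sketch using the slide's values. The names `by_col`, `by_row`, and `tr` are made up for illustration.

```r
vals <- c(2, 4, 3, 1, 5, 7)

# Default: values fill the matrix column by column.
by_col <- matrix(vals, nrow = 2)
print(by_col)

# byrow = TRUE: values fill the matrix row by row.
by_row <- matrix(vals, nrow = 2, byrow = TRUE)
print(by_row)

# t() swaps rows and columns, so a 2 x 3 matrix becomes 3 x 2.
tr <- t(by_row)
print(dim(tr))   # 3 2
```

Assigning dim() to a vector fills column by column too, which is why `a` on the slide reads 5, 10 down its first column.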
  • 22. Matrices: Accessing Elements
    • An element from our matrix "a" can be accessed with [].
      > a[2,3] # access the element in the second row, third column
      [1] 30
      > a[2,] # access the entire second row
      [1] 10 20 30
      > a[,3] # access the entire third column
      [1] 25 30
      > a[ ,c(1,3)] # access the first and third column
           [,1] [,2]
      [1,]    5   25
      [2,]   10   30
    • Rows and columns can be named.
      > dimnames(a) = list(c("row 1","row 2"),c("col 1","col 2","col 3"))
      > a[ ,c("col 1","col 3")]
            col 1 col 3
      row 1     5    25
      row 2    10    30
  • 23. Matrices: Arithmetic
      > ## The Matrices
      > z = matrix(c(5,7,9,6,3,4),nrow=3,byrow=T)
      > z # a 3 x 2 matrix
           [,1] [,2]
      [1,]    5    7
      [2,]    9    6
      [3,]    3    4
      > y = matrix(c(1,3,0,9,5,-1),nrow=3,byrow=T)
      > y # a 3 x 2 matrix
           [,1] [,2]
      [1,]    1    3
      [2,]    0    9
      [3,]    5   -1
      > x = matrix(c(3,4,-2,6),nrow=2,byrow=T)
      > x # a 2 x 2 matrix
           [,1] [,2]
      [1,]    3    4
      [2,]   -2    6
      > ## Matrix Math
      > y + z
           [,1] [,2]
      [1,]    6   10
      [2,]    9   15
      [3,]    8    3
      > y * z
           [,1] [,2]
      [1,]    5   21
      [2,]    0   54
      [3,]   15   -4
      > y %*% x
           [,1] [,2]
      [1,]   -3   22
      [2,]  -18   54
      [3,]   17   14
    • Link to matrix multiplication tutorial.
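The difference between `*` and `%*%` comes down to dimensions: `*` needs two matrices of the same shape, while `%*%` needs the columns of the left matrix to match the rows of the right one. This is a sketch using the slide's `y` and `x`. The names `p`, `manual`, and `result` are illustrative.

```r
y <- matrix(c(1, 3, 0, 9, 5, -1), nrow = 3, byrow = TRUE)  # 3 x 2
x <- matrix(c(3, 4, -2, 6), nrow = 2, byrow = TRUE)        # 2 x 2

# y %*% x is defined because ncol(y) equals nrow(x).
print(ncol(y) == nrow(x))   # TRUE
p <- y %*% x
print(dim(p))               # 3 2

# Each entry is a row of y times a column of x, summed.
manual <- sum(y[1, ] * x[, 1])
print(manual)               # -3, the same as p[1, 1]

# The reverse order fails: x is 2 x 2 but y has 3 rows.
result <- tryCatch(x %*% y, error = function(e) "non-conformable")
print(result)
```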
  • 25. Data Objects: Data Frames
    • A data frame is the most common data object in R and is used for storing data tables. It is a list of vectors of equal length.
      > n = c(2,3,5); s = c("aa","bb","cc"); b = c(TRUE,FALSE,TRUE)
      > data.frame(n,s,b) # output the data frame
        n  s     b
      1 2 aa  TRUE
      2 3 bb FALSE
      3 5 cc  TRUE
    • Data frames can contain columns of different types.
    • Import data using read.table() and read.csv(). Both return data frames.
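Because a data frame is a list of equal-length vectors, each column keeps its own type. This is a minimal sketch that inspects the slide's data frame. The name `df`, the extra column `n2`, and the explicit `stringsAsFactors = FALSE` are additions for illustration.

```r
df <- data.frame(
  n = c(2, 3, 5),
  s = c("aa", "bb", "cc"),
  b = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep s as character in every R version
)

# Each column reports its own class.
types <- sapply(df, class)
print(types)

# dim() gives rows and columns.
dims <- dim(df)
print(dims)    # 3 3

# New columns are added with $ and must match the number of rows.
df$n2 <- df$n * 2
str(df)
```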
  • 26. Data Frames: Importing data
    • Get the following data file and save it to your working directory.
      – http://www.ats.ucla.edu/stat/R/notes/hs0.csv
    • Import with read.table()
      > schdat = read.table("hs0.csv", header=T, sep=",")
      > names(schdat)
      [1] "gender" "id" "race" "ses" "schtyp" "prgtype" "read" "write" "math" "science" "socst"
      > head(schdat)
        gender  id race ses schtyp  prgtype read write math science socst
      1      0  70    4   1      1  general   57    52   41      47    57
      2      1 121    4   2      1   vocati   68    59   53      63    61
      3      0  86    4   3      1  general   44    33   54      58    31
      4      0 141    4   3      1   vocati   63    44   47      53    56
      5      0 172    4   2      1 academic   47    52   57      53    61
      6      0 113    4   2      1 academic   44    52   51      63    61
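If the data file is not at hand, the import step can still be practiced on a small file written from R. This is a sketch under stated assumptions: the three-column CSV below is toy data borrowed loosely from the first rows above, not the real hs0.csv, and the file lives in a temporary location.

```r
# Write a tiny CSV to a temporary file (toy data, not the real hs0.csv).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("gender,id,write", "0,70,52", "1,121,59"), csv_path)

# read.table() needs the header and separator spelled out.
via_table <- read.table(csv_path, header = TRUE, sep = ",")

# read.csv() sets header = TRUE and sep = "," by default.
via_csv <- read.csv(csv_path)

# For a simple file like this one, both calls give the same data frame.
same <- isTRUE(all.equal(via_table, via_csv))
print(same)
str(via_csv)
```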
  • 27. Data Frames: Subsetting
    • Retrieving a column vector using [[]] or $ or [,]
      – These return the same vector: schdat[[3]]; schdat$race; schdat[,"race"]
      > str(schdat$race) # get the structure of schdat$race
      int [1:200] 4 4 4 4 4 4 3 1 4 3 ...
    • Retrieving a data frame column slice using []
      – These return the same data frame: schdat[3]; schdat["race"]
      > str(schdat["race"]) # get the structure of schdat["race"]
      'data.frame': 200 obs. of 1 variable:
       $ race: int 4 4 4 4 4 4 3 1 4 3 ...
    • Retrieving a data frame row slice using []
      > schdat[5,]
        gender  id race ses schtyp  prgtype read write math science socst
      5      0 172    4   2      1 academic   47    52   57      53    61
      – Run str(schdat[5,]) to see the structure of this data frame.
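The vector-versus-data-frame distinction can be checked on any data frame. This is a sketch that uses the built-in `mtcars` data set as a stand-in for `schdat`, which is an assumption made so the code runs without the downloaded file.

```r
# [[ ]] and $ return the column itself, here a numeric vector.
col_vec <- mtcars[["mpg"]]
print(is.numeric(col_vec))       # TRUE
print(is.data.frame(col_vec))    # FALSE

# Single brackets with one index return a one-column data frame.
col_df <- mtcars["mpg"]
print(is.data.frame(col_df))     # TRUE

# [ , "name"] drops to a vector unless drop = FALSE is given.
kept <- mtcars[, "mpg", drop = FALSE]
print(is.data.frame(kept))       # TRUE

# A row slice is always a data frame, because columns may differ in type.
row5 <- mtcars[5, ]
print(is.data.frame(row5))       # TRUE
```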
  • 28. Data Frames: Subsetting
    • We can use subset() to slice both columns and rows. Let's extract only the read, write, math, and science scores for the "academic" program type.
      > schdat.academic = subset(schdat, prgtype=="academic",
      + select=c("read","write","math","science"))
      > head(schdat.academic)
         read write math science
      5    47    52   57      53
      6    44    52   51      63
      8    34    46   45      39
      10   57    55   52      50
      12   57    65   51      63
      13   73    60   71      61
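subset() is a convenience wrapper around logical indexing, and the two differ mainly in how they treat missing values. This is a sketch, again assuming the built-in `mtcars` as a stand-in for `schdat`. The small data frame `d` is hypothetical.

```r
# The same slice written with subset() and with plain indexing.
by_subset <- subset(mtcars, cyl == 4, select = c("mpg", "wt"))
by_index <- mtcars[mtcars$cyl == 4, c("mpg", "wt")]
same <- isTRUE(all.equal(by_subset, by_index))
print(same)   # TRUE

# With a missing value in the condition, the two behave differently.
d <- data.frame(g = c("a", NA, "b"), v = 1:3)   # hypothetical data
n_subset <- nrow(subset(d, g == "a"))   # the NA row is dropped
n_index <- nrow(d[d$g == "a", ])        # the NA row is kept as a row of NAs
print(c(n_subset, n_index))             # 1 2
```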
  • 29. Data Frames: Exploring data
    • Let's subset the read, write, math, and science scores for analysis.
      > read.sci = schdat[ , c("read","write","math","science")]
      > summary(read.sci) # get a 5 number summary
            read           write            math          science
       Min.   :28.00   Min.   :31.00   Min.   :33.00   Min.   :26.00
       1st Qu.:44.00   1st Qu.:45.75   1st Qu.:45.00   1st Qu.:44.00
       Median :50.00   Median :54.00   Median :52.00   Median :53.00
       Mean   :52.23   Mean   :52.77   Mean   :52.65   Mean   :51.66
       3rd Qu.:60.00   3rd Qu.:60.00   3rd Qu.:59.00   3rd Qu.:58.00
       Max.   :76.00   Max.   :67.00   Max.   :75.00   Max.   :74.00
                                                       NA's   :5
  • 30. Data Frames: Further Analysis
    • Let's look at additional statistics in the school data set.
      > attach(schdat) # allow access to data by variable name only
      > options(digits=2) # set significant digits
      > m = tapply(write,prgtype,mean) # tapply() applies a function to each group
      > v = tapply(write,prgtype,var)
      > med = tapply(write,prgtype,median)
      > n = tapply(write,prgtype,length)
      > sd = tapply(write,prgtype,sd)
      > cbind(mean=m,var=v,std.dev=sd,median=med,n=n)
               mean var std.dev median   n
      academic   56  63     7.9     59 105
      general    51  88     9.4     54  45
      vocati     47  87     9.3     46  50
      > options(digits=7)
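The same group-wise summaries can be computed without attach(), which some find easier to reason about because no variables are added to the search path. This is a sketch assuming the built-in `mtcars` in place of `schdat`, with `mpg` grouped by `cyl`. The use of `with()` and `aggregate()` is a substitution, not the slide's method.

```r
# with() evaluates the expression using the columns of mtcars,
# without placing the data frame on the search path as attach() does.
m <- with(mtcars, tapply(mpg, cyl, mean))
s <- with(mtcars, tapply(mpg, cyl, sd))
n <- with(mtcars, tapply(mpg, cyl, length))

# One row per group, one column per statistic.
stats <- cbind(mean = m, std.dev = s, n = n)
print(stats)

# aggregate() returns the same group means as a data frame.
agg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(agg)
```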
  • 31. Data Frames: Graphing
    • Exploring data through graphs. Execute the following and examine the output.
      > hist(write)
      > library(lattice) # load trellis graphics
      > histogram(~write, schdat, type="count") # trellis histogram
      > histogram(~write | gender, schdat, type="count") # histogram by gender
      > hist(write, breaks=15) # change the number of bins to 15
      > boxplot(write) # boxplot function in the graphics package
      > bwplot(ses~ write| gender, schdat) # boxplot by gender
      > barplot(table(ses, gender), legend=c("low", "medium", "high"))
      > barplot(table(ses, gender), beside=T, legend=c("low", "medium",
      + "high"), ylim=c(0, 50))
  • 32. Data Frames: Frequency & Correlation
    • Calculating frequency tables
      > table(ses) # One way
      > tab1=table(gender, ses) # Two way crosstab
      > prop.table(tab1,1) # row proportions
      > prop.table(tab1,2) # column proportions
      > rowSums(tab1) # row frequencies
      > colSums(tab1) # column frequencies
    • Correlations & scatter plots
      > # correlation of a pair of variables
      > cor(write, math)
      > cor(write, science)
      > cor(write, science, use="complete.obs")
      > # correlation matrix
      > cor(read.sci, use="complete.obs")
      > plot(math, write) # scatter plot
      > plot(read.sci) # scatter plot matrix
  • 33. Data Objects: Factors
    • Factors are numeric or character vectors that have an associated set of levels: a finite set of values the categorical variable can have.
    • Let's create a numeric vector of school types. 0 = private, 1 = public.
      > sch.type = sample(0:1, 6, replace=T)
      > sch.type
      [1] 0 1 0 1 0 0
      > is.factor(sch.type)
      [1] FALSE
      > factor(sch.type) # return factor of sch.type
      [1] 0 1 0 1 0 0
      Levels: 0 1
      > sch.type.f = factor(sch.type,labels=c("private","public"))
      > sch.type.f
      [1] private public private public private private
      Levels: private public
  • 34. Data Objects: Factors
    • Let's create a character vector of socioeconomic status.
      > ses = c("low","high","high","middle","low","low")
      > ses
      [1] "low" "high" "high" "middle" "low" "low"
      > is.factor(ses)
      [1] FALSE
      > factor(ses)
      [1] low high high middle low low
      Levels: high low middle
      > ses.f = factor(ses, levels=c("low","middle","high"))
      > ses.f
      [1] low high high middle low low
      Levels: low middle high
    • levels() will also tell us the levels of a factor.
      > levels(ses.f)
      [1] "low" "middle" "high"
  • 35. Data Objects: Factors
    • Comparing a table without factors to a table with factors.
      > table(ses,sch.type) # without factors
              sch.type
      ses      0 1
        high   1 1
        low    3 0
        middle 0 1
      > table(ses.f,sch.type.f) # with factors
              sch.type.f
      ses.f    private public
        low          3      0
        middle       0      1
        high         1      1
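Setting the level order, as in the slides above, controls how tables and plots are sorted. Marking the factor as ordered additionally allows comparisons between levels. This is a short sketch that reuses the slide's `ses` values. The name `ses.o` and the `ordered = TRUE` argument are additions.

```r
ses <- c("low", "high", "high", "middle", "low", "low")

# ordered = TRUE records that low < middle < high.
ses.o <- factor(ses, levels = c("low", "middle", "high"), ordered = TRUE)
print(ses.o)

# Internally a factor stores integer codes that point at its levels.
codes <- as.integer(ses.o)
print(codes)        # 1 3 3 2 1 1

# Ordered factors support comparisons against a level.
below_high <- ses.o < "high"
print(below_high)   # TRUE FALSE FALSE TRUE TRUE TRUE

# Tables follow the level order, not alphabetical order.
counts <- table(ses.o)
print(counts)
```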
  • 36. Exercises
    1. Use the data set mtcars included in R.
       – Sort the data set by weight, heaviest first.
       – Which car gets the best mileage (largest mpg)? Which gets the worst?
       – The cars in rows c(1:3, 8:14, 18:21, 26:28, 30:32) were imported into the United States. Compare the variable mpg for imported and domestic cars using a boxplot. Is there a difference?
       – Make a scatterplot of weight, wt, versus miles per gallon, mpg. Label the points according to the number of cylinders, cyl. Describe any trends.
    2. The data set cfb (http://wiener.math.csi.cuny.edu/UsingR/Data/cfb.R) contains consumer finance data for 1,000 consumers. Create a data frame consisting of just those consumers with positive INCOME and negative NETWORTH. What is its size?
    3. Use the data set ewr (http://wiener.math.csi.cuny.edu/UsingR/Data/ewr.R). We extract just the values for the times with df=ewr[,3:10]. The mean of each column is found by using mean(df). How would you find the mean of each row? Why might this be interesting?
    Exercises from Using R for Introductory Statistics by John Verzani.
  • 37. Resources
    The following resources were used for the workshop materials. They are also excellent R references for your continued learning.
    • UCLA's Resources for R
    • R-Bloggers
    • R-Tutor
    • Quick R
    • Using R for Introductory Statistics by John Verzani

Editor's Notes

  1. > a[2,3]
     [1] 30
     > a[2,]
     [1] 10 20 30
     > a[,3]
     [1] 25 30
     > a[ ,c(1,3)]
          [,1] [,2]
     [1,]    5   25
     [2,]   10   30
     > dimnames(a) = list(c("row 1","row 2"),c("col 1","col 2","col 3"))
     > a
           col 1 col 2 col 3
     row 1     5    15    25
     row 2    10    20    30
     > a[ ,c("col 1","col 3")]
           col 1 col 3
     row 1     5    25
     row 2    10    30