R Programming: Transform/Reshape Data In R
3. r-squared
Slide 3
Working With Data
www.r-squared.in/rprogramming
✓ Data Types
✓ Data Structures
✓ Data Creation
✓ Data Info
✓ Data Subsetting
✓ Comparing R Objects
✓ Importing Data
✓ Exporting Data
✓ Data Transformation
✓ Numeric Functions
✓ String Functions
✓ Mathematical Functions
5. r-squared
Slide 5
Reorder Data
When analyzing data, we often need to reorder it because it cannot be used in its
original arrangement. Sorting is the most common example of such reordering.
In this section, we will learn the following functions:
✓ t (transpose)
✓ order
✓ sort
✓ rank
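The three ordering functions listed above are easy to confuse, so a minimal sketch may help. The vector scores is a hypothetical example and does not come from the slides.

```r
# Hypothetical vector, used only for illustration
scores <- c(30, 10, 20)

sorted <- sort(scores)    # the values themselves, in increasing order
idx    <- order(scores)   # the positions that would sort the vector
ranks  <- rank(scores)    # the rank of each element, in its original position

# Indexing with the result of order() reproduces sort()
identical(scores[idx], sorted)
```

In short, sort() returns values, order() returns positions, and rank() returns each element's standing without moving it.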
7. r-squared
Slide 7
t()
Examples
> # example 1
> m <- matrix(1:6, nrow = 2)
> dim(m)
[1] 2 3
> dim(t(m))
[1] 3 2
> m # 2 x 3 matrix
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> t(m) # t() returns a 3 x 2 matrix
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
8. r-squared
Slide 8
t()
Examples
> # example 2
> data <- mtcars
> head(data)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
> data_transpose <- t(data)
> head(data_transpose)
Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout Valiant
mpg 21.00 21.000 22.80 21.400 18.70 18.10
cyl 6.00 6.000 4.00 6.000 8.00 6.00
disp 160.00 160.000 108.00 258.000 360.00 225.00
hp 110.00 110.000 93.00 110.000 175.00 105.00
drat 3.90 3.900 3.85 3.080 3.15 2.76
wt 2.62 2.875 2.32 3.215 3.44 3.46
11. r-squared
Slide 11
order()
Examples
> # example 2
> data_ascending <- data[order(data$mpg),]
> data_descending <- data[order(-data$mpg),]
> head(data_ascending)
mpg cyl disp hp drat wt qsec vs am gear carb
Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
Maserati Bora 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
> head(data_descending)
mpg cyl disp hp drat wt qsec vs am gear carb
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
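Negating the key, as in order(-data$mpg), only works for numeric columns. As a sketch of two alternatives (the object names desc and by_cyl_mpg are illustrative, not from the slides), order() also accepts decreasing = TRUE and several sort keys:

```r
data <- mtcars

# Descending by mpg without negating the column
desc <- data[order(data$mpg, decreasing = TRUE), ]

# Ascending by cyl, then descending by mpg within each cyl value
by_cyl_mpg <- data[order(data$cyl, -data$mpg), ]

head(by_cyl_mpg[, c("cyl", "mpg")])
```

Later keys are only consulted to break ties in earlier ones, so the second call groups the cars by cylinder count first.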
15. r-squared
Slide 15
rank()
Examples
> # example 1
> x <- sample(1:10)
> x
[1] 7 9 1 8 6 5 3 2 10 4
> rank(x)
[1] 7 9 1 8 6 5 3 2 10 4
> # example 2
> x2 <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5)
> order(x2)
[1] 2 4 7 1 10 3 5 9 11 8 6
> sort(x2)
[1] 1 1 2 3 3 4 5 5 5 6 9
> (r2 <- rank(x2)) # ties are averaged
[1] 4.5 1.5 6.0 1.5 8.0 11.0 3.0 10.0 8.0 4.5 8.0
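Averaging is only the default way rank() handles ties. The sketch below reuses x2 from example 2 and tries two other documented values of the ties.method argument; the names r_min and r_first are illustrative.

```r
x2 <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5)

# Tied values all receive the smallest rank in their group
r_min <- rank(x2, ties.method = "min")

# Tied values are ranked in order of appearance, so every rank is unique
r_first <- rank(x2, ties.method = "first")

r_min
r_first
```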
19. r-squared
Slide 19
subset()
Examples
> # example 2
> # subsetting data frames
> subset(mtcars, mpg >= 23 & mpg <= 27)
mpg cyl disp hp drat wt qsec vs am gear carb
Merc 240D 24.4 4 146.7 62 3.69 3.19 20.0 1 0 4 2
Porsche 914-2 26.0 4 120.3 91 4.43 2.14 16.7 0 1 5 2
> subset(mtcars, mpg >= 23 & mpg <= 27, select = c(cyl, hp))
cyl hp
Merc 240D 4 62
Porsche 914-2 4 91
> subset(mtcars, cyl == 4 & hp > 100, select = mpg:wt)
mpg cyl disp hp drat wt
Lotus Europa 30.4 4 95.1 113 3.77 1.513
Volvo 142E 21.4 4 121.0 109 4.11 2.780
21. r-squared
Slide 21
which()
Examples
> # example 1
> x
[1] 7 9 1 8 6 5 3 2 10 4
> which(x == 5) # returns index of value 5.
[1] 6
> which(x > 4) # returns indices of all values greater than 4.
[1] 1 2 4 5 6 9
> # example 2
> # using data frame
> which(data$mpg > 20) # returns indices of values greater than 20.
[1] 1 2 3 4 8 9 18 19 20 21 26 27 28 32
> data$mpg[which(data$mpg > 20)] # returns values greater than 20.
[1] 21.0 21.0 22.8 21.4 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26.0 30.4 21.4
22. r-squared
Slide 22
which()
Examples
> # example 3
> m
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> which(m > 5)
[1] 6
> which(m == 5)
[1] 5
> which(letters == "r") # r is the 18th letter of the alphabet
[1] 18
> div_by_3 <- m %% 3 == 0
> div_by_3
[,1] [,2] [,3]
[1,] FALSE TRUE FALSE
[2,] FALSE FALSE TRUE
> which(div_by_3) # which values in m are divisible by 3.
[1] 3 6
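For a matrix, which() returns linear indices counted down the columns, which is why the results above are 3 and 6 rather than row and column pairs. As a small sketch, the arr.ind argument asks for row and column positions instead; the names linear and pos are illustrative.

```r
m <- matrix(1:6, nrow = 2)

# Linear (column-major) indices of the values divisible by 3
linear <- which(m %% 3 == 0)

# The same matches, reported as a matrix with "row" and "col" columns
pos <- which(m %% 3 == 0, arr.ind = TRUE)

linear
pos
```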
25. r-squared
Slide 25
droplevels()
Examples
> # example 2
> aq <- transform(airquality, Month = factor(Month, labels = month.abb[5:9]))
> aq <- subset(aq, Month != "Jul")
> table(aq$Month)
May Jun Jul Aug Sep
31 30 0 31 30
> table(droplevels(aq)$Month)
May Jun Aug Sep
31 30 31 30
> droplevels(data_cyl)$cyl
[1] 6 6 4 6 6 4 4 6 6 4 4 4 4 4 4 4 6 4
Levels: 4 6
> table(droplevels(data_cyl)$cyl)
4 6
11 7
> table(data_cyl$cyl)
4 6 8
11 7 0
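The object data_cyl is not defined in the slides shown here. The sketch below is one assumed construction that is consistent with the output above (mtcars with cyl converted to a factor and the 8-cylinder cars removed); the original definition may differ.

```r
# Assumption: data_cyl is mtcars with cyl as a factor, minus the 8-cylinder cars
data_cyl <- transform(mtcars, cyl = factor(cyl))
data_cyl <- subset(data_cyl, cyl != 8)

# Subsetting keeps the unused level "8", so it shows up with a count of 0
table(data_cyl$cyl)

# droplevels() removes levels that no longer occur in the data
table(droplevels(data_cyl)$cyl)
```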
30. r-squared
Slide 30
merge()
Examples
> # example 1
> name <- c("John", "Jane", "Tom", "Jennifer")
> age <- c(20, 25, 30, 28)
> gender <- factor(c("male", "female", "male", "female"))
> data_1 <- data.frame(name, age)
> data_2 <- data.frame(name, gender)
> data_3 <- merge(data_1, data_2, by = "name")
> head(data_3)
name age gender
1 Jane 25 female
2 Jennifer 28 female
3 John 20 male
4 Tom 30 male
32. r-squared
Slide 32
cbind()
Examples
> # example 1
> cbind(1, 1:4)
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 1 3
[4,] 1 4
> # example 2
> m1 <- matrix(1:4, nrow = 2)
> m2 <- matrix(5:8, nrow = 2)
> cbind(m1, m2)
[,1] [,2] [,3] [,4]
[1,] 1 3 5 7
[2,] 2 4 6 8
33. r-squared
Slide 33
cbind()
Examples
> # example 3
> name <- c("John", "Jane", "Tom", "Jennifer")
> age <- c(20, 25, 30, 28)
> gender <- factor(c("male", "female", "male", "female"))
> data_1 <- data.frame(name, age)
> data_2 <- data.frame(name, gender)
> data_3 <- merge(data_1, data_2, by = "name")
> income <- c(25000, 30000, 35000, 40000)
> data_4 <- cbind(data_3, income)
> head(data_4)
name age gender income
1 Jane 25 female 25000
2 Jennifer 28 female 30000
3 John 20 male 35000
4 Tom 30 male 40000
35. r-squared
Slide 35
rbind()
Examples
> # example 1
> rbind(1, 1:4)
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 1 2 3 4
> # example 2
> m1 <- matrix(1:4, nrow = 2)
> m2 <- matrix(5:8, nrow = 2)
> rbind(m1, m2)
[,1] [,2]
[1,] 1 3
[2,] 2 4
[3,] 5 7
[4,] 6 8
36. r-squared
Slide 36
rbind()
Examples
> # example 3
> name <- c("John", "Jane", "Tom", "Jennifer")
> age <- c(20, 25, 30, 28)
> gender <- factor(c("male", "female", "male", "female"))
> data_1 <- data.frame(name, age)
> data_2 <- data.frame(name, gender)
> data_3 <- merge(data_1, data_2, by = "name")
> data_4 <- data_3
> data_rbind <- rbind(data_3, data_4)
> head(data_rbind)
name age gender
1 Jane 25 female
2 Jennifer 28 female
3 John 20 male
4 Tom 30 male
5 Jane 25 female
6 Jennifer 28 female
38. r-squared
Slide 38
interaction()
Examples
> # example 1
> mtcars$gear <- as.factor(mtcars$gear)
> mtcars$cyl <- as.factor(mtcars$cyl)
> interaction(mtcars$cyl, mtcars$gear)
[1] 6.4 6.4 4.4 6.3 8.3 6.3 8.3 4.4 4.4 6.4 6.4 8.3 8.3 8.3 8.3 8.3 8.3 4.4 4.4 4.4
[21] 4.3 8.3 8.3 8.3 8.3 4.4 4.5 4.5 8.5 6.5 8.5 4.4
Levels: 4.3 6.3 8.3 4.4 6.4 8.4 4.5 6.5 8.5
> # example 2
> mtcars$am <- as.factor(mtcars$am)
> mtcars$cyl <- as.factor(mtcars$cyl)
> interaction(mtcars$cyl, mtcars$am)
[1] 6.1 6.1 4.1 6.0 8.0 6.0 8.0 4.0 4.0 6.0 6.0 8.0 8.0 8.0 8.0 8.0 8.0 4.1 4.1 4.1
[21] 4.0 8.0 8.0 8.0 8.0 4.1 4.1 4.1 8.1 6.1 8.1 4.1
Levels: 4.0 6.0 8.0 4.1 6.1 8.1
41. r-squared
Slide 41
transform()
Examples
> # example 1
> data <- mtcars
> head(data)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
> head(transform(data, mpg = -mpg, disp = disp / wt))
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 -21.0 6 61.06870 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag -21.0 6 55.65217 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 -22.8 4 46.55172 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive -21.4 6 80.24883 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout -18.7 8 104.65116 175 3.15 3.440 17.02 0 0 3 2
Valiant -18.1 6 65.02890 105 2.76 3.460 20.22 1 0 3 1
42. r-squared
Slide 42
transform()
Examples
> # example 2
> data <- mtcars
> head(transform(data, wtdrat = wt * drat))
mpg cyl disp hp drat wt qsec vs am gear carb wtdrat
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 10.2180
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 11.2125
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 8.9320
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 9.9022
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 10.8360
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 9.5496
44. r-squared
Slide 44
cut()
Examples
> # example 1
> x <- jitter(sample(1:100))
> x
[1] 69.8630359 44.8489212 61.1385505 50.8402454 26.0350033 97.0510463 42.8749472
[8] 2.0313452 69.0713593 9.1714536 64.8470522 62.8787114 98.9115336 58.8020429
[15] 81.9908416 87.1495953 16.9303676 11.9593307 38.0015233 20.1833953 14.0838761
………………………………………………………………………………………………………………………………………
……………………………………………………………………………………………
[85] 3.0042776 33.9052141 97.8309652 47.1207229 77.1890815 41.8063134 39.9223398
[92] 27.8306122 80.0271128 18.1951342 85.1410689 23.1750646 6.1861739 27.0493739
[99] 36.9679664 18.9148518
> c <- cut(x, breaks = 10)
> table(c)
c
(0.827,10.8] (10.8,20.8] (20.8,30.7] (30.7,40.6] (40.6,50.5] (50.5,60.4] (60.4,70.3]
10 10 10 10 10 10 10
(70.3,80.3] (80.3,90.2] (90.2,100]
10 10 10
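With breaks = 10, cut() chooses ten equal-width intervals on its own. As a sketch of the other common usage, the breaks can be given explicitly and the resulting levels can be labelled. The vector ages and the label names are hypothetical.

```r
# Hypothetical ages, used only for illustration
ages <- c(5, 17, 18, 42, 65, 80)

# Intervals are closed on the right by default: (0,18], (18,65], (65,Inf]
groups <- cut(ages,
              breaks = c(0, 18, 65, Inf),
              labels = c("child", "adult", "senior"))

table(groups)
```

Because the intervals include their upper bound, the values 18 and 65 fall into the lower of the two neighbouring groups.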
48. r-squared
Slide 48
replace()
Examples
> # example 1
> x <- sample(1:10)
> x
[1] 6 2 7 9 1 5 4 8 10 3
> replace(x, 5, 10)
[1] 6 2 7 9 10 5 4 8 10 3
> # replaces the value at index position 5 of the vector x with the value 10.
> # example 2
> x <- sample(1:10)
> x
[1] 6 2 7 9 1 5 4 8 10 3
> replace(x, 3:5, c(2, 4, 6))
[1] 6 2 2 4 6 5 4 8 10 3
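Note that replace() returns a modified copy and leaves its input untouched, which is why x prints the same values in both examples. A short sketch with a fixed vector (instead of the random sample() used above) also shows that the index can be a logical condition; the names y and z are illustrative.

```r
x <- c(6, 2, 7, 9, 1)

# replace() returns a new vector; x itself is not changed
y <- replace(x, 5, 10)

# The index can also be a logical vector: set every value above 6 to 0
z <- replace(x, x > 6, 0)

x
y
z
```

To keep the change, assign the result back, for example x <- replace(x, 5, 10).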
50. r-squared
Slide 50
scale()
Examples
> # example 1
> m <- matrix(1:9, nrow = 3)
> m
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> scale(m)
[,1] [,2] [,3]
[1,] -1 -1 -1
[2,] 0 0 0
[3,] 1 1 1
attr(,"scaled:center")
[1] 2 5 8
attr(,"scaled:scale")
[1] 1 1 1
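The two attributes in the output hold the column means and column standard deviations that scale() used. As a sketch of what the default call computes, the same result can be built by hand; the names centers, sds and manual are illustrative.

```r
m <- matrix(1:9, nrow = 3)

centers <- colMeans(m)        # matches attr(, "scaled:center")
sds     <- apply(m, 2, sd)    # matches attr(, "scaled:scale")

# Subtract each column's mean, then divide by each column's standard deviation
manual <- sweep(sweep(m, 2, centers, "-"), 2, sds, "/")

# Same values as scale(m), ignoring the extra attributes
all.equal(manual, scale(m), check.attributes = FALSE)
```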
52. r-squared
Slide 52
split()
Examples
> # example 1
> x <- split(data$mpg, data$cyl)
> x
$`4`
[1] 22.8 24.4 22.8 32.4 30.4 33.9 21.5 27.3 26.0 30.4 21.4
$`6`
[1] 21.0 21.0 21.4 18.1 19.2 17.8 19.7
$`8`
[1] 18.7 14.3 16.4 17.3 15.2 10.4 10.4 14.7 15.5 15.2 13.3 19.2 15.8 15.0
> sapply(x, mean)
4 6 8
26.66364 19.74286 15.10000
> unsplit(x, data$cyl)
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4
[19] 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
56. r-squared
Slide 56
within()
Examples
> # example 1
> data <- mtcars
> data <- within(data, mpg_cyl <- mpg * cyl)
> head(data)
mpg cyl disp hp drat wt qsec vs am gear carb mpg_cyl
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 126.0
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 126.0
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 91.2
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 128.4
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 149.6
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 108.6
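One practical difference from transform() is that the expression passed to within() can contain several statements, and later statements can use columns created by earlier ones. A minimal sketch, where the column mpg_cyl_half is a hypothetical addition for illustration:

```r
data <- within(mtcars, {
  mpg_cyl      <- mpg * cyl      # same column as in the example above
  mpg_cyl_half <- mpg_cyl / 2    # refers to the column created one line earlier
})

head(data[, c("mpg", "cyl", "mpg_cyl", "mpg_cyl_half")])
```

With transform(), all new columns are evaluated against the original data frame, so the second column could not be defined in the same call.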
58. r-squared
Slide 58
by()
Examples
> # example 1
> by(mtcars$mpg, mtcars$cyl, summary)
mtcars$cyl: 4
Min. 1st Qu. Median Mean 3rd Qu. Max.
21.40 22.80 26.00 26.66 30.40 33.90
----------------------------------------------------------------
mtcars$cyl: 6
Min. 1st Qu. Median Mean 3rd Qu. Max.
17.80 18.65 19.70 19.74 21.00 21.40
----------------------------------------------------------------
mtcars$cyl: 8
Min. 1st Qu. Median Mean 3rd Qu. Max.
10.40 14.40 15.20 15.10 16.25 19.20
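The Mean column printed by by() can be cross-checked against two other base R approaches to group-wise computation. This is only a sketch; the names means_split and means_tapply are illustrative.

```r
# Group means via split() + sapply(), as on the split() slide
means_split <- sapply(split(mtcars$mpg, mtcars$cyl), mean)

# The same group means via tapply()
means_tapply <- tapply(mtcars$mpg, mtcars$cyl, mean)

# Both approaches agree with each other
all.equal(as.vector(means_split), as.vector(means_tapply))
```

by() is convenient when the function returns a multi-value summary per group, while tapply() and sapply() fit best when each group yields a single number.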
59. r-squared
Slide 59
Next Steps...
In the next unit, we will explore the following numeric functions:
● signif()
● jitter()
● format()
● formatC()
● abs()
● round()
● ceiling()
● floor()