The document discusses several R packages for data wrangling (preprocessing) tasks. It provides a table with information on popular packages like plyr, reshape2, stringr, lubridate, sqldf, dplyr, data.table, and zoo. While dplyr is commonly used, the document focuses on introducing the plyr package, which can still be useful when working with list-type data. Examples show how to use plyr functions like llply and ddply to apply operations to multiple objects or subsets of data.
3. Packages for data wrangling
Columns: Package / Purpose / Comment / Explanation / Author

Package: plyr
Purpose: data wrangling
Comment: While dplyr is my go-to package for wrangling data frames, the older plyr package still comes in handy when working with other types of R data such as lists. CRAN.
Explanation: llply(mylist, myfunction)
Author: Hadley Wickham

Package: reshape2
Purpose: data wrangling
Comment: Change data row and column formats from "wide" to "long"; turn variables into column names or column names into variables and more. The tidyr package is a newer, more focused option, but I still use reshape2. CRAN.
Explanation: See my tutorial
Author: Hadley Wickham

Package: stringr
Purpose: data wrangling
Comment: Numerous functions for text manipulation. Some are similar to existing base R functions but in a more standard format, including working with regular expressions. Some of my favorites: str_pad and str_trim. CRAN.
Explanation: str_pad(myzipcodevector, 5, "left", "0")
Author: Hadley Wickham

Package: lubridate
Purpose: data wrangling
Comment: Everything you ever wanted to do with date arithmetic, although understanding & using available functionality can be somewhat complex. CRAN.
Explanation: mdy("05/06/2015") + months(1). More examples in the package vignette
Author: Garrett Grolemund, Hadley Wickham & others

Package: sqldf
Purpose: data wrangling, data analysis
Comment: Do you know a great SQL query you'd use if your R data frame were in a SQL database? Run SQL queries on your data frame with sqldf. CRAN.
Explanation: sqldf("select * from mydf where mycol > 4")
Author: G. Grothendieck

Package: dplyr
Purpose: data wrangling, data analysis
Comment: The essential data-munging R package when working with data frames. Especially useful for operating on data by categories. CRAN.
Explanation: See the intro vignette
Author: Hadley Wickham

Package: data.table
Purpose: data wrangling, data analysis
Comment: Popular package for heavy-duty data wrangling. While I typically prefer dplyr, data.table has many fans for its speed with large data sets. CRAN.
Explanation: Useful tutorial
Author: Matt Dowle & others

Package: zoo
Purpose: data wrangling, data analysis
Comment: Robust package with a slew of functions for dealing with time series data; I like the handy rollmean function for calculating moving averages. CRAN.
Explanation: rollmean(mydf, 7)
Author: Achim Zeileis & others

http://www.computerworld.com/article/2921176/business-intelligence/great-r-packages-for-data-import-wrangling-visualization.html
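The table gives llply(mylist, myfunction) as the example for plyr. A minimal sketch of what that call looks like in practice, assuming plyr is installed; the list contents and variable names are hypothetical. It also shows laply and ldply, which take the same list input but return a vector or a data frame.

```r
library(plyr)

# A hypothetical list of numeric vectors of different lengths.
mylist <- list(a = 1:5, b = c(2.5, 3.5), c = c(10, 20, 30))

# llply: list in, list out; the names of the input list are kept.
means_list <- llply(mylist, mean)

# laply: list in, array out (a plain vector when each result is a scalar).
means_vec <- laply(mylist, mean)

# ldply: list in, data frame out; the list names go into the ".id" column.
means_df <- ldply(mylist, function(x) data.frame(mean = mean(x), n = length(x)))
```

The first letter of each function names the input type and the second the output type, so ddply reads as "data frame in, data frame out".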
4. Data Wrangling
Data munging or data wrangling is loosely the process of
manually converting or mapping data from one “raw” form into
another format that allows for more convenient consumption of
the data with the help of semi-automated tools. This may
include further munging, data visualization, data aggregation,
training a statistical model, as well as many other potential
uses. (Wikipedia)
The process of converting data into a form that can be analyzed
Data cleansing + transformation…
≈ data preprocessing
9. Packages covered in this talk
10. Packages covered in this talk
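Personally, I find dplyr the most useful, with tidyr as its companion
(for reshaping the output), but I covered both in an earlier talk,
so they are omitted this time.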
http://www.slideshare.net/kawaharahiroki/r-45226370
11. Packages covered in this talk
Processing the data as a whole
Processing individual data elements
16. ddply {plyr}
> library(plyr)
Warning message:
package ‘plyr’ was built under R version 3.1.3
> df <- data.frame(
+ group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
+ sex = sample(c("M", "F"), size = 29, replace = TRUE),
+ age = runif(n = 29, min = 18, max = 54)
+ )
> ddply(df, .(group, sex), summarize,
+ mean = mean(age),
+ sd = sd(age))
Error in withCallingHandlers(tryCatch(evalq((function (i) :
object '.rcpp_warning_recorder' not found
Does this error occur in R 3.1.1 and later?
17. ddply {plyr}
install.packages("plyr", type = "source")
library(plyr)
> ddply(df, .(group, sex), summarize,
+ mean = mean(age),
+ sd = sd(age))
  group sex     mean        sd
1     A   F 42.43033  8.996826
2     A   M 30.09450 13.311536
3     B   F 35.64277 11.060713
4     B   M 38.96056  6.731923
5     C   F 25.01813  4.588658
6     C   M 49.29878        NA
> head(df)
  group sex      age
1     A   M 20.23535
2     A   F 34.10908
3     A   M 45.23656
4     A   F 52.72067
5     A   M 24.81160
6     A   F 37.51441
18. ddply {plyr}
Using {dplyr} instead (grouping by sex only)
> df %>% group_by(sex) %>% summarise(mean=mean(age), sd=sd(age))
Source: local data frame [2 x 3]
  sex     mean        sd
1   F 34.51422 10.940603
2   M 37.60556  9.497813
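The {dplyr} call on the slide groups by sex only, whereas the ddply call groups by both group and sex. A minimal sketch of a like-for-like comparison, assuming plyr and dplyr (1.0.0 or later, for the .groups argument) are installed. The seed and the column names mean_age and sd_age are assumptions for illustration, and functions are called with their package prefix so that the two packages do not mask each other.

```r
# Same structure as the data frame on the slides; the seed is an assumption
# added only so that the sketch is reproducible.
set.seed(1)
df <- data.frame(
  group = c(rep("A", 8), rep("B", 15), rep("C", 6)),
  sex   = sample(c("M", "F"), size = 29, replace = TRUE),
  age   = runif(n = 29, min = 18, max = 54)
)

# plyr: split by group and sex, summarize each piece, combine into a data frame.
by_plyr <- plyr::ddply(df, c("group", "sex"), plyr::summarize,
                       mean_age = mean(age),
                       sd_age   = sd(age))

# dplyr: the same grouping expressed with group_by() and summarise().
grouped  <- dplyr::group_by(df, group, sex)
by_dplyr <- dplyr::summarise(grouped,
                             mean_age = mean(age),
                             sd_age   = sd(age),
                             .groups  = "drop")
```

Both results have one row per group and sex combination present in the data; a combination with a single observation gets NA for the standard deviation, as in the C/M row on the slide.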
29. Packages for data wrangling
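Converting data types when working with dates and times
・ as.Date: when the date alone is enough
・ as.POSIXct: when you need both date and time
・ as.POSIXlt: when you want to extract individual components such as hours, minutes, and seconds
・ as.integer: when (regular or irregular) time series data need to be processed
・ as.ts: when using time series functions
・ as.zoo, as.xts: when using packages for time series processing
This is only how I personally choose between them…
If you know a better way, please tell me.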
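A minimal base-R sketch of the first four conversions in the list above. The timestamps and the UTC time zone are assumptions for illustration, and the time series classes (as.ts, as.zoo, as.xts) are left out to keep it short.

```r
# Two hypothetical timestamps; the time zone is pinned to UTC for reproducibility.
ct  <- as.POSIXct("2015-05-06 13:45:30", tz = "UTC")
ct2 <- as.POSIXct("2015-05-06 13:47:00", tz = "UTC")

# as.Date: keep only the calendar date.
d <- as.Date(ct, tz = "UTC")

# as.POSIXlt: a list-like structure whose components can be read directly.
# Note that $year counts from 1900 and $mon is zero-based.
lt <- as.POSIXlt(ct, tz = "UTC")
parts <- c(hour = lt$hour, min = lt$min, sec = lt$sec)

# as.integer: seconds since 1970-01-01 UTC, convenient for arithmetic
# on regularly or irregularly spaced observations.
gap_seconds <- as.integer(ct2) - as.integer(ct)
```

POSIXct stores a single number per timestamp, which suits data frame columns, while POSIXlt stores each component separately, which is why it is the one to reach for when extracting hours or minutes.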
30. Packages for data wrangling
Interpolation
approx
approxfun
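A minimal sketch of linear interpolation with approx and approxfun from base R (the stats package), assuming observation times held as plain numbers of seconds; the data values are hypothetical.

```r
# Irregularly spaced observations: time in seconds and a measured value.
t_obs <- c(0, 10, 25, 60)
y_obs <- c(1.0, 2.0, 5.0, 12.0)

# approx() interpolates linearly onto a regular 5-second grid
# and returns a list with components x and y.
grid   <- seq(0, 60, by = 5)
filled <- approx(x = t_obs, y = y_obs, xout = grid)

# approxfun() returns a function that can be evaluated at any time later.
f <- approxfun(x = t_obs, y = y_obs)
f(5)  # halfway between (0, 1.0) and (10, 2.0), so 1.5
```

By default both functions return NA outside the range of the observed times; the rule argument controls that behavior.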