SlideShare une entreprise Scribd logo
1  sur  2
Télécharger pour lire hors ligne
Other types of data
Try one of the following packages to import
other types of files
• haven - SPSS, Stata, and SAS files
• readxl - excel files (.xls and .xlsx)
• DBI - databases
• jsonlite - json
• xml2 - XML
• httr - Web APIs
• rvest - HTML (Web Scraping)
write_csv(x, path, na = "NA", append = FALSE,
col_names = !append)
Tibble/df to comma delimited file.
write_delim(x, path, delim = " ", na = "NA",
append = FALSE, col_names = !append)
Tibble/df to file with any delimiter.
write_excel_csv(x, path, na = "NA", append =
FALSE, col_names = !append)
Tibble/df to a CSV for excel
write_file(x, path, append = FALSE)
String to file.
write_lines(x, path, na = "NA", append =
FALSE)
String vector to file, one element per line.
write_rds(x, path, compress = c("none", "gz",
"bz2", "xz"), ...)
Object to RDS file.
write_tsv(x, path, na = "NA", append = FALSE,
col_names = !append)
Tibble/df to tab delimited files.
Write functions
Save x, an R object, to path, a file path, with:
Read functions Parsing data types
Tidy Data
with tidyr Cheat Sheet
R’s tidyverse is built around tidy data stored in
tibbles, an enhanced version of a data frame.
The front side of this sheet shows how
to read text files into R with readr.
The reverse side shows how to create
tibbles with tibble and to layout tidy
data with tidyr.
Data Importwith readr, tibble, and tidyr
Cheat Sheet
## Parsed with column specification:
## cols(
## age = col_integer(),
## sex = col_character(),
## earn = col_double()
## )
read_file(file, locale = default_locale())
Read a file into a single string.
read_file_raw(file)
Read a file into a raw vector.
read_lines(file, skip = 0, n_max = -1L, locale =
default_locale(), na = character(), progress =
interactive())
Read each line into its own string.
Skip lines
read_csv("file.csv",
skip = 1)
Read in a subset
read_csv("file.csv",
n_max = 1)
Missing Values
read_csv("file.csv",
na = c("4", "5", "."))
1. Use problems() to diagnose problems
x <- read_csv("file.csv"); problems(x)
2. Use a col_ function to guide parsing
• col_guess() - the default
• col_character()
• col_double()
• col_euro_double()
• col_datetime(format = "") Also
col_date(format = "") and col_time(format = "")
• col_factor(levels, ordered = FALSE)
• col_integer()
• col_logical()
• col_number()
• col_numeric()
• col_skip()
x <- read_csv("file.csv", col_types = cols(
A = col_double(),
B = col_logical(),
C = col_factor()
))
3. Else, read in as character vectors then parse
with a parse_ function.
• parse_guess(x, na = c("", "NA"), locale =
default_locale())
• parse_character(x, na = c("", "NA"), locale =
default_locale())
• parse_datetime(x, format = "", na = c("", "NA"),
locale = default_locale()) Also parse_date()
and parse_time()
• parse_double(x, na = c("", "NA"), locale =
default_locale())
• parse_factor(x, levels, ordered = FALSE, na =
c("", "NA"), locale = default_locale())
• parse_integer(x, na = c("", "NA"), locale =
default_locale())
• parse_logical(x, na = c("", "NA"), locale =
default_locale())
• parse_number(x, na = c("", "NA"), locale =
default_locale())
x$A <- parse_number(x$A)
read_lines_raw(file, skip = 0, n_max = -1L,
progress = interactive())
Read each line into a raw vector.
read_log(file, col_names = FALSE, col_types =
NULL, skip = 0, n_max = -1, progress =
interactive())
Apache style log files.
Read non-tabular data
Read tabular data to tibbles
read_csv()
Reads comma delimited files.
read_csv("file.csv")
read_csv2()
Reads Semi-colon delimited files.
read_csv2("file2.csv")
read_delim(delim, quote = """, escape_backslash = FALSE,
escape_double = TRUE) Reads files with any delimiter.
read_delim("file.txt", delim = "|")
read_fwf(col_positions)
Reads fixed width files.
read_fwf("file.fwf", col_positions = c(1, 3, 5))
read_tsv()
Reads tab delimited files. Also read_table().
read_tsv("file.tsv")
a,b,c
1,2,3
4,5,NA
a;b;c
1;2;3
4;5;NA
a|b|c
1|2|3
4|5|NA
a b c
1 2 3
4 5 NA
These functions share the common arguments:
read_*(file, col_names = TRUE, col_types = NULL, locale = default_locale(), na = c("", "NA"),
quoted_na = TRUE, comment = "", trim_ws = TRUE, skip = 0, n_max = Inf, guess_max =
min(1000, n_max), progress = interactive())
A B C
1 2 3
A B C
1 2 3
4 5 NA
x y z
A B C
1 2 3
4 5 NA
A B C
1 2 3
NA NA NA
1 2 3
4 5 NA
A B C
1 2 3
4 5 NA
A B C
1 2 3
4 5 NA
A B C
1 2 3
4 5 NA
A B C
1 2 3
4 5 NA
Useful arguments
a,b,c
1,2,3
4,5,NA
Example file
write_csv (path = "file.csv",
x = read_csv("a,b,cn1,2,3n4,5,NA"))
No header
read_csv("file.csv",
col_names = FALSE)
Provide header
read_csv("file.csv",
col_names = c("x", "y", "z"))
readr functions guess the types of each column
and convert types when appropriate (but will
NOT convert strings to factors automatically).
A message shows the type of each column in
the result.
earn is a double (numeric)
sex is a
character
age is an
integer
RStudio® is a trademark of RStudio, Inc. • CC BY RStudio info@rstudio.com • 844-448-1212 • rstudio.com Learn more at browseVignettes(package = c("readr", "tibble", "tidyr")) • readr 1.1.0 • tibble 1.2.12 • tidyr 0.6.0 • Updated: 2017-01
gather(data, key, value, ..., na.rm = FALSE,
convert = FALSE, factor_key = FALSE)
Gather moves column names into a key
column, gathering the column values into a
single value column.
spread(data, key, value, fill = NA, convert = FALSE,
drop = TRUE, sep = NULL)
Spread moves the unique values of a key column
into the column names, spreading the values of a
value column across the new columns that result.
Use gather() and spread() to reorganize the values of a table into a new layout. Each uses the idea of a
key column: value column pair.
gather(table4a, `1999`, `2000`,
key = "year", value = "cases") spread(table2, type, count)
valuekey
country 1999 2000
A 0.7K 2K
B 37K 80K
C 212K 213K
table4a
country year cases
A 1999 0.7K
B 1999 37K
C 1999 212K
A 2000 2K
B 2000 80K
C 2000 213K
valuekey
country year cases pop
A 1999 0.7K 19M
A 2000 2K 20M
B 1999 37K 172M
B 2000 80K 174M
C 1999 212K 1T
C 2000 213K 1T
table2
country year type count
A 1999 cases 0.7K
A 1999 pop 19M
A 2000 cases 2K
A 2000 pop 20M
B 1999 cases 37K
B 1999 pop 172M
B 2000 cases 80K
B 2000 pop 174M
C 1999 cases 212K
C 1999 pop 1T
C 2000 cases 213K
C 2000 pop 1T
Split and Combine Cells
unite(data, col, ..., sep = "_", remove = TRUE)
Collapse cells across several columns to
make a single column.
drop_na(data, ...)
Drop rows containing
NA’s in … columns.
fill(data, ..., .direction = c("down", "up"))
Fill in NA’s in … columns with most
recent non-NA values.
replace_na(data,
replace = list(), ...)
Replace NA’s by column.
Use these functions to split or combine cells into
individual, isolated values.
country year rate
A 1999 0.7K/19M
A 2000 2K/20M
B 1999 37K/172M
B 2000 80K/174M
C 1999 212K/1T
C 2000 213K/1T
country year cases pop
A 1999 0.7K 19M
A 2000 2K 20M
B 1999 37K 172
B 2000 80K 174
C 1999 212K 1T
C 2000 213K 1T
table3
separate(data, col, into, sep = "[^[:alnum:]]+",
remove = TRUE, convert = FALSE,
extra = "warn", fill = "warn", ...)
Separate each cell in a column to make several
columns.
separate_rows(data, ..., sep = "[^[:alnum:].]+",
convert = FALSE)
Separate each cell in a column to make several
rows. Also separate_rows_().
country century year
Afghan 19 99
Afghan 20 0
Brazil 19 99
Brazil 20 0
China 19 99
China 20 0
country year
Afghan 1999
Afghan 2000
Brazil 1999
Brazil 2000
China 1999
China 2000
table5
separate_rows(table3, rate,
into = c("cases", "pop"))
separate_rows(table3, rate)
unite(table5, century, year,
col = "year", sep = "")
Tidy Data with tidyr
x1 x2
A 1
B NA
C NA
D 3
E NA
x1 x2
A 1
D 3
x
x1 x2
A 1
B NA
C NA
D 3
E NA
x1 x2
A 1
B 1
C 1
D 3
E 3
x
x1 x2
A 1
B NA
C NA
D 3
E NA
x1 x2
A 1
B 2
C 2
D 3
E 2
x
drop_na(x, x2) fill(x, x2) replace_na(x,list(x2 = 2), x2)
country year rate
A 1999 0.7K
A 1999 19M
A 2000 2K
A 2000 20M
B 1999 37K
B 1999 172M
B 2000 80K
B 2000 174M
C 1999 212K
C 1999 1T
C 2000 213K
C 2000 1T
table3
country year rate
A 1999 0.7K/19M
A 2000 2K/20M
B 1999 37K/172M
B 2000 80K/174M
C 1999 212K/1T
C 2000 213K/1T
• Control the default appearance with options:
options(tibble.print_max = n,
tibble.print_min = m, tibble.width = Inf)
• View entire data set with View(x, title) or
glimpse(x, width = NULL, …)
• Revert to data frame with as.data.frame()
(required for some older packages)
Tibbles - an enhanced data frame
Handle Missing Values
Reshape Data - change the layout of values in a table
Tidy data is a way to organize tabular data. It provides a consistent data structure across packages.
CBA
A * B -> C
*A B C
Each observation, or
case, is in its own row
A B C
Each variable is in
its own column
A B C
&
Learn more at browseVignettes(package = c("readr", "tibble", "tidyr")) • readr 1.1.0 • tibble 1.2.12 • tidyr 0.6.0 • Updated: 2017-01
The tibble package provides a new S3 class for
storing tabular data, the tibble. Tibbles inherit the
data frame class, but improve two behaviors:
• Display - When you print a tibble, R provides a
concise view of the data that fits on one screen.
• Subsetting - [ always returns a new tibble,
[[ and $ always return a vector.
• No partial matching - You must use full
column names when subsetting
as_tibble(x, …) Convert data frame to tibble.
enframe(x, name = "name", value = "value")
Converts named vector to a tibble with a
names column and a values column.
is_tibble(x) Test whether x is a tibble.
data frame display
tibble display
Construct a tibble in two ways
tibble(…)
Construct by columns.
tibble(x = 1:3,
y = c("a", "b", "c"))
tribble(…)
Construct by rows.
tribble(
~x, ~y,
1, "a",
2, "b",
3, "c")
A tibble: 3 × 2
x y
<int> <dbl>
1 1 a
2 2 b
3 3 c
Both make
this tibble
ww
# A tibble: 234 × 6
manufacturer model displ
<chr> <chr> <dbl>
1 audi a4 1.8
2 audi a4 1.8
3 audi a4 2.0
4 audi a4 2.0
5 audi a4 2.8
6 audi a4 2.8
7 audi a4 3.1
8 audi a4 quattro 1.8
9 audi a4 quattro 1.8
10 audi a4 quattro 2.0
# ... with 224 more rows, and 3
# more variables: year <int>,
# cyl <int>, trans <chr>
156 1999 6 auto(l4)
157 1999 6 auto(l4)
158 2008 6 auto(l4)
159 2008 8 auto(s4)
160 1999 4 manual(m5)
161 1999 4 auto(l4)
162 2008 4 manual(m5)
163 2008 4 manual(m5)
164 2008 4 auto(l4)
165 2008 4 auto(l4)
166 1999 4 auto(l4)
[ reached getOption("max.print")
-- omitted 68 rows ]A large table
to display
RStudio® is a trademark of RStudio, Inc. • CC BY RStudio info@rstudio.com • 844-448-1212 • rstudio.com
A table is tidy if: Tidy data:
Makes variables easy
to access as vectors
Preserves cases during
vectorized operations
Expand Tables - quickly create tables with combinations of values
complete(data, ..., fill = list())
Adds to the data missing combinations of the
values of the variables listed in …
complete(mtcars, cyl, gear, carb)
expand(data, ...)
Create new tibble with all possible combinations
of the values of the variables listed in …
expand(mtcars, cyl, gear, carb)

Contenu connexe

Tendances

Introduction to Monads in Scala (2)
Introduction to Monads in Scala (2)Introduction to Monads in Scala (2)
Introduction to Monads in Scala (2)
stasimus
 
R short-refcard
R short-refcardR short-refcard
R short-refcard
conline
 

Tendances (19)

Stata cheat sheet analysis
Stata cheat sheet analysisStata cheat sheet analysis
Stata cheat sheet analysis
 
Scala. Introduction to FP. Monads
Scala. Introduction to FP. MonadsScala. Introduction to FP. Monads
Scala. Introduction to FP. Monads
 
Introduction to Monads in Scala (2)
Introduction to Monads in Scala (2)Introduction to Monads in Scala (2)
Introduction to Monads in Scala (2)
 
Mysql
MysqlMysql
Mysql
 
Mysql
MysqlMysql
Mysql
 
Practical cats
Practical catsPractical cats
Practical cats
 
Pandas,scipy,numpy cheatsheet
Pandas,scipy,numpy cheatsheetPandas,scipy,numpy cheatsheet
Pandas,scipy,numpy cheatsheet
 
Pandas Cheat Sheet
Pandas Cheat SheetPandas Cheat Sheet
Pandas Cheat Sheet
 
Python3 cheatsheet
Python3 cheatsheetPython3 cheatsheet
Python3 cheatsheet
 
Python Pandas
Python PandasPython Pandas
Python Pandas
 
Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet
 
4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function
 
Language R
Language RLanguage R
Language R
 
02 arrays
02 arrays02 arrays
02 arrays
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
 
R programming intro with examples
R programming intro with examplesR programming intro with examples
R programming intro with examples
 
20170509 rand db_lesugent
20170509 rand db_lesugent20170509 rand db_lesugent
20170509 rand db_lesugent
 
R reference card
R reference cardR reference card
R reference card
 
R short-refcard
R short-refcardR short-refcard
R short-refcard
 

Similaire à Data import-cheatsheet

Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Vyacheslav Arbuzov
 

Similaire à Data import-cheatsheet (20)

tidyr.pdf
tidyr.pdftidyr.pdf
tidyr.pdf
 
R language introduction
R language introductionR language introduction
R language introduction
 
R_CheatSheet.pdf
R_CheatSheet.pdfR_CheatSheet.pdf
R_CheatSheet.pdf
 
Practical data science_public
Practical data science_publicPractical data science_public
Practical data science_public
 
A quick introduction to R
A quick introduction to RA quick introduction to R
A quick introduction to R
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data Manipulation
 
Broom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data FramesBroom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data Frames
 
R
RR
R
 
R-Excel Integration
R-Excel IntegrationR-Excel Integration
R-Excel Integration
 
R Cheat Sheet – Data Management
R Cheat Sheet – Data ManagementR Cheat Sheet – Data Management
R Cheat Sheet – Data Management
 
Introduction to r
Introduction to rIntroduction to r
Introduction to r
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
 
Data Wrangling with dplyr and tidyr Cheat Sheet
Data Wrangling with dplyr and tidyr Cheat SheetData Wrangling with dplyr and tidyr Cheat Sheet
Data Wrangling with dplyr and tidyr Cheat Sheet
 
R Programming Reference Card
R Programming Reference CardR Programming Reference Card
R Programming Reference Card
 
Python Cheat Sheet
Python Cheat SheetPython Cheat Sheet
Python Cheat Sheet
 
Introduction to DAX Language
Introduction to DAX LanguageIntroduction to DAX Language
Introduction to DAX Language
 
R programming
R programmingR programming
R programming
 
An overview of Python 2.7
An overview of Python 2.7An overview of Python 2.7
An overview of Python 2.7
 
A tour of Python
A tour of PythonA tour of Python
A tour of Python
 
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptxfINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
 

Plus de Dieudonne Nahigombeye (9)

Rstudio ide-cheatsheet
Rstudio ide-cheatsheetRstudio ide-cheatsheet
Rstudio ide-cheatsheet
 
Rmarkdown cheatsheet-2.0
Rmarkdown cheatsheet-2.0Rmarkdown cheatsheet-2.0
Rmarkdown cheatsheet-2.0
 
Reg ex cheatsheet
Reg ex cheatsheetReg ex cheatsheet
Reg ex cheatsheet
 
How big-is-your-graph
How big-is-your-graphHow big-is-your-graph
How big-is-your-graph
 
Ggplot2 cheatsheet-2.1
Ggplot2 cheatsheet-2.1Ggplot2 cheatsheet-2.1
Ggplot2 cheatsheet-2.1
 
Eurostat cheatsheet
Eurostat cheatsheetEurostat cheatsheet
Eurostat cheatsheet
 
Devtools cheatsheet
Devtools cheatsheetDevtools cheatsheet
Devtools cheatsheet
 
Base r
Base rBase r
Base r
 
Advanced r
Advanced rAdvanced r
Advanced r
 

Dernier

Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
HyderabadDolls
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
HyderabadDolls
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Dernier (20)

Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 

Data import-cheatsheet

  • 1. Other types of data Try one of the following packages to import other types of files • haven - SPSS, Stata, and SAS files • readxl - excel files (.xls and .xlsx) • DBI - databases • jsonlite - json • xml2 - XML • httr - Web APIs • rvest - HTML (Web Scraping) write_csv(x, path, na = "NA", append = FALSE, col_names = !append) Tibble/df to comma delimited file. write_delim(x, path, delim = " ", na = "NA", append = FALSE, col_names = !append) Tibble/df to file with any delimiter. write_excel_csv(x, path, na = "NA", append = FALSE, col_names = !append) Tibble/df to a CSV for excel write_file(x, path, append = FALSE) String to file. write_lines(x, path, na = "NA", append = FALSE) String vector to file, one element per line. write_rds(x, path, compress = c("none", "gz", "bz2", "xz"), ...) Object to RDS file. write_tsv(x, path, na = "NA", append = FALSE, col_names = !append) Tibble/df to tab delimited files. Write functions Save x, an R object, to path, a file path, with: Read functions Parsing data types Tidy Data with tidyr Cheat Sheet R’s tidyverse is built around tidy data stored in tibbles, an enhanced version of a data frame. The front side of this sheet shows how to read text files into R with readr. The reverse side shows how to create tibbles with tibble and to layout tidy data with tidyr. Data Importwith readr, tibble, and tidyr Cheat Sheet ## Parsed with column specification: ## cols( ## age = col_integer(), ## sex = col_character(), ## earn = col_double() ## ) read_file(file, locale = default_locale()) Read a file into a single string. read_file_raw(file) Read a file into a raw vector. read_lines(file, skip = 0, n_max = -1L, locale = default_locale(), na = character(), progress = interactive()) Read each line into its own string. Skip lines read_csv("file.csv", skip = 1) Read in a subset read_csv("file.csv", n_max = 1) Missing Values read_csv("file.csv", na = c("4", "5", ".")) 1. Use problems() to diagnose problems x <- read_csv("file.csv"); problems(x) 2. Use a col_ function to guide parsing • col_guess() - the default • col_character() • col_double() • col_euro_double() • col_datetime(format = "") Also col_date(format = "") and col_time(format = "") • col_factor(levels, ordered = FALSE) • col_integer() • col_logical() • col_number() • col_numeric() • col_skip() x <- read_csv("file.csv", col_types = cols( A = col_double(), B = col_logical(), C = col_factor() )) 3. Else, read in as character vectors then parse with a parse_ function. • parse_guess(x, na = c("", "NA"), locale = default_locale()) • parse_character(x, na = c("", "NA"), locale = default_locale()) • parse_datetime(x, format = "", na = c("", "NA"), locale = default_locale()) Also parse_date() and parse_time() • parse_double(x, na = c("", "NA"), locale = default_locale()) • parse_factor(x, levels, ordered = FALSE, na = c("", "NA"), locale = default_locale()) • parse_integer(x, na = c("", "NA"), locale = default_locale()) • parse_logical(x, na = c("", "NA"), locale = default_locale()) • parse_number(x, na = c("", "NA"), locale = default_locale()) x$A <- parse_number(x$A) read_lines_raw(file, skip = 0, n_max = -1L, progress = interactive()) Read each line into a raw vector. read_log(file, col_names = FALSE, col_types = NULL, skip = 0, n_max = -1, progress = interactive()) Apache style log files. Read non-tabular data Read tabular data to tibbles read_csv() Reads comma delimited files. read_csv("file.csv") read_csv2() Reads Semi-colon delimited files. read_csv2("file2.csv") read_delim(delim, quote = """, escape_backslash = FALSE, escape_double = TRUE) Reads files with any delimiter. read_delim("file.txt", delim = "|") read_fwf(col_positions) Reads fixed width files. read_fwf("file.fwf", col_positions = c(1, 3, 5)) read_tsv() Reads tab delimited files. Also read_table(). read_tsv("file.tsv") a,b,c 1,2,3 4,5,NA a;b;c 1;2;3 4;5;NA a|b|c 1|2|3 4|5|NA a b c 1 2 3 4 5 NA These functions share the common arguments: read_*(file, col_names = TRUE, col_types = NULL, locale = default_locale(), na = c("", "NA"), quoted_na = TRUE, comment = "", trim_ws = TRUE, skip = 0, n_max = Inf, guess_max = min(1000, n_max), progress = interactive()) A B C 1 2 3 A B C 1 2 3 4 5 NA x y z A B C 1 2 3 4 5 NA A B C 1 2 3 NA NA NA 1 2 3 4 5 NA A B C 1 2 3 4 5 NA A B C 1 2 3 4 5 NA A B C 1 2 3 4 5 NA A B C 1 2 3 4 5 NA Useful arguments a,b,c 1,2,3 4,5,NA Example file write_csv (path = "file.csv", x = read_csv("a,b,cn1,2,3n4,5,NA")) No header read_csv("file.csv", col_names = FALSE) Provide header read_csv("file.csv", col_names = c("x", "y", "z")) readr functions guess the types of each column and convert types when appropriate (but will NOT convert strings to factors automatically). A message shows the type of each column in the result. earn is a double (numeric) sex is a character age is an integer RStudio® is a trademark of RStudio, Inc. • CC BY RStudio info@rstudio.com • 844-448-1212 • rstudio.com Learn more at browseVignettes(package = c("readr", "tibble", "tidyr")) • readr 1.1.0 • tibble 1.2.12 • tidyr 0.6.0 • Updated: 2017-01
  • 2. gather(data, key, value, ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE) Gather moves column names into a key column, gathering the column values into a single value column. spread(data, key, value, fill = NA, convert = FALSE, drop = TRUE, sep = NULL) Spread moves the unique values of a key column into the column names, spreading the values of a value column across the new columns that result. Use gather() and spread() to reorganize the values of a table into a new layout. Each uses the idea of a key column: value column pair. gather(table4a, `1999`, `2000`, key = "year", value = "cases") spread(table2, type, count) valuekey country 1999 2000 A 0.7K 2K B 37K 80K C 212K 213K table4a country year cases A 1999 0.7K B 1999 37K C 1999 212K A 2000 2K B 2000 80K C 2000 213K valuekey country year cases pop A 1999 0.7K 19M A 2000 2K 20M B 1999 37K 172M B 2000 80K 174M C 1999 212K 1T C 2000 213K 1T table2 country year type count A 1999 cases 0.7K A 1999 pop 19M A 2000 cases 2K A 2000 pop 20M B 1999 cases 37K B 1999 pop 172M B 2000 cases 80K B 2000 pop 174M C 1999 cases 212K C 1999 pop 1T C 2000 cases 213K C 2000 pop 1T Split and Combine Cells unite(data, col, ..., sep = "_", remove = TRUE) Collapse cells across several columns to make a single column. drop_na(data, ...) Drop rows containing NA’s in … columns. fill(data, ..., .direction = c("down", "up")) Fill in NA’s in … columns with most recent non-NA values. replace_na(data, replace = list(), ...) Replace NA’s by column. Use these functions to split or combine cells into individual, isolated values. country year rate A 1999 0.7K/19M A 2000 2K/20M B 1999 37K/172M B 2000 80K/174M C 1999 212K/1T C 2000 213K/1T country year cases pop A 1999 0.7K 19M A 2000 2K 20M B 1999 37K 172 B 2000 80K 174 C 1999 212K 1T C 2000 213K 1T table3 separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", ...) Separate each cell in a column to make several columns. separate_rows(data, ..., sep = "[^[:alnum:].]+", convert = FALSE) Separate each cell in a column to make several rows. Also separate_rows_(). country century year Afghan 19 99 Afghan 20 0 Brazil 19 99 Brazil 20 0 China 19 99 China 20 0 country year Afghan 1999 Afghan 2000 Brazil 1999 Brazil 2000 China 1999 China 2000 table5 separate_rows(table3, rate, into = c("cases", "pop")) separate_rows(table3, rate) unite(table5, century, year, col = "year", sep = "") Tidy Data with tidyr x1 x2 A 1 B NA C NA D 3 E NA x1 x2 A 1 D 3 x x1 x2 A 1 B NA C NA D 3 E NA x1 x2 A 1 B 1 C 1 D 3 E 3 x x1 x2 A 1 B NA C NA D 3 E NA x1 x2 A 1 B 2 C 2 D 3 E 2 x drop_na(x, x2) fill(x, x2) replace_na(x,list(x2 = 2), x2) country year rate A 1999 0.7K A 1999 19M A 2000 2K A 2000 20M B 1999 37K B 1999 172M B 2000 80K B 2000 174M C 1999 212K C 1999 1T C 2000 213K C 2000 1T table3 country year rate A 1999 0.7K/19M A 2000 2K/20M B 1999 37K/172M B 2000 80K/174M C 1999 212K/1T C 2000 213K/1T • Control the default appearance with options: options(tibble.print_max = n, tibble.print_min = m, tibble.width = Inf) • View entire data set with View(x, title) or glimpse(x, width = NULL, …) • Revert to data frame with as.data.frame() (required for some older packages) Tibbles - an enhanced data frame Handle Missing Values Reshape Data - change the layout of values in a table Tidy data is a way to organize tabular data. It provides a consistent data structure across packages. CBA A * B -> C *A B C Each observation, or case, is in its own row A B C Each variable is in its own column A B C & Learn more at browseVignettes(package = c("readr", "tibble", "tidyr")) • readr 1.1.0 • tibble 1.2.12 • tidyr 0.6.0 • Updated: 2017-01 The tibble package provides a new S3 class for storing tabular data, the tibble. Tibbles inherit the data frame class, but improve two behaviors: • Display - When you print a tibble, R provides a concise view of the data that fits on one screen. • Subsetting - [ always returns a new tibble, [[ and $ always return a vector. • No partial matching - You must use full column names when subsetting as_tibble(x, …) Convert data frame to tibble. enframe(x, name = "name", value = "value") Converts named vector to a tibble with a names column and a values column. is_tibble(x) Test whether x is a tibble. data frame display tibble display Construct a tibble in two ways tibble(…) Construct by columns. tibble(x = 1:3, y = c("a", "b", "c")) tribble(…) Construct by rows. tribble( ~x, ~y, 1, "a", 2, "b", 3, "c") A tibble: 3 × 2 x y <int> <dbl> 1 1 a 2 2 b 3 3 c Both make this tibble ww # A tibble: 234 × 6 manufacturer model displ <chr> <chr> <dbl> 1 audi a4 1.8 2 audi a4 1.8 3 audi a4 2.0 4 audi a4 2.0 5 audi a4 2.8 6 audi a4 2.8 7 audi a4 3.1 8 audi a4 quattro 1.8 9 audi a4 quattro 1.8 10 audi a4 quattro 2.0 # ... with 224 more rows, and 3 # more variables: year <int>, # cyl <int>, trans <chr> 156 1999 6 auto(l4) 157 1999 6 auto(l4) 158 2008 6 auto(l4) 159 2008 8 auto(s4) 160 1999 4 manual(m5) 161 1999 4 auto(l4) 162 2008 4 manual(m5) 163 2008 4 manual(m5) 164 2008 4 auto(l4) 165 2008 4 auto(l4) 166 1999 4 auto(l4) [ reached getOption("max.print") -- omitted 68 rows ]A large table to display RStudio® is a trademark of RStudio, Inc. • CC BY RStudio info@rstudio.com • 844-448-1212 • rstudio.com A table is tidy if: Tidy data: Makes variables easy to access as vectors Preserves cases during vectorized operations Expand Tables - quickly create tables with combinations of values complete(data, ..., fill = list()) Adds to the data missing combinations of the values of the variables listed in … complete(mtcars, cyl, gear, carb) expand(data, ...) Create new tibble with all possible combinations of the values of the variables listed in … expand(mtcars, cyl, gear, carb)