Merge Multiple CSV Files into a Single Data Frame Using R

Merge Multiple Files into a Single Data Frame Using R
Yogesh Khandelwal

Problem Description
• The zip file contains 332 comma-separated-value (CSV) files containing pollution monitoring data for fine particulate matter (PM) air pollution at 332 locations in the United States. Each file contains data from a single monitor, and the ID number for each monitor is contained in the file name. For example, data for monitor 200 is contained in the file "200.csv".
• Data Source: http://spark-public.s3.amazonaws.com/compdata/data/specdata.zip

Variables in file
• Date: the date of observation in YYYY-MM-DD format (year-month-day), Datatype: factor (in R 4.0.0 and later, read.csv() returns character by default because stringsAsFactors defaults to FALSE)
• sulfate: the level of sulfate PM in the air on that date (measured in micrograms per cubic meter), Datatype: num
• nitrate: the level of nitrate PM in the air on that date (measured in micrograms per cubic meter), Datatype: num
• ID: the location (monitor) ID, Datatype: int
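If you would rather not depend on the default column types, read.csv() can be told the type of each column through its colClasses argument. This is a minimal sketch that assumes the header is Date,sulfate,nitrate,ID as described above; the two data rows are made up for illustration:

```r
# Made-up sample in the assumed layout of one monitor file
csv_text <- 'Date,sulfate,nitrate,ID
2003-01-01,NA,NA,1
2003-01-02,1.5,0.7,1'

# colClasses fixes each column's type instead of letting read.csv guess
obs <- read.csv(text = csv_text,
                colClasses = c("Date", "numeric", "numeric", "integer"))

str(obs)  # Date is now of class "Date", ID is integer
```

Reading Date as a real "Date" also makes later filtering by time range straightforward.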

Before we start, we should know:
• Functions in R
• How to merge data files

Functions in R
Functions are created using the function() directive and are stored as R objects just like anything else. In particular, they are R objects of class “function”.
f <- function(arg1, arg2) {
  ## Do something interesting with the arguments
}
• Functions in R are “first class objects”, which means that they can be treated much like any other R object. Importantly:
• Functions can be passed as arguments to other functions.
• Functions can be nested, so that you can define a function inside another function.
• The return value of a function is the last expression in the function body to be evaluated.
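The properties listed above can be seen in a few lines of R. apply_twice(), make_scaler() and times2 are hypothetical names chosen only for this illustration:

```r
# Passing a function as an argument: fn is used like any other value
apply_twice <- function(fn, x) {
  fn(fn(x))
}

# Nesting: scale_by_k() is defined inside make_scaler() and returned
make_scaler <- function(k) {
  scale_by_k <- function(x) x * k
  scale_by_k  # last evaluated expression, so this is the return value
}

times2 <- make_scaler(2)
class(times2)           # "function"
apply_twice(times2, 3)  # 12
```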

Functions (contd.)
• For example, a figure labelling the three parts of a function: the function name, the function definition, and the function call.
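A small example of those three parts, written for this note rather than copied from the original slide (add_readings() is a hypothetical name):

```r
# Function name: add_readings
# Function definition: the function(...) { ... } expression assigned to it
add_readings <- function(a, b) {
  a + b  # last evaluated expression, so this is the return value
}

# Function call: arguments can be given by position or by name
add_readings(2, 3)           # 5
add_readings(b = 10, a = 1)  # 11
```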

Our objective
• How can we merge a number of files into a single data frame?
• How can we apply the same function to different files efficiently?

How to merge two different files?

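• Several options are available, for example:
1. Use the merge() function, which joins two data frames on common key columns.
2. Use rbind() to stack rows or cbind() to bind columns, etc.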
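The difference between these options is easiest to see on two tiny data frames. The values below are made up for illustration and only borrow the column names from our dataset:

```r
# Two made-up data frames that share the columns ID and sulfate
a <- data.frame(ID = c(1, 2), sulfate = c(1.5, 2.0))
b <- data.frame(ID = c(2, 3), sulfate = c(3.1, 4.2))

# rbind(): stack rows; the column names must match
stacked <- rbind(a, b)
nrow(stacked)   # 4

# merge(): join on a key column; only ID 2 appears in both
joined <- merge(a, b, by = "ID")
joined          # one row with columns ID, sulfate.x, sulfate.y

# cbind(): put columns side by side; the row counts must agree
extra <- data.frame(nitrate = c(0.3, 0.7))
side <- cbind(a, extra)
names(side)     # "ID" "sulfate" "nitrate"
```

Since all 332 monitor files share the same columns, stacking them with rbind() is the natural fit; merge() is for combining different variables that describe the same keys.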

How to merge a number of files into a single data frame
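• Approach 1
# Full paths of all files in the specdata folder
files <- list.files("specdata", full.names = TRUE)
dat <- NULL
# Read each file and append its rows to dat
for (i in seq_along(files)) {
  dat <- rbind(dat, read.csv(files[i]))
}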
• We can then run various commands on the merged data frame as needed, for example:
1. str(dat)
2. head(dat)
3. tail(dat), etc.
Note: full.names is a logical value. If TRUE, the directory path is prepended to the file names to give a relative file path. If FALSE, the file names (rather than paths) are returned.
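Growing dat with rbind() inside a loop copies the accumulated data on every iteration. A common alternative, sketched here as a second approach rather than the author's, reads every file with lapply() and binds them once with do.call(rbind, ...). To keep the example self-contained it writes two small made-up CSV files to a temporary folder (specdata_demo is a hypothetical name); with the real data you would point list.files() at "specdata" instead:

```r
# Create a throwaway folder with two tiny CSV files (made-up values)
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Date = "2003-01-01", sulfate = 1.2, nitrate = 0.4, ID = 1L),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(Date = "2003-01-02", sulfate = 2.5, nitrate = 0.8, ID = 2L),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

# Approach 2: read all files into a list, then bind them in one step
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
dat <- do.call(rbind, lapply(files, read.csv))

str(dat)   # structure of the combined data frame
head(dat)
```

The pattern argument restricts the listing to .csv files, so stray files in the folder are not passed to read.csv().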

How to handle missing values in R?

Missing values (contd.)
• In R, NA is used to represent any value that is 'not available' or 'missing' (in the statistical sense).
• Missing values play an important role in statistics and data analysis. Often, missing values must not be ignored, but rather they should be carefully studied to see if there's an underlying pattern or cause for their missingness.
• For example:
> x <- c(1, 2, NA, 4)
> y <- c(NA, 2, 3, 1)
> x + y
[1] NA  4 NA  5
• Multiple options are available in R to handle NA values, for example:
• is.na() to find which values are missing
• Set na.rm = TRUE as a function argument:
> mean(x)
[1] NA
> mean(x, na.rm = TRUE)
[1] 2.333333
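A few more ways to work with NA, sketched on the vector above and on a small made-up data frame (readings is a hypothetical name). complete.cases() is a base R function that flags rows containing no missing values:

```r
x <- c(1, 2, NA, 4)

is.na(x)               # FALSE FALSE  TRUE FALSE
sum(is.na(x))          # 1, the number of missing values
x[!is.na(x)]           # 1 2 4, the missing value dropped
mean(x, na.rm = TRUE)  # 2.333333

# For a data frame, complete.cases() flags rows without any NA
readings <- data.frame(sulfate = c(1.2, NA, 3.4),
                       nitrate = c(NA, 0.5, 0.9))
readings[complete.cases(readings), ]  # keeps only the third row
```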

Apply what we have learned to our dataset
Function definition (shown as an image on the original slide)

Function call
pollutantmean('specdata', 'nitrate', 1:10)
[1] 0.7976266
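The pollutantmean() definition itself is not reproduced in this text. One possible implementation that combines the steps above is sketched here; it is not the author's code. It assumes file names are the monitor ID zero-padded to three digits (e.g. "001.csv"; the text above only shows "200.csv") and that the mean is taken over all non-missing observations from the selected monitors. The demo files and pollutant_demo folder are made up:

```r
# Sketch of pollutantmean(); assumes files named "001.csv" ... "332.csv"
pollutantmean <- function(directory, pollutant, id = 1:332) {
  files <- file.path(directory, sprintf("%03d.csv", id))
  # Read only the requested monitors and stack them into one data frame
  dat <- do.call(rbind, lapply(files, read.csv))
  # Mean over all observations of the chosen pollutant, ignoring NA
  mean(dat[[pollutant]], na.rm = TRUE)
}

# Demo with two tiny made-up files in a temporary folder
demo_dir <- file.path(tempdir(), "pollutant_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Date = c("2003-01-01", "2003-01-02"),
                     sulfate = c(1, NA), nitrate = c(0.5, 1.5), ID = 1L),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(Date = "2003-01-01", sulfate = 4, nitrate = 1.0, ID = 2L),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

pollutantmean(demo_dir, "nitrate", 1:2)  # mean of 0.5, 1.5, 1.0 is 1
pollutantmean(demo_dir, "sulfate", 1:2)  # mean of 1 and 4 is 2.5
```

Pooling all observations before averaging gives a different answer from averaging per-monitor means when monitors have different numbers of readings, so check which one your task asks for.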
