Merge Multiple Files into a Single Data Frame Using R
Yogesh Khandelwal
Problem Description
• The zip file contains 332 comma-separated-value (CSV) files
containing pollution monitoring data for fine particulate
matter (PM) air pollution at 332 locations in the United States.
Each file contains data from a single monitor and the ID
number for each monitor is contained in the file name. For
example, data for monitor 200 is contained in the file
"200.csv".
• Data Source: http://spark-public.s3.amazonaws.com/compdata/data/specdata.zip
Variable Name
Variables in file
• Date: the date of observation in YYYY-MM-DD format (year-month-day); data type: factor
• sulfate: the level of sulfate PM in the air on that date (measured in micrograms per cubic meter); data type: num
• nitrate: the level of nitrate PM in the air on that date (measured in micrograms per cubic meter); data type: num
• Id: location ID; data type: int
Before we start we should know
• Functions in R
• How to merge data files
Functions in R
Functions are created using the function() directive and are
stored as R objects just like anything else. In particular, they are R
objects of class “function”.
f <- function(<arguments>) {
## Do something interesting
}
• Functions in R are “first class objects”, which means that they can be treated much like any other R object. Importantly:
  • Functions can be passed as arguments to other functions.
  • Functions can be nested, so that you can define a function inside another function.
• The return value of a function is the last expression in the function body to be evaluated.
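The points above can be illustrated with a short sketch. The names square, apply_twice, make_adder and add_five are made up for this example and do not come from the slides.

```r
# Functions are ordinary objects: they have a class and can be stored in variables.
square <- function(x) x^2
class(square)                # "function"

# A function can be passed as an argument to another function.
apply_twice <- function(f, x) f(f(x))
apply_twice(square, 3)       # 81

# A function can be defined inside another function. The last expression
# evaluated in the body (here, the inner function itself) is the return value.
make_adder <- function(n) {
  add_n <- function(x) x + n
  add_n
}
add_five <- make_adder(5)
add_five(10)                 # 15
```

No explicit return() is needed in either function; R returns whatever the last evaluated expression produces.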
Functions, contd.
• For example:
Function name
Function definition
Function call
Our objective
• How can we merge a number of files into a single data frame?
• How can we apply the same function to different files efficiently?
How to merge two different files?
• Several options are available, such as:
1. Use the merge() function
2. Use rbind(), cbind(), etc.
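As a rough illustration of when each option fits, the sketch below uses small made-up data frames (a, b and sites are hypothetical and not part of the specdata files). rbind() stacks rows of tables with the same columns, merge() joins tables on a shared key column, and cbind() adds columns to a table with the same number of rows.

```r
# Hypothetical toy data: readings from two monitors and a lookup table.
a     <- data.frame(ID = c(1, 1), sulfate = c(2.5, 3.1))
b     <- data.frame(ID = c(2, 2), sulfate = c(4.0, NA))
sites <- data.frame(ID = c(1, 2), state = c("NY", "CA"))

stacked <- rbind(a, b)                        # same columns: stack the rows
joined  <- merge(stacked, sites, by = "ID")   # join on the key column ID
wide    <- cbind(a, flag = c(TRUE, FALSE))    # same row count: add a column

dim(stacked)   # 4 rows, 2 columns
dim(joined)    # 4 rows, 3 columns
dim(wide)      # 2 rows, 3 columns
```

For the monitor files, every CSV has the same columns, so stacking with rbind() is the natural choice.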
How to merge a number of files into a single data frame
• Approach 1
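# Paths to every file in the "specdata" directory
files <- list.files("specdata", full.names = TRUE)
dat <- NULL
# Loop over however many files were found (332 for this data set)
for (i in seq_along(files)) {
  # Read one file and append its rows to the merged data frame
  dat <- rbind(dat, read.csv(files[i]))
}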
• We can then run various commands on the merged data frame as needed, for example:
1. str(dat)
2. head(dat)
3. tail(dat), etc.
Note: full.names is a logical value. If TRUE, the directory path is prepended to the file names to give a relative file path. If FALSE, the file names (rather than paths) are returned.
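The effect of full.names can be seen on a throwaway directory. This is only a sketch: the directory name specdata_demo and the two empty files are made up for the example.

```r
# Build a temporary directory holding two empty CSV files (hypothetical names).
dir_path <- file.path(tempdir(), "specdata_demo")
dir.create(dir_path, showWarnings = FALSE)
file.create(file.path(dir_path, c("001.csv", "002.csv")))

# Default (full.names = FALSE): bare file names only.
short <- list.files(dir_path, pattern = "\\.csv$")

# full.names = TRUE: each name is prefixed with the directory path,
# so the result can be handed straight to read.csv().
long <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)

short            # "001.csv" "002.csv"
basename(long)   # the same names once the directory part is stripped
```

Without full.names = TRUE, read.csv() would look for the files in the working directory rather than in the data directory.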
How to handle missing values in R?
contd.
• In R, NA is used to represent any value that is 'not available' or 'missing' (in the statistical sense)
• Missing values play an important role in statistics and data analysis. Often,
missing values must not be ignored, but rather they should be carefully
studied to see if there's an underlying pattern or cause for their
missingness.
• For example:
> x <- c(1, 2, NA, 4)
> y <- c(NA, 2, 3, 1)
> x + y
[1] NA  4 NA  5
• Multiple options are available in R to handle NA values, such as:
• is.na()
• Set na.rm = TRUE as a function argument
> mean(x)
[1] NA
> mean(x, na.rm = TRUE)
[1] 2.333333
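A slightly fuller sketch of these options, reusing the two example vectors (written in lowercase here, since R names are case-sensitive). complete.cases() is an additional base R helper not mentioned on the slide.

```r
x <- c(1, 2, NA, 4)
y <- c(NA, 2, 3, 1)

is.na(x)                 # FALSE FALSE  TRUE FALSE
sum(is.na(x))            # 1 missing value
x[!is.na(x)]             # 1 2 4: drop the missing entry
mean(x, na.rm = TRUE)    # 2.333333

# Keep only the positions where both vectors are observed.
both <- complete.cases(x, y)
x[both] + y[both]        # 4 5
```

Whether dropping missing values is appropriate depends on why they are missing, as noted above; na.rm = TRUE simply ignores them.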
Apply what we learned to our dataset
Function definition
Function call
pollutantmean('specdata','nitrate',1:10)
[1] 0.7976266
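The function definition shown on the slide is not included in this transcript. The sketch below is one possible way to write such a function, not the author's code. It assumes the signature implied by the call (directory, pollutant, id), assumes that file names are monitor IDs zero-padded to three digits (for example "001.csv"), and exercises the function on a tiny made-up data set rather than the real specdata files.

```r
# Sketch only: the signature is inferred from the call
# pollutantmean('specdata', 'nitrate', 1:10).
pollutantmean <- function(directory, pollutant, id = 1:332) {
  # Assumption: files are named by zero-padded monitor ID, e.g. "001.csv".
  files <- file.path(directory, sprintf("%03d.csv", id))
  # Read every requested file and stack the rows into one data frame.
  dat <- do.call(rbind, lapply(files, read.csv))
  # Mean of the chosen pollutant column, ignoring missing values.
  mean(dat[[pollutant]], na.rm = TRUE)
}

# Made-up demo data in a temporary directory (not the real monitors).
demo_dir <- file.path(tempdir(), "pollutant_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Date = "2003-01-01", sulfate = NA, nitrate = 1, ID = 1),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(Date = c("2003-01-01", "2003-01-02"),
                     sulfate = c(2, 4), nitrate = c(NA, 3), ID = 2),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

pollutantmean(demo_dir, "nitrate", 1:2)   # 2
pollutantmean(demo_dir, "sulfate", 1:2)   # 3
```

The function combines the three ideas covered earlier: a user-defined function, merging several files into one data frame, and na.rm = TRUE for missing values.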
Thank You!!

Editor's notes

  1. lapply() applies a given function to each element of a list, so there are several function calls. do.call() applies a given function to the list as a whole, so there is only one function call.
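The difference can be seen on a small list. In this sketch, pieces is a made-up list of data frames standing in for files that have already been read.

```r
# Three small data frames standing in for already-read files.
pieces <- list(
  data.frame(ID = 1, nitrate = 0.5),
  data.frame(ID = 2, nitrate = 0.7),
  data.frame(ID = 3, nitrate = NA)
)

# lapply(): nrow() is called once per element, giving a list of three results.
rows_each <- lapply(pieces, nrow)
length(rows_each)   # 3

# do.call(): rbind() is called once, as if written
# rbind(pieces[[1]], pieces[[2]], pieces[[3]]).
combined <- do.call(rbind, pieces)
nrow(combined)      # 3
```

Combining the two, do.call(rbind, lapply(files, read.csv)) reads each file with lapply() and then stacks all of them with a single rbind() call, which avoids growing a data frame inside a loop.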