R SEMINAR
Antony Karanja N.
Research Methods Group, ICRAF
2nd April, 15
Data Management and Analysis
AIM
• Recap of the steps and tips for learning to code in R
• Introduction to the dplyr package
• How to use the dplyr package for data
manipulation* and basic statistics
• Ultimate: dplyr and ggplot2
RECAP
• Set working directory (creating project, setwd)
• Installing and calling library packages
• Reading/loading data (read.???)
• What is the R object type (class)
• Variables within data frames
• Knowing the data type of each variable
• View head and tail data
RECAP
###################
# IMPORT datasets #
###################
tree<-read.csv(file="datavis.csv",header=T)
#-------------------------
# Inspect data with head()
#-------------------------
names(tree);colnames(tree)
head(tree)
tail(tree)
#-------------------------
# Inspect R object type
#-------------------------
class(tree)
#-------------------------
# Inspect Internal structure of R object type
#-------------------------
str(tree)
glimpse(tree) # glimpse() comes from dplyr; run library(dplyr) first
#-------------------------
# Inspect data types
#-------------------------
sapply(tree,class) # returns a named vector (compact, horizontal view)
lapply(tree,class) # returns a list (one element per column, vertical view)
##############################
# LOOK FOR DUPLICATE RECORDS #
##############################
duplicates<-tree[duplicated(tree[c("Country","Site","PosTopoSeq")]),] #Base function
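The two base functions for duplicates behave differently, which matters when indexing rows. The sketch below uses a small made-up data frame (`plots` and its values are assumptions, only the column names come from the slides) to show that `anyDuplicated()` returns a single index while `duplicated()` returns one TRUE/FALSE per row.

```r
# Toy data frame; column names mirror the slides, values are invented.
plots <- data.frame(
  Country    = c("Kenya", "Kenya", "Kenya", "Nicaragua"),
  Site       = c("A", "A", "B", "C"),
  PosTopoSeq = c(1, 1, 2, 1),
  stringsAsFactors = FALSE
)
key <- plots[c("Country", "Site", "PosTopoSeq")]

first_dup <- anyDuplicated(key)        # index of the first repeated row, 0 if none
is_dup    <- duplicated(key)           # logical vector, one value per row
dup_rows  <- plots[is_dup, ]           # the repeated rows themselves
```

Indexing with the logical vector returns every repeated record, so it is usually the safer choice for listing duplicates.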
dplyr
• #install.packages("dplyr")
• >library(dplyr)
• Grammar of data manipulations
– filter() (and slice())
– arrange()
– select() (and rename())
– distinct()
– mutate() (and transmute())
– summarise()
– sample_n() and sample_frac()
filter()
• filter() allows you to select a subset of the rows of a
data frame.
• filter() works similarly to subset()
• filter(df, condition(s)), where df is the data frame
#1.0 #### filter - combine conditions with AND (comma or &) or with OR (|)
table(tree$Country)
Nicaragua<-filter(tree, Country == "Nicaragua")
SA<-filter(tree, Country == "South Africa")
#1.1 #### slice
Nicaragua2<-slice(tree, 1:16)
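As a minimal sketch of combining conditions in filter(), assuming a small invented data frame (`plots` and its values are hypothetical, the column names follow the slides):

```r
library(dplyr)

# Toy data; values are made up for illustration.
plots <- data.frame(
  Country = c("Nicaragua", "Nicaragua", "South Africa", "Kenya"),
  pH      = c(5.2, 6.8, 4.9, 7.1)
)

# AND: separate conditions with a comma (or use &)
acid_nicaragua <- filter(plots, Country == "Nicaragua", pH < 6)

# OR: use |
nic_or_sa <- filter(plots, Country == "Nicaragua" | Country == "South Africa")

# %in% is a compact alternative to several | comparisons
nic_or_sa2 <- filter(plots, Country %in% c("Nicaragua", "South Africa"))
```

Note that slice() selects rows by position, so its result depends on the current row order, whereas filter() selects rows by value.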
arrange()
• arrange() works similarly to filter() except that
instead of filtering or selecting rows, it reorders
them.
#2.0 #### arrange
arrange(tree, Site,PosTopoSeq,VegStructure)
tree_arr<-arrange(tree, Site,PosTopoSeq,VegStructure)
tree_arr<-arrange(tree, desc(Site),PosTopoSeq,VegStructure)
select()
• Very helpful when working with dataset with many
columns/variables
• Helper function within select() include starts_with(),
ends_with(), matches() and contains()
#2.0 #### select
tree_select<-select(tree,Country,SEVEREERO,avSlope,avTreeDen,Carbon,pH,Clay)
tree_select<-select(tree,Country,SEVEREERO,avSlope,avTreeDen,Carbon,pH>=5,Clay)
# Error: select() picks columns by name and does not accept row conditions such as pH>=5
# What is happening here?
tree_select<-select(tree,-c(Site,PosTopoSeq,VegStructure))
tree_select<-select(tree,-(Site:VegStructure))
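select() works on columns and filter() works on rows, so a condition on pH belongs in filter(). The sketch below assumes a small invented data frame (`plots` is hypothetical) and also shows the two ways of dropping columns with a minus sign.

```r
library(dplyr)

# Toy data; column names follow the slides, values are invented.
plots <- data.frame(
  Country = c("Kenya", "Nicaragua", "Kenya"),
  Site    = c("A", "B", "C"),
  pH      = c(4.5, 5.5, 6.0),
  Clay    = c(20, 35, 40)
)

# Rows first (condition on pH), then columns (by name)
kept <- select(filter(plots, pH >= 5), Country, pH, Clay)

# A leading minus drops columns; Country:Site is a range of adjacent columns
dropped <- select(plots, -(Country:Site))
```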
select()
#2.0.1 select and helper functions
# Keep variables or drop if negative sign (-)
select(tree, starts_with("av",ignore.case=T),starts_with("C"))
select(tree, ends_with("e"))
select(tree, contains("p"))
select(tree, matches("av"))
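To see what each helper matches, here is a small sketch on an invented data frame that reuses some of the column names from the slides (`soil` and its values are assumptions). The helpers ignore case by default, contains() looks for a literal substring, and matches() takes a regular expression.

```r
library(dplyr)

# Toy data frame with column names from the slides (values invented)
soil <- data.frame(avSlope = 1:2, avTreeDen = 3:4, Carbon = 5:6,
                   Clay = 7:8, pH = c(5.5, 6.5))

av_cols <- names(select(soil, starts_with("av")))   # prefix match
e_cols  <- names(select(soil, ends_with("e")))      # suffix match
p_cols  <- names(select(soil, contains("p")))       # literal substring
c_cols  <- names(select(soil, matches("^C")))       # regular expression
no_av   <- names(select(soil, -starts_with("av")))  # minus sign drops the matches
```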
rename()
• To assign another name to the existing
variable
#2.1 #### rename
tree_rename<-rename(tree,Slope=avSlope)
tree_rename<-rename(tree,Slope=avSlope,TreeDen=avTreeDen)
distinct()
• Extract distinct (unique) rows
#3.0 ### distinct
tree_distinct<-distinct(tree)
tree_distinct<-distinct(select(tree,Country,Site,PosTopoSeq))
mutate()
• Add new columns that are functions of
existing columns.
#4.0 ### Mutate
tree_mute<-mutate(tree,Acidbase = 7-pH,clay.cover = Clay / avTreeDen)
#4.0.1 ### transmute
tree_mute<-transmute(tree,Acidbase = 7-pH,clay.cover = Clay / avTreeDen)
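The difference between the two calls is which columns survive: mutate() keeps the existing columns and appends the new ones, while transmute() keeps only the new ones. A minimal sketch on an invented data frame (`plots` and its values are assumptions):

```r
library(dplyr)

# Toy data; column names follow the slides, values are invented.
plots <- data.frame(pH = c(5, 7.5), Clay = c(20, 30), avTreeDen = c(10, 15))

with_new <- mutate(plots, Acidbase = 7 - pH, clay.cover = Clay / avTreeDen)
only_new <- transmute(plots, Acidbase = 7 - pH, clay.cover = Clay / avTreeDen)

names(with_new)  # original columns plus the two new ones
names(only_new)  # just the two new columns
```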
sample_n()
• Use sample_n() and sample_frac() to take a
random sample of rows
#5.0 ### sample_n()
sample_n(tree, 10,replace=F)
#5.0.1 ### sample_frac()
sample_frac(tbl=tree, size=0.1)
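Random samples change on every run unless the random seed is fixed. The sketch below assumes an invented 10-row data frame (`plots` is hypothetical) and uses set.seed() so the same rows are drawn each time; the seed value 42 is arbitrary.

```r
library(dplyr)

# Toy data: 10 rows with invented pH values.
plots <- data.frame(id = 1:10, pH = seq(4.5, 9, by = 0.5))

set.seed(42)                                   # fix the seed for reproducibility
first_draw <- sample_n(plots, 3, replace = FALSE)

set.seed(42)                                   # same seed, same rows
second_draw <- sample_n(plots, 3, replace = FALSE)

half <- sample_frac(plots, size = 0.5)         # 50% of the rows

# In dplyr 1.0.0 and later, slice_sample(n = ...) and slice_sample(prop = ...)
# are the newer equivalents of these two functions.
```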
summarise()
• Generate stats from the existing columns/variables.
Also generates stats by grouping variable(s)
summarise(tree,
count = n(),
MeanCarb = mean(Carbon, na.rm = TRUE),
MeanClay = mean(Clay, na.rm = TRUE),
MedPh=median(pH,na.rm=T))
summarise()
• Stats by grouping variable(s)
tree.summary <- tree %>%
group_by(Country,Site,SEVEREERO) %>%
summarise(count = n(),
meanC = mean(Carbon,na.rm=T),
meanClay = mean(Clay,na.rm=T),
sdC=sd(Carbon,na.rm=T),
sdClay=sd(Clay,na.rm=T),
medPh=median(pH,na.rm=T))
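A small worked case can make the grouped output easier to read. The sketch below assumes an invented data frame (`soil` and its values are hypothetical) with one missing Carbon value, to show that n() counts all rows in a group while na.rm = TRUE only affects the statistic being computed.

```r
library(dplyr)

# Toy data with one missing value; values are invented.
soil <- data.frame(
  Country = c("Kenya", "Kenya", "Nicaragua", "Nicaragua"),
  Carbon  = c(1, 3, 4, NA)
)

soil_summary <- soil %>%
  group_by(Country) %>%
  summarise(count = n(),                        # rows per group, NA included
            n_obs = sum(!is.na(Carbon)),        # non-missing Carbon values
            meanC = mean(Carbon, na.rm = TRUE)) # mean of the non-missing values
```

Reporting both count and n_obs makes it clear how many values each mean is actually based on.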
R Version
>R.Version()$version.string
OR
>R.version.string
BONUS
Update R
For Windows OS
# installing/loading the package:
>if(!require(installr)) {install.packages("installr"); require(installr)} #load / install+load installr
# using the package:
>updateR() # this will start the updating process of your R installation.
Note: It will check for newer versions, and if one is available, will guide you
through the decisions you'd need to make.
Exercise
Use the data you are working on and:
1. Manipulate it using the functions above
2. Explore more dplyr functions, e.g. how to add data row-wise or
column-wise, etc.
