Working with Directories
• Before writing a program in R, it is important to know the working
directory, because R reads and writes files relative to it.
• getwd() returns the current working directory; it takes no
arguments.
• setwd(path) changes the working directory, resetting it to the
location given by path.
• list.files() lists the files in a directory (the working directory
by default).
• dir() is equivalent to list.files().
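A minimal sketch of these calls. It assumes a temporary directory (from tempdir()) as a stand-in for a real project folder, and the file name example_notes.txt is made up for illustration:

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# tempdir() is used here only as a safe stand-in for a real project path.
demo_dir <- tempdir()
setwd(demo_dir)

# Create an example file so there is something to list.
writeLines("hello", "example_notes.txt")

# list.files() and dir() return the same character vector of file names.
files_a <- list.files()
files_b <- dir()
print(files_a)

# Restore the original working directory.
setwd(old_wd)
```

Restoring the original directory at the end keeps the rest of a script unaffected by the change.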
Data Exploration in R
• Exploratory data analysis (EDA) is a statistical approach for
analyzing data sets in order to summarize their main
characteristics, generally with the help of visual
aids. EDA can be used to learn about the following
aspects of the data:
• Main characteristics or features of the data.
• The variables and their relationships.
• Finding out the important variables that can
be used in our problem.
• EDA is an iterative approach that includes:
• Generating questions about our data
• Searching for the answers by using visualization,
transformation, and modeling of our data.
• Using the lessons that we learn in order to refine our set of
questions or to generate a new set of questions.
• Exploratory Data Analysis in R
• In R, EDA is commonly performed under two
broad classifications:
• Descriptive statistics, which include the mean, median, mode,
interquartile range, and so on.
• Graphical methods, which include histograms, density
estimation, box plots, and so on.
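Both classifications can be sketched with base R on the built-in mtcars data. The helper stat_mode() is hypothetical and defined here only because base R has no function for the statistical mode:

```r
# mtcars ships with base R (datasets package), so no import is needed.
mpg <- mtcars$mpg

# Descriptive statistics
mpg_mean   <- mean(mpg)
mpg_median <- median(mpg)
mpg_iqr    <- IQR(mpg)

# Hypothetical helper: returns the most frequent value
# (the first one encountered when there is a tie).
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
cyl_mode <- stat_mode(mtcars$cyl)

# Graphical methods
hist(mpg, main = "Histogram of mpg")
plot(density(mpg), main = "Density estimate of mpg")
boxplot(mpg, main = "Box plot of mpg")
```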
• summary()
• Reports the minimum, maximum, median, mean, and quartiles of each numeric column
• str()
• Displays the internal structure of the dataset
• View()
• Displays the dataset in a separate spreadsheet-style viewer
• head()
• Displays the first 6 rows of the data by default
• tail()
• Displays the last 6 rows of the data by default
• ncol()
• Returns the number of columns in the dataset
• nrow()
• Returns the number of rows in the dataset
• edit()
• Opens an editor on the dataset and returns the edited copy; the
original is unchanged unless the result is assigned
• fix()
• Opens the same editor and saves the changes to the dataset itself
• data()
• Lists the available datasets, or loads one by name
• save.image()
• Writes the R objects in the current workspace to a
file (.RData by default)
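A short sketch of these functions on the built-in iris data. Function names in R are case-sensitive, so they are written exactly as R expects them. The temporary file passed to save.image() is an assumption made to avoid writing into the working directory:

```r
# iris is a built-in data frame with 150 rows and 5 columns.
summary(iris)                # min, quartiles, median, mean, max per column
str(iris)                    # internal structure: column types and first values
first_rows <- head(iris)     # first 6 rows by default
last_rows  <- tail(iris)     # last 6 rows by default
n_cols <- ncol(iris)
n_rows <- nrow(iris)

# View(), edit() and fix() open interactive windows, so View() is guarded here.
if (interactive()) {
  View(iris)
}

# save.image() writes the whole workspace; a temporary file is assumed here
# in place of the default ".RData" in the working directory.
image_file <- tempfile(fileext = ".RData")
save.image(file = image_file)
```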
• dim(iris) # dimensions (rows, columns)
• names(iris) # column names
• str(iris) # structure
• attributes(iris) # names, class, row names
• iris[1:5, ] # the first 5 rows
• head(iris) # first six rows
• tail(iris) # last six rows
• idx <- sample(1:nrow(iris), 5) # 5 random row indices from the dataset
• iris[1:10, "Sepal.Length"] # first 10 values of Sepal.Length
• iris[idx, ] # the 5 randomly sampled rows
• summary(iris)
• quantile(iris$Sepal.Length) # minimum, quartiles, and maximum
• quantile(iris$Sepal.Length, c(0.1, 0.3, 0.65)) # chosen percentiles
• var(iris$Sepal.Length) # variance
• plot(iris) # scatterplot matrix of all columns
Commands for Data Exploration
1) Loading Example Data
2) Example 1: Print First Six Rows of Data Frame Using head() Function
3) Example 2: Return Column Names of Data Frame Using names()
Function
4) Example 3: Get Number of Rows & Columns of Data Frame Using
dim() Function
5) Example 4: Explore Structure of Data Frame Columns Using str()
Function
6) Example 5: Calculate Descriptive Statistics Using summary() Function
7) Example 6: Count NA Values by Column Using colSums() & is.na()
Functions
8) Example 7: Draw Pairs Plot of Data Frame Columns Using ggpairs()
Function of GGally Package
9) Example 8: Draw Boxplots of Multiple Columns Using ggplot2 Package
10) Example 9: Draw facet_wrap Histograms of Multiple Columns Using
ggplot2 Package
Loading Example Data
• First, we need to load some example data. In this
tutorial, we'll use the mtcars data set, which
contains data from Motor Trend car
road tests.
• We can import the mtcars data set to the
current R session using the data() function as
shown below:
• data(mtcars) # Import example data frame
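Examples 1 to 5 from the outline use only base R functions, so they can be sketched together on mtcars. The variable names below are chosen for illustration:

```r
data(mtcars)                  # load the example data frame

first_six <- head(mtcars)     # Example 1: first six rows
col_names <- names(mtcars)    # Example 2: column names
dims      <- dim(mtcars)      # Example 3: number of rows and columns
str(mtcars)                   # Example 4: structure of the columns
stats     <- summary(mtcars)  # Example 5: descriptive statistics

print(first_six)
print(dims)
```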
Count NA Values by Column Using
colSums() & is.na() Functions
• The following R programming syntax
demonstrates how to count the number of NA
values in each column of a data frame.
• To do this, we can apply
the colSums and is.na functions:
• colSums(is.na(mtcars)) # Count missing values
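The mtcars data contains no missing values, so the call above returns zero for every column. The sketch below works on a copy with a few NA values inserted purely for illustration, to show what a non-trivial result looks like:

```r
# Work on a copy so the built-in mtcars data stays unchanged.
cars_na <- mtcars

# Introduce missing values purely for illustration.
cars_na$mpg[c(1, 5)] <- NA
cars_na$hp[10] <- NA

# is.na() gives a logical matrix; colSums() counts the TRUE values per column.
na_counts <- colSums(is.na(cars_na))
print(na_counts)
```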
Draw Pairs Plot of Data Frame Columns Using ggpairs()
Function of GGally Package
• Until now, we have performed an analytical exploratory data analysis
based on numbers and certain RStudio console outputs.
• However, when it comes to data exploration, it is also important to
have a visual look at your data.
• The following R code demonstrates how to create a pairs plot using the
ggpairs function.
• For this, we need the functions of the ggplot2 and GGally packages.
• Loading GGally also loads the ggplot2 package, so it is enough to
install and load GGally:
• install.packages("GGally") # Install GGally package
• library("GGally") # Load GGally package
• Next, we can apply the ggpairs function of the GGally package to our
data frame:
• ggpairs(mtcars) # Draw pairs plot
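With all 11 columns of mtcars the pairs plot gets crowded. A possible variation, sketched here under the assumption that four columns are enough, restricts the plot through the columns argument of ggpairs. The check with requireNamespace() only keeps the sketch from failing when GGally is not installed:

```r
# Restrict the plot to a few columns; this selection is only an example.
selected_cols <- c("mpg", "disp", "hp", "wt")

# Run only if GGally is installed, so the sketch does not fail without it.
if (requireNamespace("GGally", quietly = TRUE)) {
  pairs_plot <- GGally::ggpairs(mtcars, columns = selected_cols)
  print(pairs_plot)
}
```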
Draw Boxplots of Multiple Columns
Using ggplot2 Package
• Boxplots are another popular way to visualize the columns of data
sets.
• To draw such a graph, we first have to reshape our data using
the tidyr package. In order to use the functions of the tidyr package,
we first need to install and load it:
• install.packages("tidyr") # Install tidyr
• library("tidyr") # Load tidyr
• Next, we can apply the pivot_longer function to reshape some of the
columns of our data from wide to long format:
• mtcars_long <- pivot_longer(mtcars, c("mpg", "disp", "hp", "qsec")) # Reshape data frame
• Finally, we can apply the ggplot and geom_boxplot functions to our
data to visualize each of the selected columns in a side-by-side boxplot
graphic:
• ggplot(mtcars_long, aes(x = value, fill = name)) + geom_boxplot() # Draw boxplots
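To see what the wide-to-long reshaping does without installing anything, the sketch below uses base R's stack() as a stand-in for pivot_longer(). It is not the tidyr function: the resulting columns are named values and ind rather than value and name, and the row order differs.

```r
# Select the same four columns that are reshaped in the tidyr example.
wide <- mtcars[c("mpg", "disp", "hp", "qsec")]

# stack() puts all values into one column ("values") and records the
# originating column name in a second column ("ind").
long <- stack(wide)

head(long)
nrow(long)   # 4 columns x 32 rows
```

Each original column contributes 32 rows, which is why the long data frame has one row per value and per variable.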
Draw facet_wrap Histograms of
Multiple Columns Using ggplot2
Package
• Typically, we would also have a look at our
numerical columns in a histogram plot.
• In the following R syntax, I’m creating a histogram
for each of our columns. Furthermore, I’m using
the facet_wrap function to separate each column
in its own plotting panel:
• ggplot(mtcars_long, aes(x = value)) + geom_histogram() + facet_wrap(name ~ ., scales = "free") # Draw histograms
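A comparable panel layout can be sketched with base graphics only. This is an alternative to the ggplot2 code, not the same technique: par(mfrow = ...) splits the plotting area into a grid and hist() fills one panel per column.

```r
cols <- c("mpg", "disp", "hp", "qsec")

# Arrange a 2 x 2 grid of panels and restore the old settings afterwards.
old_par <- par(mfrow = c(2, 2))
for (col in cols) {
  hist(mtcars[[col]], main = col, xlab = col)
}
par(old_par)
```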
Importing Data in R Script
• Importing Data in R
• First, let's consider a dataset which we can use
for the demonstration. We will use two versions
of a single dataset, one in .csv format and the
other in .txt format.
• Reading a Comma-Separated Value(CSV) File
• Method 1: Using the read.csv() Function to Read CSV
Files into R
• The call below uses two arguments:
• file.choose(): opens a dialog for choosing a CSV file
interactively; its result is passed as the file argument.
• header: indicates whether the first row of the
dataset contains the variable names. Use TRUE if it
does, otherwise FALSE.
• # import and store the dataset in data1
• data1 <- read.csv(file.choose(), header = TRUE)
• # display the data
• data1
• Method 2: Using the read.table() Function
• This function lets us specify how the fields of the dataset are
separated; for a comma-separated file we pass sep = "," as an
argument. The separator must be a single character, so "," and not ", ".
• Example:
• # import and store the dataset in data2
• data2 <- read.table(file.choose(), header = TRUE, sep = ",")
• # display data
• data2
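Because file.choose() needs an interactive session, the sketch below passes a file path instead. The temporary CSV file and its three columns are assumptions made only so the example is self-contained:

```r
# Write a small example file; in practice the path would point to real data.
csv_path <- tempfile(fileext = ".csv")
write.csv(head(mtcars[, c("mpg", "cyl", "hp")], 3), csv_path, row.names = FALSE)

# read.csv() assumes comma-separated fields and header = TRUE by default.
data1 <- read.csv(csv_path, header = TRUE)

# read.table() needs the separator spelled out.
data2 <- read.table(csv_path, header = TRUE, sep = ",")

print(data1)
```

For a plain comma-separated file like this one, both calls produce the same data frame.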
• Understanding datasets
• A dataset is usually a rectangular array of data with
rows representing observations and columns
representing variables. The following is an example of a
hypothetical patient dataset.
• A patient dataset
PatientID  AdmDate     Age  Diabetes  Status
1          10/15/2009  25   type1     poor
2          15/12/2007  32   type2     improved
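In R, such a rectangular dataset is typically held in a data frame. The sketch below enters the two rows shown above by hand; the variable name patients is chosen for illustration, and the admission dates are kept as text because the two entries are not written in the same format:

```r
patients <- data.frame(
  PatientID = c(1, 2),
  # Dates are kept as text because the two entries use different formats.
  AdmDate   = c("10/15/2009", "15/12/2007"),
  Age       = c(25, 32),
  Diabetes  = c("type1", "type2"),
  Status    = c("poor", "improved"),
  stringsAsFactors = FALSE
)

# Each row is one observation (a patient); each column is one variable.
str(patients)
```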

Data Exploration in R.pptx

  • 1. Working with directory • Before writing a program in R important to find directory to load all list of file in the system • This can be done by using getwd() without pass any arguments • If you want to change directory then setwd(path). • It help to you to reset the current working directory to another location • List.files() helps to you to give information about your files • Dir() is equalent to list.files()
  • 2. Data Exploration in R • Data Exploration is a statistical approach or technique for analyzing data sets in order to summarize their important and main characteristics generally by using some visual aids. The EDA approach can be used to gather knowledge about the following aspects of data: • Main characteristics or features of the data. • The variables and their relationships. • Finding out the important variables that can be used in our problem.
  • 3. • EDA is an iterative approach that includes: • Generating questions about our data • Searching for the answers by using visualization, transformation, and modeling of our data. • Using the lessons that we learn in order to refine our set of questions or to generate a new set of questions. • Exploratory Data Analysis in R • In R Language, we are going to perform EDA under two broad classifications: • Descriptive Statistics, which includes mean, median, mode, inter-quartile range, and so on. • Graphical Methods, which includes histogram, density estimation, box plots, and so on.
  • 4.
      • summary()
      • Reports statistics such as the minimum, maximum, median, and mean of each column.
      • str()
      • Displays the internal structure of the data set.
      • View()
      • Displays the data set in a separate spreadsheet-style window.
      • head()
      • Displays the first 6 rows of the data (by default).
      • tail()
      • Displays the last 6 rows of the data (by default).
      • ncol()
      • Returns the number of columns in the data set.
  • 5.
      • nrow()
      • Returns the number of rows in the data set.
      • edit()
      • Used for interactive editing or manipulation of the data set.
      • fix()
      • Edits the data set and saves the changes to the data set itself.
      • data()
      • Lists the available data sets.
      • save.image()
      • Writes an external representation of the R objects in the workspace to the specified file.
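A short sketch of the non-interactive functions from this list, assuming the built-in iris data set. The file path comes from tempfile(), chosen here only so that nothing is written to the working directory.

```r
# Load a built-in data set by name.
data(iris)

# Number of rows and columns.
n_rows <- nrow(iris)
n_cols <- ncol(iris)
print(c(rows = n_rows, columns = n_cols))

# head() and tail() take an optional n for a different number of rows.
first_three <- head(iris, n = 3)
last_three  <- tail(iris, n = 3)

# save.image() writes all objects in the workspace to one file.
workspace_file <- tempfile(fileext = ".RData")
save.image(file = workspace_file)

# View(), edit(), and fix() open interactive windows, so they are not
# called here; in an interactive session one would run e.g. View(iris).
```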
  • 6.
      • dim(iris) # Dimensions
      • names(iris) # The column names
      • str(iris) # Structure is revealed
      • attributes(iris) # The names, class, etc.
      • iris[1:5] # The first 5 columns
      • head(iris) # First six rows
      • tail(iris) # Last six rows
      • idx <- sample(1:nrow(iris), 5) # 5 random row indices from the data set
      • iris[1:10, "Sepal.Length"] # First 10 values of Sepal.Length
      • iris[idx, ] # The 5 randomly sampled rows
      • summary(iris)
      • quantile(iris$Sepal.Length) # Quartiles (0%, 25%, 50%, 75%, 100%)
      • quantile(iris$Sepal.Length, c(0.1, 0.3, 0.65))
      • var(iris$Sepal.Length)
      • plot(iris)
  • 7. Commands for Data Exploration
      1) Loading Example Data
      2) Example 1: Print First Six Rows of Data Frame Using head() Function
      3) Example 2: Return Column Names of Data Frame Using names() Function
      4) Example 3: Get Number of Rows & Columns of Data Frame Using dim() Function
      5) Example 4: Explore Structure of Data Frame Columns Using str() Function
      6) Example 5: Calculate Descriptive Statistics Using summary() Function
      7) Example 6: Count NA Values by Column Using colSums() & is.na() Functions
      8) Example 7: Draw Pairs Plot of Data Frame Columns Using ggpairs() Function of GGally Package
      9) Example 8: Draw Boxplots of Multiple Columns Using ggplot2 Package
      10) Example 9: Draw facet_wrap Histograms of Multiple Columns Using ggplot2 Package
  • 8. Loading Example Data
      • First, we need to load some example data. In this tutorial, we’ll use the mtcars data set, which contains information about Motor Trend car road tests.
      • We can import the mtcars data set into the current R session using the data() function as shown below:
      • data(mtcars) # Import example data frame
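The outline names head(), names(), dim(), str(), and summary() for Examples 1 to 5, but this transcript does not include their code. A minimal sketch of what those calls might look like on mtcars (the mapping of calls to example numbers follows the outline; the exact code of the original examples is not known):

```r
data(mtcars)       # Import example data frame

head(mtcars)       # Example 1: first six rows
names(mtcars)      # Example 2: column names
dim(mtcars)        # Example 3: number of rows and columns
str(mtcars)        # Example 4: structure of the columns
summary(mtcars)    # Example 5: descriptive statistics per column
```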
  • 9. Count NA Values by Column Using colSums() & is.na() Functions
      • The following R programming syntax demonstrates how to count the number of NA values in each column of a data frame.
      • To do this, we can apply the colSums and is.na functions:
      • colSums(is.na(mtcars)) # Count missing values
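The mtcars data set has no missing values, so every count from the call above is zero. To see non-zero counts, the sketch below works on a copy with a few NA values inserted purely for illustration; the name cars_with_gaps and the chosen positions are assumptions of this example.

```r
# Work on a copy so the built-in mtcars data stays untouched.
cars_with_gaps <- mtcars

# Introduce a few missing values purely for illustration.
cars_with_gaps$mpg[c(1, 5)] <- NA
cars_with_gaps$hp[10] <- NA

# is.na() gives a TRUE/FALSE matrix; colSums() counts the TRUEs per column.
na_counts <- colSums(is.na(cars_with_gaps))
print(na_counts)
```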
  • 10. Draw Pairs Plot of Data Frame Columns Using ggpairs() Function of GGally Package
      • Until now, we have performed an analytical exploratory data analysis based on numbers and certain RStudio console outputs.
      • However, when it comes to data exploration, it is also important to have a visual look at your data.
      • The following R code demonstrates how to create a pairs plot.
      • For this, we need the functions of the ggplot2 and GGally packages.
      • By installing and loading GGally, the ggplot2 package is also imported. So it’s enough to install and load GGally:
      • install.packages("GGally") # Install GGally package
      • library("GGally") # Load GGally package
      • Next, we can apply the ggpairs function of the GGally package to our data frame:
      • ggpairs(mtcars) # Draw pairs plot
  • 11. Draw Boxplots of Multiple Columns Using ggplot2 Package
      • Boxplots are another popular way to visualize the columns of data sets.
      • To draw such a graph, we first have to manipulate our data using the tidyr package. In order to use the functions of the tidyr package, we first need to install and load tidyr:
      • install.packages("tidyr") # Install tidyr
      • library("tidyr") # Load tidyr
      • Next, we can apply the pivot_longer function to reshape some of the columns of our data from wide to long format:
      • mtcars_long <- pivot_longer(mtcars, c("mpg", "disp", "hp", "qsec")) # Reshape data frame
      • Finally, we can apply the ggplot and geom_boxplot functions to our data to visualize each of the selected columns in a side-by-side boxplot graphic:
      • ggplot(mtcars_long, aes(x = value, fill = name)) + geom_boxplot() # Draw boxplots
  • 12. Draw facet_wrap Histograms of Multiple Columns Using ggplot2 Package
      • Typically, we would also have a look at our numerical columns in a histogram plot.
      • In the following R syntax, I’m creating a histogram for each of the selected columns. Furthermore, I’m using the facet_wrap function to put each column in its own plotting panel:
      • ggplot(mtcars_long, aes(x = value)) + geom_histogram() + facet_wrap(name ~ ., scales = "free") # Draw histograms
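If tidyr and ggplot2 are not installed, a similar wide-to-long reshape and similar plots can be sketched in base R. This swaps in stack(), boxplot(), and hist() for pivot_longer(), geom_boxplot(), and facet_wrap(); it is a stand-in for the slides' approach rather than a reproduction of it, and mtcars_long_base is a name made up for this example.

```r
# Columns selected in the slides.
cols <- c("mpg", "disp", "hp", "qsec")

# stack() reshapes wide to long: 'values' holds the numbers and
# 'ind' holds the name of the column each number came from.
mtcars_long_base <- stack(mtcars[cols])
print(head(mtcars_long_base))

# One box per original column.
boxplot(values ~ ind, data = mtcars_long_base, main = "Selected mtcars columns")

# One histogram per column, each in its own panel with its own axis range.
old_par <- par(mfrow = c(2, 2))
for (col in cols) {
  hist(mtcars[[col]], main = col, xlab = col)
}
par(old_par)
```

The long table has one row per original cell, so four columns of 32 rows become 128 rows.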
  • 13. Importing Data in R Script
      • Importing Data in R
      • First, let’s consider a data set that we can use for the demonstration. We will use two versions of a single data set, one in .csv form and another in .txt form.
      • Reading a Comma-Separated Value (CSV) File
      • Method 1: Using the read.csv() function to read CSV files into R
      • As used here, the function takes two arguments:
  • 14.
  • 15.
      • file.choose(): Opens a dialog to choose a CSV file from your computer.
      • header: Indicates whether the first row of the data set contains the variable names. Use TRUE (or T) if it does, otherwise FALSE (or F).
      • # import and store the dataset in data1
      • data1 <- read.csv(file.choose(), header = TRUE)
      • # display the data
      • data1
  • 16.
      • Using read.table() Function
      • This function lets us specify how the fields of the data set are separated; in this case we pass sep = "," as an argument. The separator must be a single character, so a comma followed by a space is not accepted.
      • Example:
      • # import and store the dataset in data2
      • data2 <- read.table(file.choose(), header = TRUE, sep = ",")
      • # display data
      • data2
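Because file.choose() needs an interactive session, the sketch below reads from a small temporary file instead. The file contents (id and score columns) are made up for illustration; the point is that read.csv() and read.table() with sep = "," return the same data frame for a simple comma-separated file.

```r
# Write a tiny CSV file to a temporary location (made-up values).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,3.5", "2,4.0", "3,2.5"), csv_path)

# read.csv() assumes a comma separator.
data1 <- read.csv(csv_path, header = TRUE)

# read.table() needs the separator spelled out.
data2 <- read.table(csv_path, header = TRUE, sep = ",")

# Display the imported data.
print(data1)
```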
  • 17.
      • Understanding datasets
      • A dataset is usually a rectangular array of data with rows representing observations and columns representing variables. The following is an example of a hypothetical patient dataset.
      • A patient dataset
      • PatientID   AdmDate      Age   Diabetes   Status
      • 1           10/15/2009   25    type1      poor
      • 2           15/12/2007   32    type2      improved
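A minimal sketch of how the two rows shown above could be held in an R data frame. The lowercase vector names are assumptions of this example, and the admission dates are stored as plain text exactly as they appear in the slide.

```r
# Each vector becomes one column (variable); each position is one patient.
patient_id <- c(1, 2)
adm_date   <- c("10/15/2009", "15/12/2007")  # kept as text, as in the slide
age        <- c(25, 32)
diabetes   <- c("type1", "type2")
status     <- c("poor", "improved")

# Combine the vectors into a rectangular data set.
patient_data <- data.frame(
  PatientID = patient_id,
  AdmDate   = adm_date,
  Age       = age,
  Diabetes  = diabetes,
  Status    = status
)

print(patient_data)
str(patient_data)
```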