This document discusses R functions for exploring and manipulating data frames, including dim(), nrow(), ncol(), str(), summary(), names(), head(), tail(), edit(), View() and read.table(). It also covers merge, join, concatenate and compare operations (described with reference to pandas) and exploratory data analysis techniques such as box plots, histograms, scatter plots, run charts, bar charts, density plots and Pareto charts. It provides examples of using these functions to extract information from data frames, such as dimensions, structure, column names and subsets of rows, and of combining multiple data frames.
1. R Functions for Data Frames
dim()
nrow()
ncol()
str()
summary()
names()
head()
tail()
edit()
2. • dim(): shows the dimensions of the data frame by
row and column
• str(): shows the structure of the data frame
• summary(): provides summary statistics on the
columns of the data frame
• colnames(): shows the name of each column in the
data frame
• head(): shows the first 6 rows of the data frame
• tail(): shows the last 6 rows of the data frame
• View(): shows a spreadsheet-like display of the
entire data frame
3. • dim(crime)
• str(crime)
• summary(crime)
• colnames(crime)
• ### The head() and tail() functions default to 6 rows, but
we can adjust the number of rows using the "n = "
argument
• head(crime, n = 10)
• tail(crime, n = 5)
• ### While the output of the first 6 functions is printed to the console,
the View() function opens a table in another window
• View(crime)
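The calls above can be tried end to end with a small stand-in data frame. The slides do not show the contents of `crime`, so the columns and values below are made up purely for illustration; this is a minimal sketch, not the original dataset.

```r
# Hypothetical stand-in for the `crime` data frame used in the slides;
# the column names and values are invented for this example.
crime <- data.frame(
  state   = c("A", "B", "C", "D", "E", "F", "G", "H"),
  year    = rep(2020L, 8),
  murders = c(12, 7, 30, 4, 18, 9, 22, 15),
  stringsAsFactors = FALSE
)

dim(crime)        # number of rows, then number of columns
nrow(crime)       # number of rows only
ncol(crime)       # number of columns only
str(crime)        # type and first few values of each column
summary(crime)    # summary statistics per column
colnames(crime)   # for a data frame, names(crime) gives the same result

first_rows <- head(crime)          # first 6 rows by default
first_three <- head(crime, n = 3)  # first 3 rows
last_two <- tail(crime, n = 2)     # last 2 rows

# View() opens a separate viewer window, so only call it in an
# interactive session.
if (interactive()) View(crime)
```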
5. Merge, join, concatenate and compare
• pandas provides various facilities for easily
combining together Series or DataFrame with
various kinds of set logic for the indexes and
relational algebra functionality in the case of
join / merge-type operations.
• In addition, pandas also provides utilities to
compare two Series or DataFrame and
summarize their differences.
• Concatenating objects
6. • The concat() function (in the main pandas
namespace) does all of the heavy lifting of
performing concatenation operations along an
axis while performing optional set logic (union
or intersection) of the indexes (if any) on the
other axes.
• Note that I say “if any” because there is only a
single possible axis of concatenation for Series.
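The description above is of pandas, which is a Python library. Because the rest of these slides use R, the sketch below shows roughly analogous base R operations: rbind() for concatenating rows, merge() for join-type operations, and all.equal() for comparing two data frames. This is an analogy under that assumption, not the pandas API, and the data frames are made up for illustration.

```r
# Made-up example data frames
df1 <- data.frame(id = 1:3, x = c("a", "b", "c"))
df2 <- data.frame(id = 4:5, x = c("d", "e"))
scores <- data.frame(id = c(2L, 3L, 6L), score = c(90, 75, 60))

# Concatenate along rows (both data frames must have the same columns)
stacked <- rbind(df1, df2)

# Inner join: keep only ids present in both data frames
inner <- merge(stacked, scores, by = "id")

# Left join: keep every row of `stacked`, filling missing scores with NA
left <- merge(stacked, scores, by = "id", all.x = TRUE)

# Full outer join: keep ids from either data frame
outer <- merge(stacked, scores, by = "id", all = TRUE)

# Compare two data frames: all.equal() returns TRUE when they match,
# otherwise a character vector describing the differences
same <- isTRUE(all.equal(df1, df1))
different <- isTRUE(all.equal(df1, df2))
```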
8. Read from a table
• A data table can reside in a text file, where the cells
of the table are separated by blank (whitespace)
characters
• Rk <- read.table("")   ### the path to the text file goes between the quotes
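Since the slide leaves the file name empty, here is a minimal self-contained sketch that writes a small whitespace-separated table to a temporary file and reads it back. The file contents and column names are invented for the example; `header = TRUE` is an assumption that the first line holds the column names.

```r
# Write a small whitespace-separated table to a temporary file
# (file contents are made up for this example)
path <- tempfile(fileext = ".txt")
writeLines(c("name age score",
             "Asha 23 88.5",
             "Ben 31 92.0",
             "Chen 27 79.5"), path)

# header = TRUE treats the first line as column names;
# the default sep = "" splits fields on any run of whitespace
Rk <- read.table(path, header = TRUE)
str(Rk)

# Remove the temporary file
unlink(path)
```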
9. Exploring Data
• Data in R is a set of organised information. Statistical
data is the most common kind in R: a set of
observations in which each variable takes a value.
• These variables are used for measuring, controlling
or manipulating the results of a program
• Common data types in R:
• Integer
• Numeric
• Logical
• Character/String
• Factor
• Complex
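To make the list of types concrete, the short sketch below creates one value of each type and checks it with class(). The variable names are arbitrary choices for illustration.

```r
# One example value per data type (variable names are arbitrary)
i <- 42L                              # integer (note the L suffix)
n <- 3.14                             # numeric (double)
l <- TRUE                             # logical
s <- "hello"                          # character / string
f <- factor(c("low", "high", "low"))  # factor (categorical)
z <- 2 + 3i                           # complex

# Collect the class of each value in one named character vector
types <- sapply(list(i = i, n = n, l = l, s = s, f = f, z = z), class)
print(types)

# A factor stores its distinct categories as levels
levels(f)
```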
10. Exploratory Data Analysis
• EDA involves dataset analysis to summarise the main
characteristics in the form of visual representations.
• EDA using R is an approach used to summarise and visualise
the main characteristics of a data set,
which differs from initial data analysis.
1. Exploring data by understanding its structure and variables
2. Developing an intuition about the dataset
3. Considering how the dataset came into existence
4. Deciding how to investigate by providing a formal
statistical method
5. Gaining better insights
6. Handling any missing values
7. Investigating with more formal statistical methods.
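The first steps of this list can be sketched with R's built-in `airquality` dataset, which contains missing values. The choice of dataset, and dropping incomplete rows as the way to handle missing values, are assumptions made for illustration; other strategies such as imputation may suit a real analysis better.

```r
# Step 1: understand the structure and variables
str(airquality)
summary(airquality)

# Step 6: find missing values, counted per column
na_counts <- colSums(is.na(airquality))
print(na_counts)

# One simple way to handle them: keep only rows with no missing values
clean <- airquality[complete.cases(airquality), ]

# Step 7: a more formal look, here the correlation between
# temperature and ozone on the complete rows
r <- cor(clean$Temp, clean$Ozone)
print(r)
```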
11. • Some of the graphical techniques used in EDA are:
• Box Plot
• Histograms
• Scatter plot
• Run chart
• Bar chart
• Density plots
• Pareto chart
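Each of the techniques listed above can be drawn with base R graphics. The sketch below uses the built-in `mtcars` dataset (an assumption, since the slides name no dataset) and writes all plots to a temporary PDF so that it also runs in a non-interactive session. Base R has no dedicated Pareto function, so that chart is assembled by hand from a sorted bar plot and a cumulative percentage line.

```r
# Send all plots to a temporary PDF file
out <- tempfile(fileext = ".pdf")
pdf(out)

# Box plot: distribution of mpg for each cylinder count
boxplot(mpg ~ cyl, data = mtcars, main = "Box plot",
        xlab = "Cylinders", ylab = "Miles per gallon")

# Histogram: frequency of mpg values
hist(mtcars$mpg, main = "Histogram", xlab = "Miles per gallon")

# Scatter plot: weight against mpg
plot(mtcars$wt, mtcars$mpg, main = "Scatter plot",
     xlab = "Weight", ylab = "Miles per gallon")

# Run chart: values in row order with a median reference line
# (mtcars rows are not a real time series; row order is used for illustration)
plot(mtcars$mpg, type = "b", main = "Run chart",
     xlab = "Observation", ylab = "Miles per gallon")
abline(h = median(mtcars$mpg), lty = 2)

# Bar chart: number of cars per cylinder count
barplot(table(mtcars$cyl), main = "Bar chart", xlab = "Cylinders")

# Density plot: smoothed distribution of mpg
plot(density(mtcars$mpg), main = "Density plot")

# Pareto chart: bars sorted in decreasing order plus cumulative percentage
counts <- sort(table(mtcars$carb), decreasing = TRUE)
cum_pct <- cumsum(counts) / sum(counts) * 100
total <- sum(counts)
mids <- barplot(counts, main = "Pareto chart",
                xlab = "Carburetors", ylim = c(0, total))
lines(mids, cum_pct / 100 * total, type = "b")
axis(4, at = seq(0, total, length.out = 5),
     labels = paste0(seq(0, 100, by = 25), "%"))

# Close the PDF device
dev.off()
```

The resulting file path is stored in `out`; in an interactive session the pdf() and dev.off() calls can be dropped to draw on screen instead.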