INTRODUCTION TO STATISTICS
(DATA VISUALIZATION)
Dr. P. Rambabu, M. Tech., Ph.D., F.I.E.
14-Feb-2022
Topics
1. Introduction to Statistics
2. Sampling Techniques
3. Types of Statistics
a) Descriptive Statistics
b) Inferential Statistics
4. Variables and Types of Data
a) Qualitative
b) Quantitative
i. Discrete
ii. Continuous
5. Organizing and Graphing Data
a) Qualitative Data
b) Quantitative Data
Introduction to Statistics
Statistics is a branch of science dealing with collecting, organizing, summarizing, analyzing and
making decisions from data.
Statistics is divided into two main areas, which are
1. Descriptive Statistics:
It deals with methods for collecting, organizing, and describing data by using tables, graphs, and
summary measures.
For example, 66% of eligible voters voted in the 2020 presidential election
2. Inferential Statistics:
It deals with methods that use sample results to help estimate or make decisions about the
population.
The set of all elements (observations) of interest in a study is called a population, and the subset
of elements selected from the population is called a sample.
For example, a poll suggests that 75% of voters will select Candidate A. People haven’t voted yet, so
we don’t know what will happen, but we could reasonably infer that Candidate A will win the
election.
Data is factual information. We collect data from a population, the collection of all
individuals or items a researcher is interested in. A sample is the subset of the population
from which we actually obtain data.
Primary Data vs Secondary Data
Primary Data:
This data is compiled by the researcher using techniques such as surveys,
experiments, in-depth interviews, observations, and focus groups.
Secondary Data:
This data has been compiled or published elsewhere, e.g., census data.
Advantages:
It can be gathered quickly and inexpensively. It enables researchers to build on past
research.
Disadvantages:
Data may be outdated. May not be accurate.
Sampling Techniques
In some cases it is hard to study a large population in order to draw
conclusions about a phenomenon. For example, suppose we are interested in studying
obesity in India and the researcher has a limited period of time, say three
months. It is practically impossible to survey all citizens within that period,
so sampling is the best way to perform the study and get representative results
in a shorter time, while also saving effort and money.
Types of Sampling Techniques:
1. Simple Random Sampling – It is applicable when the population is relatively small.
2. Systematic Method
3. Stratified Method
4. Clustered Sampling Method
Systematic Sampling Method
Stratified Sampling Method
In statistics, a subset of a population whose members share some characteristic is
called a ‘stratum’ (plural: strata). When the population can be divided into such
strata, the stratified sampling method is used, and elements are selected randomly
from each stratum.
Clustered Sampling Method
Usually, each cluster consists of heterogeneous elements grouped on a
geographical basis. The advantage of this method is that it is cheaper than other
methods. Cluster sampling is used when the population is large or when it
involves elements residing in a large geographic area.
Suppose a researcher is interested in surveying students in Engineering
Colleges, and assume that there are 25 colleges in Telangana. The
researcher can select 5 colleges, drawn from the different geographical
regions, and survey all students in these colleges. This type of
clustered sampling is called ‘single-stage cluster sampling’.
Now assume that the needed sample size is 3000 students. In the first stage the
researcher chooses 5 colleges; in the second stage the 3000 students are
selected from those five colleges using the simple random
sampling method or the systematic method. In this case, we have ‘two-stage
cluster sampling’.
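The four sampling techniques listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population (25 colleges with 40 students each), the function names, and the fixed seeds are all hypothetical, and the colleges are used both as strata and as clusters purely to keep the example small.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Pick n elements so that every element is equally likely to be chosen."""
    return random.Random(seed).sample(population, n)

def systematic_sample(population, n, seed=0):
    """Pick every k-th element after a random start, where k = N // n."""
    k = len(population) // n
    start = random.Random(seed).randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw a simple random sample from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=0):
    """Single-stage cluster sampling: pick whole clusters and keep all their members."""
    chosen = random.Random(seed).sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical population: 25 colleges with 40 students each
colleges = {
    f"college_{c:02d}": [f"c{c:02d}_s{s:02d}" for s in range(40)]
    for c in range(1, 26)
}
students = [s for members in colleges.values() for s in members]

srs = simple_random_sample(students, 50)
sys_s = systematic_sample(students, 50)
strat = stratified_sample(colleges, 2)
clus = cluster_sample(colleges, 5)
```

For two-stage cluster sampling, one would apply `simple_random_sample` or `systematic_sample` to the students of the clusters returned by `cluster_sample`.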
Variables and Types of Data
The basic terms used frequently in statistical problems are:
• an element,
• a variable and its types,
• a measurement, and
• a dataset.
Note:
Any study is based on a problem or phenomenon such as heavy traffic,
accidents, rating scales, grades, or others. The researcher should define the
variables of interest before collecting data.
An element (or member of a sample or population) is a specific subject or object
about which the information is collected.
A variable is a characteristic under study that takes different values for different
elements.
For example, if we collect information about income of households, then income
is a variable. These households are expected to have different incomes; also,
some of them may have the same income.
The value of a variable for an element is called an observation or
measurement.
Types of Variables
In statistics, variables are classified into two types according to the values they take:
1. Quantitative/ Numerical
A quantitative variable gives us numbers representing counts or measurements.
Ex.: price of a shirt
a) Discrete – values that can be counted
b) Continuous – all values between any two specific values; they often
include fractions and decimals
2. Qualitative/ Categorical
A qualitative (or categorical) variable gives us names or labels, rather than
numbers, to represent the observations.
a) Nominal
b) Ordinal
Discrete Data
Continuous Data
Levels of Variables based on Measurement Scale
Classification of Variables:
1. Nominal - The nominal level of measurement classifies data into mutually
exclusive (disjoint) categories in which no order or ranking can be imposed on the
data.
2. Ordinal - The ordinal level of measurement classifies data into categories that can
be ordered; however, precise differences between the ranks do not exist.
Nominal Level of Measurement
Ordinal Level of Measurement
Organizing and Graphing Qualitative Data
Data recorded in the sequence in which they are collected, before they are
processed or ranked, are called raw data.
Suppose we collect information on the scores of 20 students from a class. The
data values, in the order they are collected, are recorded in the table below
(Quantitative Data).
Continuing the previous example, suppose we ask the same students about their
grades (A, B, C or D) after they completed the class, and record the values in Table 2
(Qualitative Data).
Methods of organizing Qualitative Data:
1. Frequency Table
2. Relative Frequency and Percentage Distributions
3. Graphical Presentation of Qualitative Data
Frequency Table
Assume that a sample of 50 students from a class was selected, and those
students were asked how satisfied they are with the
program. The responses of those students are recorded below, where (v) means
very high satisfaction, (s) means some satisfaction and (n) means no
satisfaction.
Relative Frequency and Percentage Distributions
The relative frequency shows what proportion of the total frequency
belongs to the corresponding category; the relative frequency of a
category is calculated by dividing the frequency of that category by the sum
of all frequencies. Multiplying a relative frequency by 100 gives the percentage.
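A frequency table with relative frequencies and percentages can be sketched as follows. The 20 responses below are made up for illustration (the 50 responses from the slide are not reproduced in this text); the codes v, s and n follow the satisfaction example above.

```python
from collections import Counter

# Hypothetical responses: v = very high, s = some, n = no satisfaction
responses = list("vvsnsvsnvs" "svvnssvvsn")

freq = Counter(responses)
total = sum(freq.values())

# One row per category: frequency, relative frequency, percentage
table = {
    cat: {"frequency": f, "relative": f / total, "percent": 100 * f / total}
    for cat, f in freq.items()
}

for cat in "vsn":
    row = table[cat]
    print(f"{cat}  {row['frequency']:>3}  {row['relative']:.2f}  {row['percent']:.0f}%")
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on any hand-built table.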
Graphical Presentation of Qualitative Data
Many types of graphs are used to display qualitative data; in this part we study the two
most commonly used ones, the bar chart and the pie chart.
A graph made of bars whose heights represent the frequencies of the respective categories is called a bar
graph.
A circle divided into portions that represent the relative frequencies or percentages of the different
categories of a population or a sample is called a pie chart.
Organizing and Graphing Quantitative Data
Frequency Distributions
The following data table represents the scores of 50 students in a statistics course.
Use this data to
a. Construct the frequency distribution table using 12 classes
b. Construct the relative frequency distribution table
c. Find the percentages
Solution:
Step 1: The largest score is 90 and the lowest is 37, so the range = 90 – 37 = 53
Step 2: Divide the range by the number of classes, which is 12: 53 / 12 ≈ 4.4
Step 3: Round up the number in Step 2 to the next whole number, which is 5. Thus, the class width
is 5
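The three steps above amount to a one-line calculation. The sketch below uses the lowest score, highest score and number of classes from the example; the function names and the choice to start the first class at the lowest score (with integer scores) are assumptions made for illustration.

```python
import math

def class_width(lowest, highest, n_classes):
    """Range divided by the number of classes, rounded up to a whole number."""
    return math.ceil((highest - lowest) / n_classes)

def class_limits(lowest, width, n_classes):
    """Lower and upper limits of each class for integer data, starting at the lowest value."""
    return [(lowest + i * width, lowest + (i + 1) * width - 1) for i in range(n_classes)]

width = class_width(37, 90, 12)       # range 53 over 12 classes
limits = class_limits(37, width, 12)  # first class starts at the lowest score
```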
Graphing Grouped Data:
Grouped data can be displayed in a histogram, a polygon or an ogive. In this part we will
learn how to construct such graphs.
1. A histogram of grouped data in a frequency distribution table with equal class
widths is a graph in which class boundaries are marked on the horizontal axis and
the frequencies, relative frequencies, or percentages are marked on a vertical axis.
The Histogram gives us insight into the shape of the data distribution, literally how the values
are distributed across the bins. The part of the distribution that “trails off” to one or both sides is
called a tail of the distribution.
We can also use a histogram to identify modes. For numeric data, especially continuous
variables, we think of modes as prominent peaks.
Finally, we can also “smooth out” these histograms and use a smooth curve to examine the
shape of the distribution. Below are the smooth curve versions of the distributions shown in the
four histograms used to demonstrate skew and symmetry.
3. Ascending Cumulative Frequency Curve (Ogive):
An ogive is a curve drawn for the ascending cumulative frequency of grouped data in
a distribution table by first plotting dots above the upper boundaries of the
classes at heights equal to the ascending cumulative frequencies of the respective
classes, then joining these points with a smooth curve.
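The points of an ogive are simply the upper class boundaries paired with running totals of the frequencies. A minimal sketch, using made-up class boundaries and frequencies:

```python
from itertools import accumulate

# Hypothetical grouped data: (upper class boundary, frequency)
classes = [(41.5, 3), (46.5, 5), (51.5, 8), (56.5, 4)]

upper_boundaries = [boundary for boundary, _ in classes]
cumulative = list(accumulate(freq for _, freq in classes))  # ascending cumulative frequency

# Each point sits above an upper boundary at the cumulative frequency of its class
ogive_points = list(zip(upper_boundaries, cumulative))
```

The last cumulative frequency always equals the total number of observations, so the curve ends at that height.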
Numerical Descriptive Measures
1. Measures of Central Tendency
a) Mean
b) Median
c) Mode
2. Measures of Variation
a) Maximum and Minimum
b) Range
c) Mean Deviation
d) Standard Deviation
e) Coefficient of Variation
3. Measures of Position
a) Standard Scores
b) Quartile and Interquartile Range
Measures of Central Tendency
Mean:
The most commonly used measure of central tendency is the mean (or
average).
Example:
Mean for Grouped Data:
Weighted Mean:
Median:
The median is the value of the middle term in a data set that has been ranked in
increasing or decreasing order.
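The mean and the median can be computed with Python's standard `statistics` module. In this sketch the scores are borrowed from one of the mode examples in this deck, and the values and weights used for the weighted mean are hypothetical.

```python
from statistics import mean, median

scores = [77, 82, 74, 81, 79, 84, 74, 78]

avg = mean(scores)    # sum of the values divided by their count
mid = median(scores)  # middle of the ranked values (mean of the two middle terms when n is even)

# Weighted mean: each value counts in proportion to its weight (hypothetical data)
values, weights = [70, 80, 90], [1, 2, 1]
weighted = sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

The same weighted formula gives the mean for grouped data when the values are class midpoints and the weights are class frequencies.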
Median for Grouped Data:
Let a frequency distribution table be given as follows:
Mode:
The mode is the value that occurs most often in a data set.
1. Find the mode for the given data set:
76, 150, 95, 750, 124, 985, 87, 490
Solution:
Because each value in this data set occurs only once, this data set contains no mode.
2. Find the mode for the given data set:
77, 82, 74, 81, 79, 84, 74, 78
Solution:
Here the value 74 occurs twice, that is with the highest frequency, so 74 is the mode.
3. Find the mode for the given data set
7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45
Solution:
Each of the two values 13 and 17 occurs twice, and each of the remaining values
occurs only once. Therefore, this data set has two modes: 13 and 17.
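The three mode examples above can be reproduced with a small helper. The function name `modes` is an assumption; it follows the convention used in these slides, where a data set in which every value occurs only once has no mode.

```python
from collections import Counter

def modes(data):
    """Return the most frequent values, or [] when every value occurs only once."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)

no_mode = modes([76, 150, 95, 750, 124, 985, 87, 490])
one_mode = modes([77, 82, 74, 81, 79, 84, 74, 78])
two_modes = modes([7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45])
```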
Measures of Variation
Range:
The range of a data set depends on two values: the smallest and the largest
values in the data set.
Range = Largest value – Smallest value
1. Find the range for the data set:
40, 10, 20, 30, 35, 40, 50, 60
Solution: The largest value is 60, and the smallest value is 10.
Therefore Range = Largest value – Smallest value = 60 – 10 = 50
Variance and Standard Deviation:
The most commonly used measure of variation is the standard deviation, denoted by σ for the
population and s for the sample. Its numerical value tells us how closely
the values of the data set are clustered around
the mean.
Another measure of variation is the variance, denoted by σ² for the population
and s² for the sample; it is the square of the standard deviation.
Coefficient of Variation:
The coefficient of variation (CV) is a well-known measure used to compare the
variability of two data sets that have different units of measurement. The
standard deviation has the disadvantage of being a measure of absolute
variability, not of relative variability; the CV expresses the standard deviation
as a percentage of the mean.
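The measures of variation above can be computed with the standard `statistics` module. The sketch reuses the data set from the range example; treating it as a sample (for s and s²) or as a whole population (for σ and σ²) is a choice made here only to show both versions.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

data = [40, 10, 20, 30, 35, 40, 50, 60]

data_range = max(data) - min(data)

sample_var = variance(data)  # s squared, divides by n - 1
sample_sd = stdev(data)      # s
pop_var = pvariance(data)    # sigma squared, divides by n
pop_sd = pstdev(data)        # sigma

# Coefficient of variation: standard deviation as a percentage of the mean
cv_percent = sample_sd / mean(data) * 100
```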
Measures of Position:
For a population or a sample data set, measures of position are used to
determine the position of a single value in relation to the other values.
Types of Measures:
• Standard Scores
• Quartiles
• Percentiles
• Percentile Rank
1. Standard Scores:
Quartiles:
Any data set can be divided into four equal parts by using summary measures called
quartiles. There are three quartiles that divide the data set, denoted by
Qi for i = 1, 2, 3.
Quartiles are three summary measures that divide a ranked data set into four equal
parts. The second quartile is the same as the median, the first quartile is the value of the middle
term among the observations that are less than the median, and the third quartile is
the value of the middle term among the observations that are greater than the median.
Interquartile Range:
A measure of variation that depends on the first and the third quartiles is called the
interquartile range (IQR), and it is defined as IQR = Q3 – Q1.
Box Plot:
It summarizes the data with 5 statistics plus extreme observations:
Drawing a box plot:
1. Draw the vertical axis to include all possible values in the data.
2. Draw a horizontal line at the median, at Q1, and at Q3. Use these to form a box.
3. Compute the whisker limits. The upper limit is Q3 + 1.5 × IQR and the lower limit is Q1 – 1.5 × IQR.
4. Draw the actual whiskers out to the closest data points that lie within these limits.
5. Plot any points outside the whisker limits as individual points. These are potential outliers.
(Potential) outliers can help us:
- examine skew (outliers in the negative direction suggest left skew; outliers in the positive direction suggest right skew).
- identify issues with data collection or entry, especially if the value of an outlier doesn’t make sense.
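The quartiles, the interquartile range and the box-plot whisker limits can be computed as sketched below. The function name is an assumption, and the quartiles follow the "median of each half" rule described in these slides; other conventions (for example the interpolation used by `statistics.quantiles`) can give slightly different values. The data set is the one from the third mode example.

```python
from statistics import median

def five_number_summary(data):
    """Quartiles via the median-of-halves rule, plus box-plot whiskers and outliers."""
    s = sorted(data)
    q2 = median(s)
    q1 = median([x for x in s if x < q2])  # middle of the observations below the median
    q3 = median([x for x in s if x > q2])  # middle of the observations above the median
    iqr = q3 - q1
    low_limit, high_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in s if low_limit <= x <= high_limit]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),  # closest data points within the limits
        "outliers": [x for x in s if x < low_limit or x > high_limit],
    }

summary = five_number_summary([7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45])
```

Here the value 45 lies above Q3 + 1.5 × IQR, so it would be drawn as an individual point, hinting at right skew.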
Percentiles and Percentile Rank:
Random Variables:
Def: A random variable is the “result of a chance event”, which you may or may not be able to measure.
(or)
A random variable is a variable whose value is unknown, or a function that assigns a numerical value to each
outcome of a random experiment.
Random variables are used in econometric or regression analysis to determine statistical relationships among
variables. A typical example of a random variable is the “outcome of a two-coin toss”.
Outcome of a two-coin toss:
X is the random variable
P(x) is the probability distribution of X
H = Head
T = Tail
Outcome H H: X = 2
Outcome H T: X = 1
Outcome T H: X = 1
Outcome T T: X = 0
where X is the number of heads we get from tossing two coins,
so X can be 0, 1, or 2.
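The probability distribution of X for the two-coin toss can be enumerated directly. The sketch assumes two fair coins, so that the four outcomes are equally likely; that assumption is not stated on the slide.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))               # HH, HT, TH, TT
heads = Counter(o.count("H") for o in outcomes)        # X = number of heads per outcome

# P(X = x) = (number of outcomes with x heads) / (total number of outcomes)
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(heads.items())}
```

Under this assumption P(X = 0) = 1/4, P(X = 1) = 1/2 and P(X = 2) = 1/4, and the probabilities sum to 1.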
Examples of random variables
• A person’s blood type
• Number of people visiting a park
• Number of times a user visits LinkedIn in a day
Types of Random Variables:
1. Discrete Random Variable
2. Continuous Random Variable
1. Discrete Random Variable:
Discrete random variables take on a countable number of distinct values. Consider an experiment where a
coin is tossed three times.
If X represents the number of times that the coin comes up heads, then X is a discrete random variable that can
only have the values 0, 1, 2 or 3. No other value is possible for X.
Discrete random variables are usually counts.
Example:
• the number of children in a family.
• the Friday night attendance at a cinema
• the number of patients in a doctor's surgery.
• the number of defective light bulbs in a box of ten.
2. Continuous Random Variables:
Continuous random variables can represent any value within a specified range (for example, a height between 4 ft
and 6 ft) and can take on an infinite number of possible values.
An example of a continuous random variable would be an experiment that involves measuring the average height
of a random group of 25 people.
If X represents the random variable for the average height of a random group of 25 people,
the resulting outcome is a continuous figure, since height may be 5’ or 5’-4” or 5’-8”, etc.
Clearly, there is an infinite number of possible values for height.
Sampling Distributions
Sampling Distribution
A sampling distribution is the probability distribution of a statistic, obtained by choosing a particular
population and then drawing repeated random samples from it.
Many researchers, academicians, market strategists, etc. work with samples instead of the
entire population. This keeps the data set simple and manageable.
Example 1:
Suppose a marketer wants to analyse the number of young people aged 13-18 who ride a bicycle in two
regions.
For this purpose, he will not survey the entire population aged 13-18 in the two regions; that is
practically impossible, and even if it were done, it would be too time-consuming and the data set would be unmanageable.
Instead, the marketer will take a sample of 200 from each region and work out the distribution.
The average bicycle usage in a sample is termed the sample mean. Each sample chosen has its own
mean, and the distribution of these sample means is the sampling distribution. The
standard deviation of this distribution is termed the standard error.
Example 2:
Suppose a researcher is conducting a study on the weights of the inhabitants of a particular town and has
a sample of five observations: 70 kg, 75 kg, 85 kg, 80 kg, and 65 kg. Weights in the town are assumed to be
normally distributed with a standard deviation of 5 kg. The sample mean is
(70+75+85+80+65)/5 = 75 kg.
We also assume that the population size is huge. In the second step, we divide 1 by the number of
observations, i.e., 1/5 = 0.20, and take the square root of 0.20, which is about 0.447. The
square root is then multiplied by the standard deviation, i.e., 0.447 × 5 ≈ 2.24 kg. Thus the standard error is about 2.24 kg,
and the mean is 75 kg. These two values can be used to describe the sampling distribution.
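The calculation in Example 2 is the usual standard error of the mean, σ / √n. A short sketch with the numbers from the example; keeping full precision for the square root gives roughly 2.24 kg, slightly below the figure obtained if 0.447 is first rounded up to 0.45.

```python
import math
from statistics import mean

weights = [70, 75, 85, 80, 65]  # kg, the five observations from Example 2
sigma = 5                       # kg, assumed population standard deviation

sample_mean = mean(weights)
standard_error = sigma / math.sqrt(len(weights))  # same as sigma * sqrt(1 / n)
```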
Types of Sampling Distribution
1. Sampling Distribution of Mean
This can be defined as the probability distribution of the means of all random samples of a fixed size
drawn from a particular population. When samples are drawn from a normal population, the sample means
are also normally distributed. If the population is not normal, the distribution of
the means still tends toward the normal distribution, provided that the sample size is large enough.
2. Sampling Distribution of Proportion
This is primarily associated with statistics involving attributes (for example, the proportion of items having a
given characteristic). Here the binomial distribution comes into play: the sample proportion follows the laws of the
binomial distribution, but as the sample size increases, its distribution approaches the normal distribution.
Normal Distribution
3. Student’s T-Distribution
This type of distribution is used when the standard deviation of the population is unknown to the researcher or
when the size of the sample is very small. The distribution is symmetrical about zero, like the
standard normal variate, but has heavier tails. As the sample size increases, the t-distribution becomes very close
to the normal distribution.
4. F Distribution
The F distribution is used to compare two variances, with the greater variance placed in the numerator. As the
degrees of freedom change, the critical values of F change too; this applies to both large and small variances.
The critical values can be read from the available tables.
The value of F computed from the sample is compared with the critical value from the table. If the computed value
is equal to or larger than the table value, the null hypothesis of the
study is rejected.
5. Chi-Square Distribution
This type of distribution is used when the statistic of interest is built by adding up squares.
When squared standardized quantities, such as those that make up sample variances, are added together, the
distribution of the resulting sum is called the chi-square distribution.
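The behaviour described under "Sampling Distribution of Mean" can be checked with a small simulation. This is a sketch under assumptions chosen for illustration only: a skewed exponential population with mean 1 and standard deviation 1, samples of size 50, and 2000 repeated samples with a fixed seed.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)
n, n_samples = 50, 2000

# Draw repeated samples from a non-normal (exponential) population and keep each sample mean
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(n_samples)
]

center = mean(sample_means)   # should be close to the population mean, 1
spread = stdev(sample_means)  # empirical standard error
expected = 1 / n ** 0.5       # theoretical standard error, sigma / sqrt(n)
```

Even though the population is skewed, a histogram of `sample_means` would look roughly bell-shaped, and `spread` comes out close to `expected`.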
Normal Distribution vs T-Distribution
Inferential Statistics
Inferential statistics is one of the two main branches of statistics. It is about drawing “inferences” from data,
that is, making a “logical judgment”.
Example: 80% of all vehicles in USA are 4 wheelers
Inferential statistics uses a “random sample of data” from a population to describe and make inferences
about the population.
Inferential statistics uses statistical (mathematical) models to compare your sample data to other
samples (or to previous research).
Example:
You might stand in front of a mall and ask a sample of 100 people whether they like shopping. You could then
make a bar chart of the results based on “yes” or “no” answers (that would be Descriptive Statistics).
Whereas you could use your research (and Inferential Statistics) to reason that around 75-80% of the
population (all shoppers in all malls) like shopping.
There are two main areas of Inferential Statistics
1. Estimating parameters:
This means taking a statistic from your sample data (for example the sample mean) and using it to say
something about a population parameter (i.e. the population mean).
(For example, if ¼ kg of rice is needed for 2 people, how much is required for 1000 people?)
2.Hypothesis tests:
A hypothesis is an assumption, an idea, or a guess. Hypothesis testing is the process
of determining whether the data support the given hypothesis or not.
Examples of hypotheses:
whether a new cancer drug is effective,
or
whether breakfast helps children to perform better in school.
Hypothesis
1. Statement of Hypothesis
If you want to test a relationship between two or more things, you need to write hypotheses before you start your
experiment or data collection.
2. Null Hypothesis
A null hypothesis (H0) is a hypothesis stating that there is no relationship between two population parameters.
In inferential statistics, the null hypothesis is the default hypothesis that the quantity being measured (for example, a difference between groups) is zero,
i.e. (H0: effect = 0)
3. Level of Significance (α)
The level of significance is the probability of rejecting the null hypothesis when it is actually true.
For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.
4. Level of Confidence (1 – α)
The level of confidence refers to the percentage of all possible samples whose confidence intervals can be expected to include the true
population parameter.
5. Rejection region
The rejection region is the ‘Area’ where the null hypothesis is rejected.
6. Non rejection region
The non-rejection region is the ‘Area’ where the null hypothesis is not rejected.
7. One-tail and Two-tail Test
A one-tailed test is used when the alternative hypothesis states that the sample
mean is either larger or smaller than the population
mean, but not both. This test is also referred to as a directional test or
directional hypothesis. The test is run to decide whether a claim is supported
or not.
If a test checks whether the sample mean differs from the population mean in either
direction (larger or smaller), it is a two-tailed test.
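The difference between one-tailed and two-tailed tests, and the role of the significance level α, can be illustrated with a one-sample z test. The z test is used here only because it needs nothing beyond the standard library; it is not one of the tests named in these slides, and the sample mean, population mean, standard deviation and sample size below are hypothetical.

```python
from statistics import NormalDist

def z_test_p_values(sample_mean, pop_mean, pop_sd, n):
    """p-values for a one-sample z test with a known population standard deviation."""
    z = (sample_mean - pop_mean) / (pop_sd / n ** 0.5)
    upper = 1 - NormalDist().cdf(z)    # one-tailed: is the sample mean larger?
    lower = NormalDist().cdf(z)        # one-tailed: is the sample mean smaller?
    two_sided = 2 * min(upper, lower)  # two-tailed: different in either direction
    return z, upper, lower, two_sided

alpha = 0.05  # level of significance
z, p_upper, p_lower, p_two = z_test_p_values(sample_mean=78, pop_mean=75, pop_sd=5, n=5)

# The null hypothesis is rejected only when the p-value falls below alpha
reject_two_tailed = p_two < alpha
reject_upper_tailed = p_upper < alpha
```

With these made-up numbers neither test rejects the null hypothesis, because both p-values stay above 0.05; the one-tailed p-value is half the two-tailed one, which is why a directional test rejects more easily in the stated direction.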
Types of Inferential Statistics Tests:
1. Linear Regression Analysis
2. Analysis of Variance
3. Analysis of Co-variance
4. Statistical Significance (T-Test)
5. Correlation Analysis
1. Linear Regression is a linear model that
assumes a linear relationship between the
independent variable (x) and the dependent
variable (y).
Here, y can be calculated from a linear
combination of the independent variable (x).
2. Analysis of Variance (ANOVA):
This is another statistical method which is extremely popular in data science. It is used to test and analyse the
differences between two or more means from the data set (with errors and without errors).
3. Analysis of Co-variance (ANCOVA):
It is an extension of the previous (ANOVA) models and it includes the combination of ANOVA and regression.
4. Statistical Significance (T-Test):
It is used to determine the significant difference between the means of two groups, which may be related in
certain features.
5. Correlation Analysis:
It is used to evaluate the strength of relationship between two variables.
The regression slope is either negative or positive, depending on the relationship between the variables.
A negative correlation means that the value of one variable decreases while the value of the other increases, and a
positive correlation means that the values of both variables decrease or increase together.
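Linear regression and correlation analysis from the list above can be sketched with plain Python. The functions implement ordinary least squares for a single independent variable and the Pearson correlation coefficient; the function names and the two tiny data sets (one rising, one falling) are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and +1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
ys_up = [3, 5, 7, 9, 11]    # lies exactly on y = 1 + 2x
ys_down = [10, 8, 6, 4, 2]  # lies exactly on y = 12 - 2x

slope_up, intercept_up = fit_line(xs, ys_up)
slope_down, intercept_down = fit_line(xs, ys_down)
r_up, r_down = pearson_r(xs, ys_up), pearson_r(xs, ys_down)
```

The sign of the slope and the sign of the correlation always agree: the rising data give a positive slope and r = +1, the falling data a negative slope and r = -1.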
Differences between Descriptive and Inferential Statistics:
Descriptive Statistics | Inferential Statistics
Describes the characteristics or properties of the target data | Makes inferences from the sample and generalizes them to the population
Organizes, analyses and presents the data in a meaningful way | Compares, tests and predicts the data for future outcomes
The analyzed results are in the form of graphs, charts, plots etc. | The analyzed results are probability scores (40% or 80% or 95.3% etc.)
Describes the data which is already known | Tries to make conclusions about the population beyond the data available
Tools: measures of central tendency and measures of spread | Tools: hypothesis tests, analysis of variance, etc.
Dr. Rambabu Palaka
Professor
School of Engineering
Malla Reddy University, Hyderabad
Mobile: +91-9652665840
Email: drrambabu@mallareddyuniversity.ac.in
Reference:
INTRODUCTION TO STATISTICS MADE EASY
by Prof. Dr. Hamid Al-Oklah, Dr. Said Titi, Mr. Tareq Alodat

 
Unit 2 - Data Binding.pptx
Unit 2 - Data Binding.pptxUnit 2 - Data Binding.pptx
Unit 2 - Data Binding.pptx
 
Unit 1 - TypeScript & Introduction to Angular CLI.pptx
Unit 1 - TypeScript & Introduction to Angular CLI.pptxUnit 1 - TypeScript & Introduction to Angular CLI.pptx
Unit 1 - TypeScript & Introduction to Angular CLI.pptx
 
GIS - Project Planning and Implementation
GIS - Project Planning and ImplementationGIS - Project Planning and Implementation
GIS - Project Planning and Implementation
 
Geo-spatial Analysis and Modelling
Geo-spatial Analysis and ModellingGeo-spatial Analysis and Modelling
Geo-spatial Analysis and Modelling
 
GIS - Topology
GIS - Topology GIS - Topology
GIS - Topology
 
Geographical Information System (GIS)
Geographical Information System (GIS)Geographical Information System (GIS)
Geographical Information System (GIS)
 
Introduction to Maps
Introduction to MapsIntroduction to Maps
Introduction to Maps
 
Fluid Mechanics - Fluid Dynamics
Fluid Mechanics - Fluid DynamicsFluid Mechanics - Fluid Dynamics
Fluid Mechanics - Fluid Dynamics
 
Fluid Kinematics
Fluid KinematicsFluid Kinematics
Fluid Kinematics
 
Fluid Mechanics - Hydrostatic Pressure
Fluid Mechanics - Hydrostatic PressureFluid Mechanics - Hydrostatic Pressure
Fluid Mechanics - Hydrostatic Pressure
 
Fluid Mechanics - Fluid Pressure and its measurement
Fluid Mechanics - Fluid Pressure and its measurementFluid Mechanics - Fluid Pressure and its measurement
Fluid Mechanics - Fluid Pressure and its measurement
 
Fluid Mechanics - Fluid Properties
Fluid Mechanics - Fluid PropertiesFluid Mechanics - Fluid Properties
Fluid Mechanics - Fluid Properties
 
Reciprocating Pump
Reciprocating PumpReciprocating Pump
Reciprocating Pump
 
E-waste Management
E-waste ManagementE-waste Management
E-waste Management
 
Biomedical Waste Management
Biomedical Waste ManagementBiomedical Waste Management
Biomedical Waste Management
 
Hazardous Waste Management
Hazardous Waste ManagementHazardous Waste Management
Hazardous Waste Management
 
Digital Elevation Model (DEM)
Digital Elevation Model (DEM)Digital Elevation Model (DEM)
Digital Elevation Model (DEM)
 

Dernier

Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 

Dernier (20)

OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 

Unit 1 - Statistics (Part 1).pptx

  • 1. INTRODUCTION TO STATISTICS (DATA VISUALIZATION) Dr. P. Rambabu, M. Tech., Ph.D., F.I.E. 14-Feb-2022
  • 2. Topics 1. Introduction to Statistics 2. Sampling Techniques 3. Types of Statistics a) Descriptive Statistics b) Inferential Statistics 4. Variables and Types of Data a) Qualitative b) Quantitative i. Discrete ii. Continuous 5. Organizing and Graphing Data a) Qualitative Data b) Quantitative Data
  • 3. Introduction to Statistics Statistics is a branch of science dealing with collecting, organizing, summarizing, and analyzing data, and with making decisions from it. Statistics is divided into two main areas: 1. Descriptive Statistics: It deals with methods for collecting, organizing, and describing data by using tables, graphs, and summary measures. For example, 66% of eligible voters voted in the 2020 presidential election. 2. Inferential Statistics: It deals with methods that use sample results to help estimate or make decisions about the population. The set of all elements (observations) of interest in a study is called a population, and the subset of elements selected from the population is called a sample. For example, a poll suggests that 75% of voters will select Candidate A. People haven’t voted yet, so we don’t know what will happen, but we could reasonably conclude that Candidate A will win the election.
  • 4. Data is factual information. We collect data from a population, the collection of all individuals or items a researcher is interested in. A sample is the subset of the population from which we actually get data.
  • 5. Primary Data vs Secondary Data Primary Data: This data has been compiled by the researcher using techniques such as surveys, experiments, depth interviews, observations, and focus groups. Secondary Data: This data has been compiled or published elsewhere, e.g., census data. Advantages of secondary data: It can be gathered quickly and inexpensively. It enables researchers to build on past research. Disadvantages of secondary data: It may be outdated. It may not be accurate.
  • 6. Sampling Techniques In some cases it is hard to study a large population in order to draw conclusions about a phenomenon. For example, if we are interested in studying obesity in India and the researcher has a limited period of time, say three months, it is practically impossible to survey all citizens within that period. Sampling is then the best solution: it allows the study to produce representative results in a shorter period, and it also saves effort and money. Types of Sampling Techniques: 1. Simple Random Sampling – It is applicable when the population is relatively small. 2. Systematic Method 3. Stratified Method 4. Clustered Sampling Method
  • 8. Stratified Sampling Method In statistics, a subset of a population whose members share some characteristic is called a ‘stratum’ (plural: strata). When the population divides into such subsets, the stratified sampling method is used: elements are selected randomly from each stratum.
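The sampling methods listed above can be illustrated with a minimal Python sketch. It assumes the textbook definitions (systematic sampling takes every k-th element after a random start; stratified sampling draws the same fraction from each stratum). The population of 100 numbered elements, the stratum names `urban` and `rural`, and all function names are hypothetical and only serve the illustration.

```python
import random


def simple_random_sample(population, n, seed=0):
    """Draw n distinct elements, each equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=0):
    """Take every k-th element after a random starting position."""
    k = len(population) // n          # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)          # random start within the first interval
    return [population[start + i * k] for i in range(n)]


def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical population: elements numbered 1..100, split into two strata.
population = list(range(1, 101))
strata = {"urban": list(range(1, 61)), "rural": list(range(61, 101))}

srs = simple_random_sample(population, 10)
systematic = systematic_sample(population, 10)
stratified = stratified_sample(strata, 0.10)
```

With a 10% fraction, the stratified sample takes 6 elements from the 60-element stratum and 4 from the 40-element stratum, so each stratum keeps its share of the population in the sample.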
  • 9. Clustered Sampling Method Usually, each cluster consists of heterogeneous elements grouped on a geographical basis. The advantage of this method is that it is cheaper than other methods. Cluster sampling is used when the population is large or when it involves elements residing in a large geographic area. Suppose a researcher is interested in surveying students in Engineering Colleges, and assume that there are 25 colleges in Telangana. The researcher can select 5 colleges across the geographical regions and survey all students in these colleges. This type of clustered sampling is called ‘single-stage cluster sampling’. Now assume that the needed sample size is 3000 students. In the first stage the researcher chooses 5 colleges; the 3000 students are then selected from the five colleges using the simple random sampling method or the systematic method. In this case, we have ‘two-
  • 10. Variables and Types of Data Basic terms that are used frequently in statistical problems include • an element, • a variable and its types, • a measurement, and • a data set. Note: Any study is based on a problem or phenomenon, such as heavy traffic, accidents, rating scales, or grades. The researcher should define the variables of interest before collecting data.
  • 11. An element (or member of a sample or population) is a specific subject or object about which the information is collected. A variable is a characteristic under study that takes different values for different elements. For example, if we collect information about income of households, then income is a variable. These households are expected to have different incomes; also, some of them may have the same income. The value of a variable for an element is called an observation or measurement.
  • 14. Types of Variables In statistics, we have two types of variables according to the values they take: 1. Quantitative/Numerical: A quantitative variable gives us numbers representing counts or measurements. Ex.: price of a shirt. a) Discrete – values that can be counted b) Continuous – all values between any two specific values; they often include fractions and decimals 2. Qualitative/Categorical: A qualitative variable (or categorical data) gives us names or labels, not numbers, representing the observations. a) Nominal b) Ordinal
  • 18. Levels of Variables based on Measurement Scale Classification of Variables: 1. Nominal - The nominal level of measurement classifies data into mutually exclusive (disjoint) categories in which no order or ranking can be imposed on the data. 2. Ordinal - The ordinal level of measurement classifies data into categories that can be ordered; however, precise differences between the ranks do not exist.
  • 19. Nominal Level of Measurement
  • 20. Ordinal Level of Measurement
  • 21. Organizing and Graphing Qualitative Data Data recorded in the sequence in which they are collected, before they are processed or ranked, are called Raw Data. Suppose we collect information on the scores of 20 students from a class. The data values, in the order they are collected, are recorded in the table below (Quantitative Data).
  • 22. Continuing the previous example, suppose we ask the same students about their grades (A, B, C, or D) after completion, and record the values in Table 2 (Qualitative Data). Methods of organizing Qualitative Data: 1. Frequency Table 2. Relative Frequency and Percentage Distributions 3. Graphical Presentation of Qualitative Data
  • 23. Frequency Table Assume that a sample of 50 students from a class was selected, and those students were asked how satisfied they are with the program. The responses are recorded below, where (v) means very high satisfaction, (s) means some satisfaction, and (n) means no satisfaction.
  • 24. Relative Frequency and Percentage Distributions The relative frequency is a tool that shows what proportion of the total frequency belongs to the corresponding category; therefore the relative frequency of a category can be calculated by dividing the frequency of that category by the sum of all frequencies
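As a small illustration of the definition above, the following Python sketch builds a frequency table and then derives relative frequencies and percentages from it. The ten responses are made up for the example (the slide's actual 50 responses are not reproduced in the transcript); the codes `v`, `s`, and `n` follow the satisfaction example.

```python
from collections import Counter

# Hypothetical responses: v = very high, s = somewhat, n = no satisfaction.
responses = ["v", "s", "n", "s", "v", "v", "s", "n", "s", "v"]

# Frequency table: how often each category occurs.
frequency = Counter(responses)
total = sum(frequency.values())

# Relative frequency = frequency of the category / sum of all frequencies.
relative_frequency = {cat: count / total for cat, count in frequency.items()}

# Percentage = relative frequency * 100.
percentage = {cat: 100 * rf for cat, rf in relative_frequency.items()}

for cat in frequency:
    print(cat, frequency[cat], relative_frequency[cat], percentage[cat])
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on any hand-built table.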
  • 25. Graphical Presentation of Qualitative Data Many types of graphs are used to display qualitative data; in this part we study the two most commonly used, the bar chart and the pie chart. A graph made of bars whose heights represent the frequencies of the respective categories is called a bar graph. A circle divided into portions that represent the relative frequencies or percentages of the different categories of a population or sample is called a pie chart.
  • 26. Organizing and Graphing Quantitative Data Frequency Distributions The following data table represents the scores of 50 students in a statistics course. Use this data to: a. Construct the frequency distribution table using 12 classes b. Construct the relative frequency distribution table c. Find the percentages
  • 27. Solution: Step 1: The largest score is 90 and the lowest is 37, so the range = 90 – 37 = 53. Step 2: Divide the range by the number of classes, which is twelve: 53 / 12 ≈ 4.4. Step 3: Round the number from Step 2 up to the next whole number, which is 5. Thus, the class width is 5.
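The three steps of the solution can be sketched in Python as follows. The function names are hypothetical, and starting the first class at the lowest score with integer class limits is an assumption, since the slide's own table is not reproduced in the transcript.

```python
import math


def class_width(lowest, highest, num_classes):
    """Range divided by the number of classes, rounded up to a whole number."""
    data_range = highest - lowest
    return math.ceil(data_range / num_classes)


def class_limits(lowest, width, num_classes):
    """Lower and upper limits of each class for integer-valued scores."""
    return [
        (lowest + i * width, lowest + (i + 1) * width - 1)
        for i in range(num_classes)
    ]


width = class_width(lowest=37, highest=90, num_classes=12)
limits = class_limits(lowest=37, width=width, num_classes=12)
```

Under these assumptions the classes run from 37–41 up to 92–96, so the largest score (90) falls inside the last class.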
  • 28. Graphing Grouped Data: Grouped data can be displayed in a histogram, a polygon, or an ogive. In this part we will learn how to construct such graphs. 1. A histogram of grouped data in a frequency distribution table with equal class widths is a graph in which class boundaries are marked on the horizontal axis and the frequencies, relative frequencies, or percentages are marked on the vertical axis.
  • 31. The Histogram gives us insight into the shape of the data distribution, literally how the values are distributed across the bins. The part of the distribution that “trails off” to one or both sides is called a tail of the distribution.
  • 32. We can also use a histogram to identify modes. For numeric data, especially continuous variables, we think of modes as prominent peaks.
  • 33. Finally, we can also “smooth out” these histograms and use a smooth curve to examine the shape of the distribution. Below are the smooth curve versions of the distributions shown in the four histograms used to demonstrate skew and symmetry.
  • 34. 3. Ascending Cumulative Frequency Curve (Ogive): An ogive is a curve drawn for the ascending cumulative frequencies of grouped data in a distribution table. It is constructed by first plotting points above the upper boundaries of the classes at heights equal to the ascending cumulative frequencies of the respective classes, then joining these points with a smooth curve.
  • 35. Numerical Descriptive Measures 1. Measures of Central Tendency a) Mean b) Median c) Mode 2. Measures of Variation a) Maximum and Minimum b) Range c) Mean Deviation d) Standard Deviation e) Coefficient of Variation 3. Measures of Position a) Standard Scores b) Quartile and Interquartile Range
  • 36. Measures of Central Tendency Mean: The most commonly used measure of central tendency is called mean (or the average).
  • 40. Median: The median is the value of the middle term in a data set that has been ranked in increasing or decreasing order.
  • 41. Median for Grouped Data: Consider a frequency distribution table given as follows:
  • 42. Mode: The mode is the value that occurs most often in a data set. 1. Find the mode for the given data set: 76, 150, 95, 750, 124, 985, 87, 490 Solution: Because each value in this data set occurs only once, this data set contains no mode. 2. Find the mode for the given data set: 77, 82, 74, 81, 79, 84, 74, 78 Solution: Here the value 74 occurs twice, that is with the highest frequency, so 74 is the mode. 3. Find the mode for the given data set 7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45 Solution: Since each of the two values, 13 and 17 occurs twice and each of the remaining values occurs only once. Therefore, this data set has two modes: 13 and 17.
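The three mode examples above, together with the mean and median from the preceding slides, can be checked with a short Python sketch. The helper `modes` is a hypothetical name; it follows the convention used in the examples, where a data set in which every value occurs only once has no mode.

```python
from collections import Counter
from statistics import mean, median


def modes(data):
    """Return all values with the highest frequency, or [] if none repeats."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(value for value, count in counts.items() if count == highest)


example_1 = [76, 150, 95, 750, 124, 985, 87, 490]
example_2 = [77, 82, 74, 81, 79, 84, 74, 78]
example_3 = [7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45]

print(modes(example_1))   # no mode
print(modes(example_2))   # one mode
print(modes(example_3))   # two modes
print(mean(example_2), median(example_2))
```

Python's built-in `statistics.multimode` would instead return every value for the first data set, which is why a small custom helper is used here.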
  • 43. Measures of Variation Range: The range of a data set depends on only two values, the smallest and the largest, among all values in the data set. Range = Largest value – Smallest value 1. Find the range for the data set: 40, 10, 20, 30, 35, 40, 50, 60 Solution: The largest value is 60, and the smallest value is 10. Therefore Range = Largest value – Smallest value = 60 – 10 = 50
  • 44. Variance and Standard Deviation: The most commonly used measure of variation is the standard deviation, denoted by σ for the population and S for the sample. Its numerical value tells us how closely the values of the data set are clustered around the mean. Another measure of variation is the variance, denoted by σ² for the population and S² for the sample; it is obtained by squaring the standard deviation.
  • 46. Coefficient of Variation: The coefficient of variation (CV) is a well-known measure used to compare the variability of two data sets that have different units of measurement. It addresses one disadvantage of the standard deviation, namely that it is a measure of absolute variability and not of relative variability.
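The measures of variation discussed above can be computed with Python's standard `statistics` module. This sketch reuses the data set from the range example and assumes the usual definition of the coefficient of variation as the standard deviation divided by the mean, expressed as a percentage; the formula itself is not reproduced in the transcript.

```python
import statistics

data = [40, 10, 20, 30, 35, 40, 50, 60]  # data set from the range example

data_range = max(data) - min(data)

# Sample measures (divide by n - 1).
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

# Population measures (divide by n).
population_variance = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

# Coefficient of variation: relative variability, free of units.
cv_percent = sample_sd / statistics.mean(data) * 100

print(data_range, sample_variance, sample_sd, cv_percent)
```

Because the CV is a ratio, it can be compared across data sets measured in different units, whereas the standard deviation carries the unit of the data.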
  • 48. Measures of Position: For a population or sample data set, measures of position are tools used to determine the position of a single value in relation to the other values. Types of Measures: • Standard Scores • Quartiles • Percentiles • Percentile Rank 1. Standard Scores:
  • 50. Quartiles: Quartiles are three summary measures, denoted by Qi for i = 1, 2, 3, that divide a ranked data set into four equal parts. The second quartile is the same as the median, the first quartile is the value of the middle term among the observations that are less than the median, and the third quartile is the value of the middle term among the observations that are greater than the median. Interquartile Range: A measure of variation that depends on the first and the third quartiles is called the interquartile range (IQR), and it is defined as IQR = Q3 – Q1.
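The quartile definition above (Q2 is the median, Q1 and Q3 are the medians of the lower and upper halves) can be sketched in Python as follows. Splitting the ranked data by position, and leaving the middle term out of both halves when the number of observations is odd, is an assumption of this sketch; software packages often use other interpolation rules and may give slightly different quartiles. The data set is borrowed from the third mode example.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ranked = sorted(data)
    n = len(ranked)
    lower_half = ranked[: n // 2]          # observations below the median
    upper_half = ranked[(n + 1) // 2 :]    # observations above the median
    return median(lower_half), median(ranked), median(upper_half)


data = [7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1  # interquartile range

print(q1, q2, q3, iqr)
```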
  • 53. Box Plot: It summarizes the data with 5 statistics plus extreme observations. Drawing a box plot: 1. Draw the vertical axis to include all possible values in the data. 2. Draw a horizontal line at the median, at Q1, and at Q3. Use these to form a box. 3. Draw the whiskers. The whiskers’ upper limit is Q3 + 1.5 × IQR and the lower limit is Q1 – 1.5 × IQR. 4. The actual whiskers are then drawn at the data point closest to, but still within, each limit. 5. Any points outside the whisker limits are included as individual points. These are potential outliers. (Potential) outliers can help us: - examine skew (outliers in the negative direction suggest left skew; outliers in the positive direction suggest right skew). - identify issues with data collection or entry, especially if the value of the outliers doesn’t make sense.
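The whisker and outlier rules in the drawing steps can be expressed as a small Python sketch. It computes the numbers a box plot would show rather than drawing one. The function name is hypothetical, the quartiles use the median-of-halves method (an assumption; plotting libraries may compute quartiles differently), and the data set is borrowed from the third mode example.

```python
from statistics import median


def box_plot_summary(data):
    """Compute box, whisker, and potential-outlier values for a box plot."""
    ranked = sorted(data)
    n = len(ranked)
    q1 = median(ranked[: n // 2])
    q3 = median(ranked[(n + 1) // 2 :])
    iqr = q3 - q1

    # Whisker limits from step 3.
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr

    # Step 4: whiskers end at the most extreme points still within the limits.
    inside = [x for x in ranked if lower_limit <= x <= upper_limit]
    # Step 5: everything outside the limits is a potential outlier.
    outliers = [x for x in ranked if x < lower_limit or x > upper_limit]

    return {
        "q1": q1,
        "median": median(ranked),
        "q3": q3,
        "lower_whisker": inside[0],
        "upper_whisker": inside[-1],
        "outliers": outliers,
    }


summary = box_plot_summary([7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45])
print(summary)
```

Here the single potential outlier lies in the positive direction, which, following the interpretation above, suggests right skew.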
  • 58. Random Variables: Def: A random variable is the “result of a chance event” that you can measure or count. (or) A random variable is a variable whose value is unknown, or a function that assigns a numerical value to each of an experiment's outcomes. Random variables are used in econometric or regression analysis to determine statistical relationships among one another. A typical example of a random variable is the “outcome of a two coin toss”.
  • 59. Outcome of a two coin toss: X is the random variable and P(x) is its probability distribution, with H = Head and T = Tail. Outcomes and values of X: H H → 2, H T → 1, T H → 1, T T → 0. Here X is the number of heads we get from tossing two coins, so X can be 0, 1, or 2. Examples of random variables: • A person’s blood type • Number of persons who visited a park • Number of times a user visits LinkedIn in a day
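The two-coin example can be worked out by enumerating the outcomes. This Python sketch assumes two fair coins, so that all four outcomes are equally likely (the slide does not state this explicitly), and derives the probability distribution of X, the number of heads.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All outcomes of tossing two coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# X maps each outcome to its number of heads.
heads_per_outcome = {outcome: outcome.count("H") for outcome in outcomes}

# P(X = x) = (number of outcomes with x heads) / (total outcomes),
# assuming fair coins so that every outcome is equally likely.
counts = Counter(heads_per_outcome.values())
distribution = {
    x: Fraction(count, len(outcomes)) for x, count in sorted(counts.items())
}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```

Using `Fraction` keeps the probabilities exact, and their sum is exactly 1, as required of any probability distribution.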
  • 60. Types of Random Variables: 1. Discrete Random Variable 2. Continuous Random Variable 1. Discrete Random Variable: Discrete random variables take on a countable number of distinct values. Consider an experiment where a coin is tossed three times. If X represents the number of times that the coin comes up heads, then X is a discrete random variable that can only have the values 0, 1, 2 or 3. No other value is possible for X. Discrete random variables are usually counts. Examples: • the number of children in a family • the Friday night attendance at a cinema • the number of patients in a doctor's surgery • the number of defective light bulbs in a box of ten
  • 61. 2. Continuous Random Variables: Continuous random variables can represent “any value within a specified range” (for example, a height between 4 ft and 6 ft) and can take on an infinite number of possible values. An example of a continuous random variable would be an experiment that involves measuring the average height of a random group of 25 people. If X represents the random variable for the average height of a random group of 25 people, the resulting outcome is a continuous figure, since height may be 5’ or 5’-4” or 5’-8” etc. Clearly, there is an infinite number of possible values for height.
  • 63. Sampling Distribution A sampling distribution is the probability distribution of a statistic obtained from random samples drawn from a particular population. Many researchers, academicians, market strategists, etc. work with samples and their sampling distribution instead of the entire population. This keeps the data set simple and manageable. Example 1: Suppose a marketer wants to analyze the number of youths aged 13-18 riding a bicycle in two regions. He will not take into account the entire population aged 13-18 in the two regions, which is practically not possible, and even if done it would be too time-consuming and the data set would not be manageable. Instead, the marketer will take a sample of 200 from each region and examine its distribution. The average bicycle usage in a sample is termed the sample mean. Each sample chosen has its own mean, and the distribution of these sample means is defined as the sampling distribution. Its standard deviation is termed the standard error.
  • 64. Example 2: Assume that a researcher is conducting a study on the weights of the inhabitants of a particular town and has a sample of five observations: 70kg, 75kg, 85kg, 80kg, and 65kg. The weights in the town are assumed to follow a normal distribution with a standard deviation of 5kg. The sample mean is (70+75+85+80+65)/5 = 75 kg. We also assume that the population size is huge; thus, in the second step, we divide 1 by the number of observations, i.e., 1/5 = 0.20. Taking the square root of 0.20 gives about 0.45. The square root is then multiplied by the standard deviation, i.e., 0.45*5 = 2.25kg (about 2.24kg without the intermediate rounding). Thus the standard error obtained is 2.25kg, and the mean obtained was 75kg. These two values can be used to describe the distribution. Types of Sampling Distribution 1. Sampling Distribution of Mean This is the probability distribution of the means of all random samples of a fixed size chosen from a particular population. When the samples are drawn from a normal population, the sample means are also normally distributed. If the population is not normal, the distribution of the means still tends towards the normal distribution, provided that the sample size is quite large. 2. Sampling Distribution of Proportion This is primarily associated with statistics of attributes. Here the binomial distribution comes into play: the sampling distribution generally follows the binomial distribution, but as the sample size increases, it approaches the normal distribution.
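The arithmetic of Example 2 can be reproduced in a few lines of Python. The steps "divide 1 by the number of observations, take the square root, multiply by the standard deviation" are equivalent to the usual formula: standard error = σ / √n. The variable names are hypothetical; the numbers come from the example.

```python
import math
from statistics import mean

weights_kg = [70, 75, 85, 80, 65]   # the five observations from Example 2
population_sd_kg = 5                # standard deviation assumed in the example

sample_mean = mean(weights_kg)
n = len(weights_kg)

# Standard error of the mean: sigma * sqrt(1 / n) = sigma / sqrt(n).
standard_error = population_sd_kg * math.sqrt(1 / n)

print(sample_mean, round(standard_error, 2))
```

Computed without intermediate rounding, the standard error is about 2.24 kg; rounding the square root to 0.45 first gives the 2.25 kg quoted in the example.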
  • 66. 3. Student’s T-Distribution This distribution is used when the standard deviation of the population is unknown to the researcher or when the sample size is very small. It is symmetrical and fulfils the condition of a standard normal variate. As the sample size increases, the T distribution becomes very close to the normal distribution. 4. F Distribution The F distribution is used for the ratio of two variances, with the greater variance placed in the numerator. As the degrees of freedom change, the critical values of F change too; this applies to both large and small variances. The critical values can be read from the available tables. The F value computed from the sample is compared with the table value: if the computed value is equal to or larger than the table value, the null hypothesis of the study is rejected. 5. Chi-Square Distribution This distribution is used when the data set involves sums of squares. The squared quantities belonging to the variances of samples are added, and the resulting distribution is called the chi-square distribution.
  • 67. Normal Distribution vs T-Distribution
  • 68. Inferential Statistics Inferential statistics is one of the two main branches of statistics. It is about drawing “inferences” from data, that is, making a “logical judgment”. Example: 80% of all vehicles in the USA are four-wheelers. Inferential statistics uses a “random sample of data” from a population to describe and make inferences about that population. It uses statistical (mathematical) models to compare your sample data with other samples (or with previous research). Example: you might stand in front of a mall and ask a sample of 100 people whether they like shopping. You could then make a bar chart of the yes and no answers (that would be Descriptive Statistics). Alternatively, you could use your research (and Inferential Statistics) to reason that around 75-80% of the population (all shoppers in all malls) like shopping.
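One common way to turn the mall survey into a statement about the population is a confidence interval for a proportion. The sketch below uses the normal approximation; the count of 78 "yes" answers out of 100 is a hypothetical figure chosen for illustration, and `proportion_interval` is a hypothetical helper name. With only 100 respondents the resulting interval is fairly wide.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # z value that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical survey result: 78 of 100 people say they like shopping.
low, high = proportion_interval(successes=78, n=100)
print(f"95% interval: {low:.1%} to {high:.1%}")
```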
  • 69. There are two main areas of Inferential Statistics 1. Estimating parameters: This means taking a statistic from your sample data (for example, the sample mean) and using it to say something about a population parameter (e.g., the population mean). For example, if ¼ kg of rice is enough for 2 people, how much is required for 1000 people? 2. Hypothesis tests: A hypothesis is an assumption, an idea, or a guess. Hypothesis testing is the process of determining whether the data support a given hypothesis. Examples of hypotheses: a new cancer drug is effective; eating breakfast helps children perform better in school.
  • 70. Hypothesis 1. Statement of Hypothesis If you want to test a relationship between two or more things, you need to write your hypotheses before you start your experiment or data collection. 2. Null Hypothesis A null hypothesis (H0) is a hypothesis stating that there is no relationship between two population parameters. In inferential statistics, the null hypothesis is the default hypothesis that the quantity being measured (for example, a difference or an effect) is zero. 3. Level of Significance (α) The level of significance is the probability of rejecting the null hypothesis when it is actually true. For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference. 4. Level of Confidence (1 - α) The level of confidence refers to the percentage of all possible samples whose confidence intervals can be expected to include the true population parameter.
  • 71. 5. Rejection region The rejection region is the area under the sampling distribution where the null hypothesis is rejected. 6. Non-rejection region The non-rejection region is the area where the null hypothesis is not rejected.
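For a test statistic that follows the standard normal distribution, the boundary between the two regions can be computed from the significance level. This is a minimal sketch for the two-tailed case only; the function names are hypothetical and the z values passed in at the end are arbitrary illustration values.

```python
from statistics import NormalDist


def two_tailed_critical_z(alpha):
    """z value that leaves alpha / 2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def in_rejection_region(z, alpha):
    """True when z falls in either tail beyond the critical value."""
    return abs(z) >= two_tailed_critical_z(alpha)


alpha = 0.05
critical = two_tailed_critical_z(alpha)
print(f"reject H0 when |z| >= {critical:.2f}")
print(in_rejection_region(2.5, alpha))  # far out in the tail
print(in_rejection_region(1.0, alpha))  # inside the non-rejection region
```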
  • 72. 7. One-tail and Two-tail Test A one-tailed test rejects the null hypothesis only when the sample mean is significantly larger, or only when it is significantly smaller, than the population mean; the direction is chosen in advance. This test is also referred to as a directional test or directional hypothesis. The test is run to decide whether a claim is supported or not. If the test checks whether the sample mean is either larger or smaller than the population mean, it is a two-tailed test.
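The difference between the two kinds of test shows up in the p-value. The sketch below runs a z-test on the weights example (sample mean 75 kg, population standard deviation 5 kg, n = 5); the hypothesized population mean of 70 kg is an assumption made only for illustration, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_statistic(sample_mean, hypothesized_mean, population_sd, n):
    """Distance of the sample mean from the hypothesized mean, in standard errors."""
    return (sample_mean - hypothesized_mean) / (population_sd / math.sqrt(n))


def p_values(z):
    """Return (one-tailed upper p-value, two-tailed p-value) for a z statistic."""
    std_normal = NormalDist()
    upper_tail = 1 - std_normal.cdf(z)          # H1: mean is larger
    both_tails = 2 * (1 - std_normal.cdf(abs(z)))  # H1: mean is different
    return upper_tail, both_tails


# Hypothesized mean of 70 kg is an illustrative assumption.
z = z_statistic(sample_mean=75, hypothesized_mean=70, population_sd=5, n=5)
one_tailed, two_tailed = p_values(z)
print(f"z = {z:.3f}, one-tailed p = {one_tailed:.4f}, two-tailed p = {two_tailed:.4f}")
```

For a positive z, the two-tailed p-value is exactly twice the one-tailed value, so a result can be significant in a one-tailed test and not in a two-tailed test at the same significance level.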
  • 73. Types of Inferential Statistics Tests: 1. Linear Regression Analysis 2. Analysis of Variance 3. Analysis of Co-variance 4. Statistical Significance (T-Test) 5. Correlation Analysis 1. Linear Regression is a linear model that assumes a linear relationship between the independent variable (x) and the dependent variable (y). Here, y is calculated as a linear function of x.
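A linear relationship y = slope * x + intercept can be estimated by ordinary least squares. The sketch below implements the textbook formulas directly; the data points are made up for illustration and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations that lie close to a straight line.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```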
  • 74. 2. Analysis of Variance (ANOVA): This is another statistical method which is extremely popular in data science. It is used to test and analyse the differences between two or more means from the data set (with and without errors). 3. Analysis of Co-variance (ANCOVA): It is an extension of the previous (ANOVA) model and combines ANOVA with regression. 4. Statistical Significance (T-Test): It is used to determine whether there is a significant difference between the means of two groups, which may be related in certain features. 5. Correlation Analysis: It is used to evaluate the strength of the relationship between two variables. The regression slope is either negative or positive, depending on the variables. A negative correlation means that the value of one variable decreases while the value of the other increases, and a positive correlation means that the values of both variables decrease or increase together.
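Two of the quantities named above can be computed in a few lines. The sketch below gives the Pearson correlation coefficient and a two-sample t statistic; the t statistic uses Welch's unequal-variance form, which is one common choice and not necessarily the variant the slides have in mind. All data and function names are hypothetical.

```python
import math
from statistics import mean, variance


def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and +1."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances (Welch)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


x = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [2.0, 4.0, 6.0, 8.0, 10.0]   # increases with x
falling = [10.0, 8.0, 6.0, 4.0, 2.0]  # decreases as x increases

print(pearson_r(x, rising))   # positive correlation
print(pearson_r(x, falling))  # negative correlation
print(welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```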
  • 76. Differences between Descriptive and Inferential Statistics:
      1. Descriptive: Describes the characteristics or properties of the target data. Inferential: Makes inferences from the sample and generalizes them to the population.
      2. Descriptive: Organizes, analyses, and presents the data in a meaningful way. Inferential: Compares, tests, and predicts the data for future outcomes.
      3. Descriptive: The analyzed results are in the form of graphs, charts, plots, etc. Inferential: The analyzed results are probability scores (40% or 80% or 95.3%, etc.).
      4. Descriptive: Describes the data which is already known. Inferential: Tries to make conclusions about the population beyond the data available.
      5. Descriptive: Tools are measures of central tendency and measures of spread. Inferential: Tools are hypothesis tests, analysis of variance, etc.
  • 77. Dr. Rambabu Palaka Professor School of Engineering Malla Reddy University, Hyderabad Mobile: +91-9652665840 Email: drrambabu@mallareddyuniversity.ac.in Reference: INTRODUCTION TO STATISTICS MADE EASY by Prof. Dr. Hamid Al-Oklah Dr. Said Titi Mr. Tareq Alodat