26. Types of Statistics –
Descriptive and Inferential
Statistics
Descriptive Statistics - methods of organizing,
summarizing, and presenting data in an informative
way
EXAMPLE 1: There are a total of 46,837 miles of interstate highways in the
United States. The interstate system represents only 1% of the nation’s total
roads but carries more than 20% of the traffic. The longest is I-90, which
stretches from Boston to Seattle, a distance of 3,099 miles.
EXAMPLE 2: The average person spent $103.00 on traditional Valentine’s
Day merchandise in 2013. This is an increase of $0.50 from 2012.
27. Frequency Table
FREQUENCY TABLE A grouping of qualitative data into mutually
exclusive and collectively exhaustive classes showing the number of
observations in each class.
28. Bar Charts
BAR CHART A graph that shows qualitative classes on the
horizontal axis and the class frequencies on the vertical axis. The
class frequencies are proportional to the heights of the bars.
29. Pie Charts
PIE CHART A chart that shows the proportion or percent that
each class represents of the total number of frequencies.
30. Histogram
HISTOGRAM A graph in which the classes are marked on the horizontal axis
and the class frequencies on the vertical axis. The class frequencies are
represented by the heights of the bars and the bars are drawn adjacent to
each other.
31. Measures of Location
◼ The purpose of a measure of location is to pinpoint the
center of a distribution of data.
◼ There are many measures of location. We will consider
three:
1. The arithmetic mean
2. The median
3. The mode
32. Population Mean
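For ungrouped data, the population mean is the sum of
all the population values divided by the total number of
population values:
$$\mu = \frac{\sum x}{N}$$
where μ is the population mean, Σx is the sum of the population values, and N is the number of values in the population.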
34. Example – Population Mean
There are 42 exits on I-75 through the state of Kentucky.
Listed below are the distances between exits (in miles).
Why is this information a population?
This is a population because we are considering all of the
exits in Kentucky.
What is the mean number of miles between exits?
37. Properties of the Arithmetic Mean
1. Every set of interval-level and ratio-level data has a
mean.
2. All the values are included in computing the mean.
3. The mean is unique.
4. The sum of the deviations of each value from the mean is
zero.
38. Sample Mean
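For ungrouped data, the sample mean is the sum of all
the sample values divided by the number of sample
values:
$$\bar{x} = \frac{\sum x}{n}$$
where x̄ is the sample mean, Σx is the sum of the sample values, and n is the number of values in the sample.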
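A minimal sketch of the sample mean, using the Home Depot hourly wages from slide 52 as the sample. It also checks property 4 from slide 37, that the deviations from the mean sum to zero. The variable names are illustrative only.

```python
# Hourly wages (in dollars) from the Home Depot sample on slide 52.
wages = [12, 20, 16, 18, 19]

# Sample mean: the sum of the sample values divided by the number of values.
sample_mean = sum(wages) / len(wages)

# Deviations of each value from the mean; property 4 says they sum to zero.
deviations = [w - sample_mean for w in wages]
deviation_sum = sum(deviations)

print("sample mean:", sample_mean)
print("sum of deviations:", deviation_sum)
```

The printed mean matches the $17 stated on slide 52.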
40. The Median
Properties of the median:
1. There is a unique median for each data set.
2. It is not affected by extremely large or small values
and is therefore a valuable measure of central
tendency when such values occur.
3. It can be computed for ratio-level, interval-level, and
ordinal-level data.
4. It can be computed for an open-ended frequency
distribution if the median does not lie in an open-
ended class.
MEDIAN The midpoint of the values after they have been
ordered from the minimum to the maximum values.
41. Examples - Median
The ages for a sample of five college students are: 21, 25, 19, 20, 22
Arranging the data in ascending order gives: 19, 20, 21, 22, 25.
Thus the median is 21, the middle value.
The heights of four basketball players, in inches, are: 76, 73, 80, 75
Arranging the data in ascending order gives: 73, 75, 76, 80.
With an even number of values, the median is the mean of the two middle values: (75 + 76) / 2 = 75.5.
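The two examples above can be sketched in code as follows. The helper name `median` is chosen here for illustration; it sorts the values and takes the middle one, or the mean of the two middle ones when the count is even.

```python
def median(values):
    """Return the midpoint of the values after ordering them."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value is the median.
        return ordered[middle]
    # Even count: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


ages = [21, 25, 19, 20, 22]      # five college students
heights = [76, 73, 80, 75]       # four basketball players, in inches

print("median age:", median(ages))
print("median height:", median(heights))
```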
42. The Mode
MODE The value of the observation that appears
most frequently.
43. Example - Mode
Using the data measuring the distance in miles between exits on I-75
through Kentucky, what is the modal distance?
Organize the distances into a frequency table and select the distance
with the highest frequency.
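A rough sketch of the frequency-table approach. The distances below are HYPOTHETICAL values made up for illustration; they are not the I-75 data, which are not reproduced in these slides.

```python
from collections import Counter

# HYPOTHETICAL distances in miles, for illustration only (not the I-75 data).
distances = [11, 4, 10, 4, 9, 3, 8, 10, 3, 14, 1, 10]

# Build the frequency table: each distinct distance mapped to its count.
frequency_table = Counter(distances)

# The mode is the distance with the highest frequency.
modal_distance, modal_count = frequency_table.most_common(1)[0]

for distance in sorted(frequency_table):
    print(distance, frequency_table[distance])
print("modal distance:", modal_distance, "appearing", modal_count, "times")
```

If two distances tie for the highest frequency, `most_common(1)` returns only one of them, so a tie would need to be checked separately.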
46. Example – Range
The number of cappuccinos sold at the Starbucks
location in the Orange County Airport between 4 and 7
p.m. for a sample of 5 days last year were 20, 40, 50,
60, and 80. Determine the range for the number of
cappuccinos sold.
Range = Maximum value – Minimum value
= 80 – 20
= 60
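The same range calculation as a short sketch, with an illustrative variable name for the five days of sales:

```python
# Cappuccinos sold on the five sampled days.
cappuccinos_sold = [20, 40, 50, 60, 80]

# Range = maximum value - minimum value.
sales_range = max(cappuccinos_sold) - min(cappuccinos_sold)

print("range:", sales_range)
```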
47. Variance and Standard Deviation
◼ The variance and standard deviation are nonnegative and are
zero only if all observations are the same.
◼ For populations whose values are near the mean, the variance and
standard deviation will be small.
◼ For populations whose values are dispersed from the mean, the
population variance and standard deviation will be large.
◼ The variance overcomes the weakness of the range by using all
the values in the population.
VARIANCE The arithmetic mean of the squared deviations
from the mean.
STANDARD DEVIATION The square root of the variance.
48. Computing the Variance
Steps in computing the variance:
Step 1: Find the mean.
Step 2: Find the difference between each observation and
the mean, and square that difference.
Step 3: Sum all the squared differences found in Step 2.
Step 4: Divide the sum of the squared differences by the
number of items in the population.
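The four steps can be sketched as a small function. The function name and the input values are assumptions made for illustration; the values are HYPOTHETICAL and are not taken from any example in these slides.

```python
import math


def population_variance(values):
    """Population variance, following Steps 1 to 4 above."""
    n = len(values)
    mean = sum(values) / n                                  # Step 1: find the mean
    squared_diffs = [(v - mean) ** 2 for v in values]       # Step 2: squared differences
    total = sum(squared_diffs)                              # Step 3: sum them
    return total / n                                        # Step 4: divide by N


# HYPOTHETICAL population values, for illustration only.
illustrative_values = [1, 2, 3, 4, 5]

variance = population_variance(illustrative_values)
std_dev = math.sqrt(variance)  # standard deviation is the square root of the variance

print("population variance:", variance)
print("population standard deviation:", std_dev)
```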
49. Example – Variance and Standard
Deviation
The number of traffic citations issued during the last twelve months in
Beaufort County, South Carolina, is reported below:
What is the population variance?
Step 1: Find the mean.
$$\mu = \frac{\sum x}{N} = \frac{19 + 17 + \cdots + 34 + 10}{12} = \frac{348}{12} = 29$$
50. Example – Variance and Standard
Deviation Continued
What is the population variance?
Step 2: Find the difference between each
observation and the mean of 29,
and square that difference.
Step 3: Sum all the squared differences found in Step 2.
Step 4: Divide the sum of the squared differences
by the number of items in the population.
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{1{,}488}{12} = 124$$
52. Example – Sample Variance
The hourly wages for a sample of part-time employees at Home Depot are:
$12, $20, $16, $18, and $19. The sample mean is $17.
What is the sample variance?
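A sketch of the calculation, assuming the usual sample variance formula, which divides the sum of squared deviations by n − 1 rather than by n. That formula is not shown on this slide, so treat it as an assumption here.

```python
# Hourly wages (in dollars) for the sample of part-time employees.
wages = [12, 20, 16, 18, 19]

n = len(wages)
sample_mean = sum(wages) / n

# Squared deviation of each wage from the sample mean.
squared_deviations = [(w - sample_mean) ** 2 for w in wages]

# ASSUMPTION: the sample variance divides by n - 1, not n.
sample_variance = sum(squared_deviations) / (n - 1)

print("sample mean:", sample_mean)
print("squared deviations:", squared_deviations)
print("sample variance:", sample_variance)
```

By hand, the squared deviations are 25, 9, 1, 1, and 4, which sum to 40, and 40 / 4 gives a sample variance of 10 (in dollars squared).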
54. Types of Statistics –
Descriptive and Inferential
Statistics
Inferential Statistics - A decision, estimate,
prediction, or generalization about a
population based on a sample.
Note: In statistics the words population and sample have a broader
meaning. A population or sample may consist of individuals
or objects.
55. Population versus Sample
A population is a collection of all possible individuals, objects, or
measurements of interest.
A sample is a portion, or part, of the population of interest.
56. Types of Variables
A. Qualitative or attribute variable - the
characteristic being studied is nonnumeric
EXAMPLES: Gender, religious affiliation, type of automobile
owned, state of birth, eye color
B. Quantitative variable - information is reported
numerically
EXAMPLES: balance in your checking account, minutes
remaining in class, or number of children in a family
57. Quantitative Variables -
Classifications
Quantitative variables can be classified as either discrete
or continuous.
A. Discrete variables can only assume certain
values and there are usually “gaps” between values.
EXAMPLE: the number of bedrooms in a house or the number of
hammers sold at the local Home Depot (1,2,3,…,etc.)
B. Continuous variables can assume any value within
a specified range.
EXAMPLE: the pressure in a tire, the weight of a pork chop, or the
height of students in a class
59. Four Levels of Measurement
Nominal level - data that is classified
into categories and cannot be
arranged in any particular order
EXAMPLES: eye color, gender, religious
affiliation
Ordinal level – data arranged in
some order, but the differences
between data values cannot be
determined or are meaningless
EXAMPLE: During a taste test of 4 soft drinks,
Mellow Yellow was ranked number 1, Sprite
number 2, Seven-up number 3, and Orange
Crush number 4.
Interval level - similar to the ordinal
level, with the additional property that
meaningful amounts of differences
between data values can be
determined. There is no natural zero
point.
EXAMPLE: temperature on the Fahrenheit scale
Ratio level - the interval level with an
inherent zero starting point.
Differences and ratios are meaningful
for this level of measurement.
EXAMPLES: monthly income of surgeons, or
distance traveled by manufacturer’s
representatives per month
68. Hypothesis
Examples:
▪ Pay is related to performance: People who are paid
more perform better.
▪ Consumers prefer Coke over all other cola drinks.
▪ Billboard advertising is more effective than advertising
in paper-based media.
▪ Consumer confidence in the economy is increasing.
69. Hypothesis Testing
HYPOTHESIS TESTING A procedure based on sample
evidence and probability theory to determine whether the
hypothesis is a reasonable statement.
70. Step 1: State the Null and the
Alternate Hypothesis
ALTERNATE HYPOTHESIS A statement that is
accepted if the sample data provide sufficient evidence
that the null hypothesis is false. It is represented by H1.
NULL HYPOTHESIS A statement about the value of a
population parameter developed for the purpose of
testing numerical evidence. It is represented by H0.
71. Step 2: State a Level of Significance:
Errors in Hypothesis Testing
The significance level of a test:
Defined as the probability of rejecting the null
hypothesis when it is actually true.
This is denoted by the Greek letter “α”.
Rejecting a true null hypothesis is known as a Type I error.
We select this probability prior to collecting data and
testing the hypothesis.
A typical value of “α” is 0.05.
72. Step 2: State a Level of Significance:
Errors in Hypothesis Testing
Another possible error:
The probability of not rejecting the null
hypothesis when it is actually false.
This is denoted by the Greek letter “β”.
Failing to reject a false null hypothesis is known as a Type II error.
We cannot select this probability. It is related
to the choice of α, the sample size, and the
data collected.
74. Step 3: Identify the Test Statistic
TEST STATISTIC A value, determined from sample
information, used to determine whether to fail to reject or
reject the null hypothesis.
To test hypotheses about population means we use
the z or t-statistic. For hypotheses about population
variances, we use the F-statistic.
75. Step 4: Formulate a Decision Rule:
One-Tail vs. Two-Tail Tests
CRITICAL VALUE Based on the selected level of significance, the
critical value is the dividing point between the region where the null
hypothesis is rejected and the region where it is not rejected.
If the test statistic falls in the region of rejection (beyond the critical value,
in the direction or directions given by the alternate hypothesis), then reject the null hypothesis.
76. Step 5: Take a Sample, Arrive at
a Decision
▪ Identify an unbiased sample.
▪ Collect the data on the relevant variables.
▪ Calculate the test statistic.
▪ Compare the test statistic to the critical value.
▪ Make a decision, i.e., reject or fail to reject the null
hypothesis.
77. Step 6: Interpret the Result
▪ What does the decision to reject or fail to reject
the null hypothesis mean in the context of the
study?
▪ Examples:
▪ “Based on the data, there is no evidence to
support the hypothesis that pay is related to
performance.”
▪ “Based on the data, there is evidence that
billboard advertising is more effective than
paper-based media advertising.”
82. Conclusion: The analysis gave a test statistic of t = 4.14 with a P-value < 0.0001,
which is less than α = 0.05.
H0 is therefore rejected: at the 0.05 significance level, the mean writing score is
not equal to 50.
84. Comparing Two Populations –
Examples
▪ Is there a difference in the mean value of residential real estate sold
by male agents and female agents in south Florida?
▪ Is there a difference in the mean number of defects produced on the
day and the afternoon shifts at Kimble Products?
▪ Is there a difference in the mean number of days absent between
young workers (under 21 years of age) and older workers (more
than 60 years of age) in the fast-food industry?
▪ Is there a difference in the proportion of Ohio State University
graduates and University of Cincinnati graduates who pass the
state Certified Public Accountant Examination on their first attempt?
▪ Is there an increase in the production rate if music is piped into the
production area?
85. Comparing Two Population Means:
Equal, Known Population Variances
▪ No assumptions about the shape of the populations are required.
▪ The samples are from independent populations.
▪ The formula for computing the value of z is:
If σ1 and σ2 are known:
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
86. Comparing Two Population Means: Equal,
Known Population Variances – Example
The Fast Lane procedure was recently installed at the local food
market. The store manager would like to know if the mean
checkout time using the standard checkout method is longer than
using the Fast Lane procedure. She gathered the following sample
information. The time is measured from when the customer enters
the line until their bags are in the cart. Hence, the time includes
both waiting in line and checking out.
87. Comparing Two Population Means: Equal,
Known Population Variances – Example
Applying the six-step hypothesis testing procedure:
Step 1: State the null and alternate hypotheses.
(keyword: “longer than”)
H0: µS ≤ µU
H1: µS > µU
Step 2: Select the level of significance.
The .01 significance level is requested in the problem.
Step 3: Determine the appropriate test statistic.
Because both population standard deviations are known,
we can use the z-distribution as the test statistic.
88. Comparing Two Population Means: Equal,
Known Population Variances – Example
Step 4: Formulate a decision rule.
Reject H0 if z > zα, that is, if z > 2.326
89. Comparing Two Population Means: Equal,
Known Population Variances – Example
Step 5: Take a sample and make a decision.
$$z = \frac{\bar{x}_s - \bar{x}_u}{\sqrt{\frac{\sigma_s^2}{n_s} + \frac{\sigma_u^2}{n_u}}} = \frac{5.5 - 5.3}{\sqrt{\frac{0.40^2}{50} + \frac{0.30^2}{100}}} = \frac{0.2}{0.064031} = 3.123$$
The computed value of 3.123 is larger than the
critical value of 2.326.
Our decision is to reject the null hypothesis.
Step 6: Interpret the result. The difference of .20 minutes between the
mean checkout times using the standard method and the Fast Lane method
is too large to have occurred by chance. We conclude the Fast Lane method is
faster.
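Steps 4 and 5 of this example can be checked with a short sketch. The sample means, population standard deviations, and sample sizes are the ones that appear in the Step 5 calculation; the variable names are illustrative, and the critical value comes from Python's standard normal distribution rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

# Values from the Step 5 calculation: standard method (s) and Fast Lane (u).
mean_standard, sigma_standard, n_standard = 5.5, 0.40, 50
mean_fast, sigma_fast, n_fast = 5.3, 0.30, 100
alpha = 0.01  # significance level from Step 2

# Standard error of the difference between the two sample means.
standard_error = sqrt(sigma_standard**2 / n_standard + sigma_fast**2 / n_fast)

# Test statistic for the difference in means with known population variances.
z = (mean_standard - mean_fast) / standard_error

# One-tailed (upper) critical value for the chosen significance level.
critical_value = NormalDist().inv_cdf(1 - alpha)

reject_h0 = z > critical_value
print("z =", round(z, 3))
print("critical value =", round(critical_value, 3))
print("reject H0:", reject_h0)
```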
95. Example 2: Binomial Test
We want to know whether the proportion of females is equal to 0.50.
98. Conclusion: From the results of the analysis, the P-value = 0.2292,
which is greater than α = 0.05, so we fail to reject H0.
The data do not contradict a proportion of females equal to 0.50,
that is, equal proportions of females and males.
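For readers who want to see what a binomial test computes, here is a minimal sketch of an exact two-sided test. The counts are HYPOTHETICAL because the data behind this example are not shown in the slides, so the P-value below will not match 0.2292. The function name and the two-sided rule (summing the probabilities of all outcomes no more likely than the observed one) are assumptions; statistical software may use a slightly different convention.

```python
from math import comb


def binomial_two_sided_p_value(successes, trials, p_null=0.5):
    """Exact two-sided binomial P-value under H0: proportion = p_null."""

    def pmf(k):
        # Probability of exactly k successes under the null hypothesis.
        return comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)

    observed = pmf(successes)
    tolerance = 1e-9  # guards against floating-point ties
    # Sum the probabilities of outcomes at least as unlikely as the observed one.
    return sum(
        pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + tolerance)
    )


# HYPOTHETICAL counts, for illustration only.
females, total = 13, 20
alpha = 0.05

p_value = binomial_two_sided_p_value(females, total)
reject_h0 = p_value < alpha

print("P-value =", round(p_value, 4))
print("reject H0:", reject_h0)
```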
100. Testing the Hypothesis of Three or More
Equal Population Means
The F-distribution is also used for testing whether
two or more sample means came from the same
or equal populations.
Assumptions:
▪ The sampled populations follow the
normal distribution.
▪ The populations have equal
standard deviations.
▪ The samples are randomly selected
and are independent.
101. Testing the Hypothesis of Three or More
Equal Population Means
◼ The null hypothesis is that the population means are all the
same.
◼ The alternative hypothesis is that at least one of the means is
different.
◼ The test statistic follows the F distribution.
◼ The decision rule is to reject the null hypothesis if F
(computed) is greater than F (table) with numerator and denominator
degrees of freedom.
◼ Hypothesis Setup and Decision Rule:
H0: µ1 = µ2 =…= µk
H1: The means are not all equal.
Reject H0 if F > Fα,k-1,n-k
102. ONE – WAY ANOVA
H0: µ1 = µ2 =…= µk
H1: The means are not all equal.
Reject H0 if F > Fα,k-1,n-k
PSPP Syntax
oneway write by prog.
means tables = write by prog.
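To show what goes into the F statistic that the one-way ANOVA reports, here is a rough sketch in Python. It assumes the usual one-way ANOVA ratio, the mean square between groups divided by the mean square within groups, with k − 1 and n − k degrees of freedom. The function name and the scores are HYPOTHETICAL; they are not the `write` and `prog` data.

```python
def one_way_anova_f(groups):
    """Return (F, numerator df, denominator df) for a list of groups."""
    k = len(groups)                         # number of groups
    n = sum(len(g) for g in groups)         # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between, df_within = k - 1, n - k
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# HYPOTHETICAL scores for three groups, for illustration only.
scores_by_group = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f_statistic, df_num, df_den = one_way_anova_f(scores_by_group)
print("F =", round(f_statistic, 3), "with df =", (df_num, df_den))
```

The computed F would then be compared with the table value Fα,k-1,n-k, which this sketch does not look up.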
111. Correlation Analysis – Measuring
the Relationship Between Two Variables
◼ Analyzing relationships between two quantitative
variables.
◼ The basic hypothesis of correlation analysis: Does
the data indicate that there is a relationship between
two quantitative variables?
◼ For the Applewood Auto sales data, the data is
displayed in a scatter graph.
◼ Are profit per vehicle and age
correlated?
112. Correlation Analysis – Measuring
the Relationship Between Two Variables
The Coefficient of Correlation (r) is a measure of the
strength of the relationship between two variables.
▪ The sample correlation coefficient is identified by the lowercase letter r.
▪ It shows the direction and strength of the linear relationship between two
interval- or ratio-scale variables.
▪ It ranges from -1 to +1, inclusive.
▪ A value near 0 indicates there is little linear relationship between the variables.
▪ A value near +1 indicates a direct or positive linear relationship between the
variables.
▪ A value near -1 indicates an inverse or negative linear relationship between
the variables.
114. Correlation Analysis – Measuring
the Relationship Between Two Variables
◼ Computing the Correlation Coefficient:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$$
115. Correlation Analysis – Example
The sales manager of Copier Sales of America has a large sales force
throughout the United States and Canada and wants to determine whether
there is a relationship between the number of sales calls made in a month
and the number of copiers sold that month. The manager selects a random
sample of 15 representatives and determines the number of sales calls each
representative made last month and the number of copiers sold.
Determine if the number of sales calls and copiers sold are correlated.
116. Correlation Analysis – Example
Step 1: State the null and alternate hypotheses.
H0: ρ = 0 (the correlation in the population is 0)
H1: ρ ≠ 0 (the correlation in the population is not 0)
Step 2: Select a level of significance.
We select a .05 level of significance.
Step 3: Identify the test statistic.
To test a hypothesis about a correlation we use the t-statistic.
For this analysis, there will be n-2 degrees of freedom.
117. Correlation Analysis – Example
Step 4: Formulate a decision rule.
Reject H0 if:
t > tα/2,n-2 or t < -tα/2,n-2
t > t0.025,13 or t < -t0.025,13
t > 2.160 or t < -2.160
118. Correlation Coefficient – Example
Step 5: Take a sample, calculate the statistics, arrive at a decision.
x̄ = 96; ȳ = 45; sx = 42.76; sy = 12.89
119. Correlation Coefficient – Example
Step 5 (continued): Take a sample, calculate the statistics, arrive at a decision.
The t-test statistic, 6.216, is greater than 2.160. Therefore,
reject the null hypothesis that the correlation coefficient is zero.
Step 6: Interpret the result. The data indicate that there is a significant
correlation between the number of sales calls and copiers sold. We
can also observe that the correlation coefficient is .865, which
indicates a strong, positive relationship. In other words, more sales
calls are strongly related to more copier sales. Please note that this
statistical analysis does not provide any evidence of a causal
relationship. Another type of study is needed to test that hypothesis.
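The test statistic in Step 5 can be reproduced with a short sketch. It assumes the standard t formula for testing a correlation coefficient, t = r√(n − 2) / √(1 − r²), which is not shown on these slides but gives the reported value. The variable names are illustrative, and the critical value 2.160 is taken from Step 4.

```python
from math import sqrt

r = 0.865   # sample correlation coefficient reported in Step 6
n = 15      # number of sales representatives in the sample
df = n - 2  # degrees of freedom for the test

# ASSUMPTION: standard t statistic for H0: rho = 0.
t = r * sqrt(df) / sqrt(1 - r**2)

critical_value = 2.160  # t(0.025, 13) from Step 4
reject_h0 = abs(t) > critical_value  # two-tailed decision rule

print("t =", round(t, 3))
print("reject H0:", reject_h0)
```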
120. Regression Analysis
Y = a + b X
Correlation Analysis tests for the strength and direction of the
relationship between two quantitative variables.
Regression Analysis evaluates and “measures” the
relationship between two quantitative variables with a linear
equation. This equation has the same elements as any equation
of a line, that is, a slope and an intercept.
The relationship between X and Y is defined by the values of the
intercept, a, and the slope, b. In regression analysis, we use
data (observed values of X and Y) to estimate the values of a
and b.
121. Regression Analysis
EXAMPLES
▪ Assuming a linear relationship between the size of a home,
measured in square feet, and the cost to heat the home in
January, how does the cost vary relative to the size of the
home?
▪ In a study of automobile fuel efficiency, assuming a linear
relationship between miles per gallon and the weight of a car,
how does the fuel efficiency vary relative to the weight of a
car?
122. Regression Analysis: Variables
Y = a + b X
▪ Y is the Dependent Variable. It is the variable being predicted or
estimated.
▪ X is the Independent Variable. For a regression equation, it is the variable
used to estimate the dependent variable, Y. X is the predictor variable.
Examples of dependent and independent variables:
▪ How does the size of a home, measured in number of square feet, relate to the cost to heat the
home in January? We would use home size as the independent variable, X, to predict the
heating cost, which is the dependent variable, Y.
Regression equation: Heating cost = a + b (home size)
▪ How does the weight of a car relate to the car’s fuel efficiency? We would use car weight as the
independent variable, X, to predict the car’s fuel efficiency, which is the dependent variable, Y.
Regression equation: Miles per gallon = a + b (car weight)
123. Regression Analysis – Example
◼ Regression analysis estimates a and b by fitting a line
to the observed data.
◼ Each line (Y = a + bX) is defined by values of a and b.
A way to find the line of “best fit” to the data is the:
LEAST SQUARES PRINCIPLE Determining a regression equation
by minimizing the sum of the squares of the vertical distances
between the actual Y values and the predicted values of Y.
124. Regression Analysis – Example
Recall the example involving Copier
Sales of America. The sales manager
gathered information on the number of
sales calls made and the number of
copiers sold for a random sample of
15 sales representatives. Use the
least squares method to determine a
linear equation to express the
relationship between the two
variables.
In this example, the number of sales
calls is the independent variable, X,
and the number of copiers sold is the
dependent variable, Y.
What is the expected number of
copiers sold by a representative who
made 20 calls?
Number of Copiers Sold = a + b ( Number of Sales Calls)
126. Regression Analysis - Example
Step 1: Find the slope (b) of the line.
Step 2: Find the y-intercept (a).
Step 3: Create the regression equation.
Number of Copiers Sold = 19.9632 + 0.2608 ( Number of Sales Calls)
Step 4: What is the predicted number of sales if someone makes 20 sales calls?
Number of Copiers Sold = 19.9632 + 0.2608(20) = 25.1792
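The slope and intercept are not derived on this slide. A sketch under the assumption that the standard least squares formulas apply, b = r(sy / sx) and a = ȳ − b·x̄, reproduces the values above from the summary statistics given in the correlation example. The slope is rounded to four decimals before computing the intercept, which is what the slide's numbers imply.

```python
# Summary statistics from the Copier Sales of America correlation example.
r = 0.865                       # correlation coefficient
mean_calls, mean_sold = 96, 45  # sample means of X (calls) and Y (copiers sold)
sd_calls, sd_sold = 42.76, 12.89

# ASSUMPTION: standard least squares formulas for the slope and intercept.
slope = round(r * sd_sold / sd_calls, 4)    # b = r * (s_y / s_x), rounded as on the slide
intercept = mean_sold - slope * mean_calls  # a = y-bar - b * x-bar

# Predicted copiers sold for a representative who makes 20 calls.
calls = 20
predicted_sold = intercept + slope * calls

print("slope b =", slope)
print("intercept a =", round(intercept, 4))
print("predicted copiers sold =", round(predicted_sold, 4))
```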
136. Multiple Regression Analysis
The general multiple regression equation with k independent variables is given by:
$$Y = a + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$$
▪ X1 … Xk are the independent variables.
▪ a is the y-intercept
▪ b1 is the net change in Y for each unit change in X1 holding X2 … Xk
constant. It is called a partial regression coefficient or just a
regression coefficient.
▪ Determining b1, b2, etc. is very tedious. A software package such as
Excel or MINITAB is recommended.
▪ The least squares criterion is used to develop this equation.