1. Measures of Variability:
Ungrouped Data
• Measures of Variability - tools that describe
the spread or dispersion of a set of data.
– Give more meaningful information when used
• together with measures of central tendency
• to compare one group with another
2. Measures of Spread or Dispersion:
Ungrouped Data
• Common Measures of Variability
–Range
–Interquartile Range
–Mean Absolute Deviation
–Variance and Standard Deviation
–Coefficient of Variation
3. Range
• The difference between the largest and the
smallest values in a set of data
– Advantage – easy to compute
– Disadvantage – is affected by extreme values
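A minimal sketch of the range and its sensitivity to extreme values. The data are hypothetical (the slide gives none), and `value_range` is an illustrative helper name.

```python
# Hypothetical sample; one value (48) is far from the rest.
data = [12, 15, 11, 48, 14]

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

# Same sample with the extreme value dropped, to show its influence.
without_extreme = [x for x in data if x != 48]

print(value_range(data))             # 37
print(value_range(without_extreme))  # 4
```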
4. Interquartile Range
• Interquartile Range - range of values between
the first and third quartiles
• Range of the “middle half”; middle 50%
– Useful when researchers are interested in the
middle 50%, and not the extremes
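The interquartile range can be sketched as below. The data are hypothetical, and the quartiles use the "inclusive" convention of Python's `statistics.quantiles`; other quartile conventions can give slightly different values.

```python
import statistics

# Hypothetical sample with one extreme value (30).
data = [2, 4, 4, 5, 7, 8, 9, 11, 30]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q3, iqr)            # 4.0 9.0 5.0
print(max(data) - min(data))  # 28, the range, inflated by the extreme value
```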
6. Mean Absolute Deviation (MAD)
• Deviations from the mean always sum to zero, so they cannot
simply be averaged. One solution is to average the absolute
values of the deviations around the mean. This is called the
Mean Absolute Deviation
• Note that while the MAD is intuitively simple, it is rarely used
in practice
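A short sketch of the calculation, using hypothetical data. It also shows that the raw deviations cancel out, which is why absolute values are taken.

```python
# Hypothetical sample.
data = [5, 9, 16, 17, 18]

mean = sum(data) / len(data)            # 13.0
deviations = [x - mean for x in data]   # -8, -4, 3, 4, 5

# Raw deviations about the mean cancel out.
print(sum(deviations))                  # 0.0

# Mean absolute deviation: average of the absolute deviations.
mad = sum(abs(d) for d in deviations) / len(data)
print(mad)                              # 4.8
```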
7. Sample Variance
• Another solution is to take the Sum of Squared Deviations
(SSD) about the mean
• Sample Variance is the SSD about the arithmetic mean
divided by n - 1 (roughly the average squared deviation)
• Sample Variance is denoted by s²
Why Sum of Squared Deviations about the mean?
- Squaring the deviations removes their signs
- Large deviations are amplified
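As a sketch, the sample variance can be computed by hand and checked against Python's `statistics.variance`, which also divides by n - 1. The data are hypothetical.

```python
import statistics

# Hypothetical sample.
data = [5, 9, 16, 17, 18]

mean = sum(data) / len(data)               # 13.0
ssd = sum((x - mean) ** 2 for x in data)   # 64 + 16 + 9 + 16 + 25 = 130.0

# Sample variance: SSD divided by n - 1.
sample_variance = ssd / (len(data) - 1)

print(ssd)                        # 130.0
print(sample_variance)            # 32.5
print(statistics.variance(data))  # 32.5, same divisor (n - 1)
```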
9. Sample Standard Deviation
• Sample standard deviation is the square root of the sample
variance
• Denoted by s
• Benefit: Same units as original data
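A brief illustration of the units point, with hypothetical heights in centimetres: the variance comes out in cm², while the standard deviation is back in cm.

```python
import math
import statistics

# Hypothetical heights in centimetres.
heights_cm = [150, 160, 170, 180, 190]

variance_cm2 = statistics.variance(heights_cm)  # squared units (cm^2)
sd_cm = math.sqrt(variance_cm2)                 # original units (cm)

print(variance_cm2)     # 250
print(round(sd_cm, 2))  # 15.81
```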
10. Standard Deviation: Empirical Rule
If a variable is normally distributed, then:
1. Approximately 68% of the observations lie within 1 standard
deviation of the mean
2. Approximately 95% of the observations lie within 2 standard
deviations of the mean
3. Approximately 99.7% of the observations lie within 3 standard
deviations of the mean
Notes:
• The rule applies to populations as well as samples
• It can be used to check whether a data set is approximately
normally distributed
12. A Note about the Empirical Rule
Note: The empirical rule may be used to determine whether or
not a set of data is approximately normally distributed
1. Find the mean and standard deviation for the data
2. Compute the actual proportion of data within 1, 2, and 3
standard deviations from the mean
3. Compare these actual proportions with those given by the
empirical rule
4. If the proportions found are reasonably close to those of
the empirical rule, then the data is approximately normally
distributed
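The four steps above can be sketched as follows. The sample is simulated (an assumption made for illustration, not data from the slides), and `proportions_within` is a hypothetical helper name.

```python
import random
import statistics

def proportions_within(values, max_k=3):
    """Proportion of values within 1..max_k standard deviations of the mean."""
    mean = statistics.mean(values)   # step 1
    sd = statistics.stdev(values)    # step 1
    n = len(values)
    return [sum(1 for x in values if abs(x - mean) <= k * sd) / n  # step 2
            for k in range(1, max_k + 1)]

# Simulated, roughly normal sample (illustrative assumption).
random.seed(0)
sample = [random.gauss(100, 15) for _ in range(10_000)]

observed = proportions_within(sample)
expected = [0.68, 0.95, 0.997]

# Steps 3 and 4: compare observed proportions with the empirical rule.
for k, (obs, exp) in enumerate(zip(observed, expected), start=1):
    print(f"within {k} sd: observed {obs:.3f}, empirical rule {exp}")
```

How close is "reasonably close" is a judgment call; the slides do not give a cutoff.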
13. z Scores
• z score – the number of standard deviations a value (x)
lies above or below the mean of a set of numbers; the
empirical rule applies to z scores when the data are
normally distributed
• A z score translates a value’s raw distance from the
mean into units of standard deviations
• For normally distributed data, z scores typically range
from -3.00 to +3.00
• z scores allow raw scores from different data sets to be
compared
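A minimal sketch of the z score calculation and of comparing raw scores from two different scales. All numbers are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exam results on two different scales.
z_math = z_score(82, mean=70, sd=8)  # 12 points above the mean
z_art = z_score(90, mean=84, sd=6)   # 6 points above the mean

print(z_math)  # 1.5
print(z_art)   # 1.0
# The raw art score is higher, but the math score is further above its mean.
```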
14. Coefficient of Variation (C.V.)
• Coefficient of Variation (CV) – measures the variability
of a set of values (perhaps the returns of a stock
portfolio) relative to its mean. It is the ratio of the
standard deviation to the mean, expressed as a percentage
• Useful when comparing standard deviations
computed from data with different means
• A measure of relative dispersion
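A small sketch with hypothetical data: two data sets with the same standard deviation but different means have very different relative dispersion. The sample standard deviation is used here as an assumption; the same ratio applies to population values.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical data sets: same spread, different means.
low_mean = [10, 12, 14]
high_mean = [100, 102, 104]

cv_low = coefficient_of_variation(low_mean)
cv_high = coefficient_of_variation(high_mean)

print(round(cv_low, 2))   # 16.67
print(round(cv_high, 2))  # 1.96
```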
15. Coefficient of Variation (C.V.)
Consider two different populations
Since 15.86% > 11.90%, the first population is more
variable, relative to its mean, than the second
population
16. Calculation of Grouped Mean
Sometimes data are already grouped, and we are
interested in calculating summary statistics
Interval       Frequency (f)   Midpoint (M)    f*M
20-under 30          6              25          150
30-under 40         18              35          630
40-under 50         11              45          495
50-under 60         11              55          605
60-under 70          3              65          195
70-under 80          1              75           75
Total               50                         2150
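The column totals in the table lead to the grouped mean as the sum of f*M divided by the sum of f. A short sketch using the table's frequencies and midpoints (variable names are illustrative):

```python
# Frequencies and class midpoints from the table.
freqs = [6, 18, 11, 11, 3, 1]
midpoints = [25, 35, 45, 55, 65, 75]

n = sum(freqs)                                            # 50
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))     # 2150

# Each class is represented by its midpoint, so this is an approximation
# of the mean of the underlying raw data.
grouped_mean = sum_fm / n

print(n, sum_fm, grouped_mean)  # 50 2150 43.0
```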
17. Median of Grouped Data - Example
Class Interval   Frequency   Cumulative Frequency
20-under 30          6                6
30-under 40         18               24
40-under 50         11               35
50-under 60         11               46
60-under 70          3               49
70-under 80          1               50
N = 50
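The slide's formula did not survive extraction. The sketch below assumes the usual interpolation formula for grouped data: median = L + ((N/2 - cf) / f) * W, where L is the lower bound of the median class, cf the cumulative frequency before it, f its frequency, and W its width. `grouped_median` is an illustrative name.

```python
# Classes as (lower bound, upper bound, frequency), from the table.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_median(classes):
    """Interpolated median for grouped data (assumed standard formula)."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cum_before = 0
    for lower, upper, f in classes:
        # The median class is the first whose cumulative frequency reaches N/2.
        if cum_before + f >= half:
            return lower + (half - cum_before) / f * (upper - lower)
        cum_before += f

median = grouped_median(classes)
print(round(median, 2))  # 40.91
```

Here N/2 = 25 falls in the 40-under 50 class (cumulative frequency 35), with 24 observations before it.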
18. Mode of Grouped Data
Class Interval   Frequency
20-under 30          6
30-under 40         18
40-under 50         11
50-under 60         11
60-under 70          3
70-under 80          1
• Mode : Midpoint of the modal class
• Modal class : the class with greatest frequency
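Applying the two definitions above to the table, as a short sketch (variable names are illustrative):

```python
# Classes as (lower bound, upper bound, frequency), from the table.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

# Modal class: the class with the greatest frequency.
lower, upper, freq = max(classes, key=lambda c: c[2])

# Mode of grouped data: midpoint of the modal class.
mode = (lower + upper) / 2

print((lower, upper), freq)  # (30, 40) 18
print(mode)                  # 35.0
```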
20. Variance and Standard Deviation
of Grouped Data
Class Interval    f     M     fM    M - mean   (M - mean)^2   f(M - mean)^2
20-under 30       6    25    150      -18          324            1944
30-under 40      18    35    630       -8           64            1152
40-under 50      11    45    495        2            4              44
50-under 60      11    55    605       12          144            1584
60-under 70       3    65    195       22          484            1452
70-under 80       1    75     75       32         1024            1024
Total            50         2150                                  7200
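The table's totals can be reproduced with the sketch below. The slide's formula was lost in extraction, so it is an assumption which divisor was intended: the population form (divide by N) and the sample form (divide by N - 1) are both shown.

```python
import math

# Frequencies and class midpoints from the table.
freqs = [6, 18, 11, 11, 3, 1]
midpoints = [25, 35, 45, 55, 65, 75]

n = sum(freqs)                                            # 50
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n   # 2150 / 50 = 43.0

# Sum of f * (M - mean)^2, the last column of the table.
ss = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))

population_variance = ss / n         # divisor N
population_sd = math.sqrt(population_variance)
sample_variance = ss / (n - 1)       # divisor N - 1

print(ss)                         # 7200.0
print(population_variance)        # 144.0
print(population_sd)              # 12.0
print(round(sample_variance, 2))  # 146.94
```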
21. Measures of Shape - Skewness
• Symmetrical – the right half is a mirror image
of the left half
• Skewed – the distribution lacks symmetry; the data
are sparse at one end and piled up at the other
– Absence of symmetry
– Extreme values or “tail” on one side of a distribution
– Positively- or right-skewed (tail to the right) vs.
negatively- or left-skewed (tail to the left)
24. Box-and-Whisker Plot
A graphic representation of the 5-number summary:
• The five numerical values (smallest, first quartile, median, third
quartile, and largest) are located on a scale, either vertical or
horizontal
• The box is used to depict the middle half of the data that lies
between the two quartiles
• The whiskers are line segments used to depict the other half of
the data
• One line segment represents the quarter of the data that is
smaller in value than the first quartile
• The second line segment represents the quarter of the data
that is larger in value than the third quartile
25. Example: Box-and-Whisker Plot
Example: A random sample of students in a sixth grade class was
selected. Their weights are given in the table below. Find the
5-number summary for this data and construct a boxplot:
63   64   76   76   81   83   85   86   88   89
90   91   92   93   93   93   94   97   99   99
99  101  108  109  112
5-number summary:
L = 63   Q1 = 85   median (x~) = 92   Q3 = 99   H = 112
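The 5-number summary can be checked with a short sketch. The weights are the 25 values recoverable from the example's table. The quartiles use the "inclusive" convention of Python's `statistics.quantiles`, which is an assumption: it reproduces the slide's values for this data set, but the slides do not say which quartile rule they use.

```python
import statistics

# The 25 weights from the example.
weights = [63, 64, 76, 76, 81, 83, 85, 86, 88, 89,
           90, 91, 92, 93, 93, 93, 94, 97, 99, 99,
           99, 101, 108, 109, 112]

# Q1, median, Q3 (returned as floats).
q1, median, q3 = statistics.quantiles(weights, n=4, method="inclusive")

# 5-number summary: smallest, Q1, median, Q3, largest.
summary = (min(weights), q1, median, q3, max(weights))
print(summary)  # (63, 85.0, 92.0, 99.0, 112)
```

In the boxplot, the box spans 85 to 99 with a line at 92, and the whiskers extend to 63 and 112.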