Statr sessions 4 to 6

Measures of Variability:
Ungrouped Data
• Measures of Variability - tools that describe
the spread or the dispersion of a set of data.
– Provide a more meaningful description of the data when used
• with measures of central tendency
• in comparison to other groups

Measures of Spread or Dispersion:
Ungrouped Data
• Common Measures of Variability
–Range
–Interquartile Range
–Mean Absolute Deviation
–Variance and Standard Deviation
–Coefficient of Variation

Range
• The difference between the largest and the
smallest values in a set of data
– Advantage – easy to compute
– Disadvantage – is affected by extreme values

Interquartile Range
• Interquartile Range - range of values between
the first and third quartiles
• Range of the “middle half”; middle 50%
– Useful when researchers are interested in the
middle 50%, and not the extremes
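
As a rough illustration of the two measures above, the following Python sketch computes the range and the interquartile range for a small made-up data set. The data values and function names are assumptions for illustration only. Quartile conventions differ between textbooks and software; the "inclusive" method of statistics.quantiles is used here as one possible choice.

```python
import statistics


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Q3 minus Q1, using one of several quartile conventions."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical data, chosen only for illustration
data = [3, 5, 7, 8, 9, 11, 13, 15, 40]
with_outlier = data[:-1] + [400]  # same data with a more extreme largest value

print(value_range(data), interquartile_range(data))
print(value_range(with_outlier), interquartile_range(with_outlier))
```

In this sketch the range grows when the largest value becomes more extreme, while the interquartile range stays the same.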

Mean Absolute Deviation (MAD)
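• Deviations around the mean always sum to zero, so they cannot simply
be averaged
• One solution is to take the absolute value of each deviation
around the mean and average them. This is called the Mean Absolute Deviation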
• Note that while the MAD is intuitively simple, it is rarely used
in practice
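
A minimal Python sketch of the MAD, assuming a small made-up data set (the values and the function name are illustrative and not from the slides). It also shows why the raw deviations cannot be averaged directly.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations around the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


# Hypothetical data, chosen only for illustration
data = [5, 9, 16, 17, 18]
mean = sum(data) / len(data)

# The signed deviations cancel out, which is why absolute values are taken
signed_sum = sum(x - mean for x in data)

print(signed_sum)
print(mean_absolute_deviation(data))
```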

Sample Variance
• Another solution is to take the Sum of Squared Deviations
(SSD) about the mean
• Sample Variance is the sum of the squared deviations
from the arithmetic mean, divided by n - 1
• Sample Variance is denoted by s^2

Why Sum of Squared Deviations about the mean?
- Squaring the deviations removes the sign
- Larger deviations are amplified, so they carry more weight

Calculation of Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Degrees of freedom: n - 1

Sample Standard Deviation
• Sample standard deviation is the square root of the sample
variance
• Denoted by s
• Benefit: Same units as original data
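
The sample variance and sample standard deviation can be sketched in Python as follows. The data set and function names are assumptions for illustration; the n - 1 divisor is the degrees of freedom. The standard library functions statistics.variance and statistics.stdev use the same divisor and are shown for comparison.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """Square root of the sample variance; same units as the data."""
    return math.sqrt(sample_variance(values))


# Hypothetical data, chosen only for illustration
data = [5, 9, 16, 17, 18]

print(sample_variance(data), statistics.variance(data))
print(sample_std_dev(data), statistics.stdev(data))
```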

Standard Deviation: Empirical Rule
If a variable is normally distributed, then:
1. Approximately 68% of the observations lie within 1 standard
deviation of the mean
2. Approximately 95% of the observations lie within 2 standard
deviations of the mean
3. Approximately 99.7% of the observations lie within 3 standard
deviations of the mean
Notes:
• Also applies to populations
• Can be used to determine if a distribution is normally
distributed

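[Figure: Standard Deviation: Empirical Rule. Normal curve with 68% of observations between x̄ - s and x̄ + s, 95% between x̄ - 2s and x̄ + 2s, and 99.7% between x̄ - 3s and x̄ + 3s]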

A Note about the Empirical Rule
Note: The empirical rule may be used to determine whether or
not a set of data is approximately normally distributed
1. Find the mean and standard deviation for the data
2. Compute the actual proportion of data within 1, 2, and 3
standard deviations from the mean
3. Compare these actual proportions with those given by the
empirical rule
4. If the proportions found are reasonably close to those of
the empirical rule, then the data is approximately normally
distributed
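
The four steps above can be sketched in Python as follows. This is only an informal check, not a formal test of normality; the function name and the simulated sample are assumptions for illustration.

```python
import random
import statistics


def proportions_within(values, ks=(1, 2, 3)):
    """Share of observations within k sample standard deviations of the mean."""
    mean = statistics.mean(values)   # step 1: mean
    s = statistics.stdev(values)     # step 1: standard deviation
    n = len(values)
    # step 2: actual proportion within 1, 2, and 3 standard deviations
    return {k: sum(1 for x in values if abs(x - mean) <= k * s) / n for k in ks}


# Hypothetical sample drawn from a normal distribution (mean 50, sd 10)
rng = random.Random(0)
sample = [rng.gauss(50, 10) for _ in range(10_000)]

observed = proportions_within(sample)
expected = {1: 0.68, 2: 0.95, 3: 0.997}  # empirical rule

# steps 3 and 4: compare observed and expected proportions
for k in expected:
    print(k, round(observed[k], 3), expected[k])
```

How close is "reasonably close" is a judgment call; a skewed sample would typically show proportions that drift away from the rule.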

z Scores
• Z score – represents the number of standard deviations a
value (x) is above or below the mean of a set of numbers
• Z score allows translation of a value’s raw distance from the
mean into units of standard deviations: z = (x - mean) / standard deviation
• When the data are normally distributed, z-scores typically
range from -3.00 to +3.00
• z-scores may be used to make comparisons of raw scores
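
A short Python sketch of using z-scores to compare raw scores, assuming the usual definition z = (x - mean) / standard deviation. The two tests and all numbers are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


# Hypothetical raw scores on two tests with different means and spreads
z_test_a = z_score(82, mean=70, std_dev=8)
z_test_b = z_score(75, mean=60, std_dev=12)

print(z_test_a, z_test_b)
```

Although the raw distance from the mean is larger on the second test (15 points against 12), the first score is further above its mean in standard-deviation units.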

Coefficient of Variation (C.V.)
• Coefficient of Variation (CV) – measures the volatility
of a value (perhaps a stock portfolio), relative to its
mean. It’s the ratio of the standard deviation to the
mean, expressed as a percentage
• Useful when comparing standard deviations
computed from data with different means
• Measurement of relative dispersion

Coefficient of Variation (C.V.)
Consider two different populations

Since 15.86% > 11.90%, the first population is more
variable, relative to its mean, than the second
population
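
The comparison can be sketched in Python as follows. The means and standard deviations of the two populations are not given in the text, so the values below are assumptions, chosen only because they reproduce the 15.86 and 11.90 figures.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


# Assumed inputs (not from the slides), picked to match the quoted percentages
cv_first = coefficient_of_variation(std_dev=4.6, mean=29)
cv_second = coefficient_of_variation(std_dev=10, mean=84)

print(round(cv_first, 2), round(cv_second, 2))
```

With these assumed inputs the second population has the larger standard deviation, yet the first is more variable relative to its mean.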

Calculation of Grouped Mean
Sometimes data are already grouped, and we are
interested in calculating summary statistics
Interval         Frequency (f)   Midpoint (M)    f*M
20-under 30            6              25          150
30-under 40           18              35          630
40-under 50           11              45          495
50-under 60           11              55          605
60-under 70            3              65          195
70-under 80            1              75           75
Total                 50                         2150

Grouped mean = (sum of f*M) / (sum of f) = 2150 / 50 = 43
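
The grouped-mean calculation in the table can be reproduced with a short Python sketch. The class limits and frequencies are taken from the table; the variable and function names are illustrative.

```python
# (lower limit, upper limit, frequency) for each class, from the table
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]


def grouped_mean(classes):
    """Sum of frequency * midpoint, divided by the total frequency."""
    total_f = sum(f for _, _, f in classes)
    total_fm = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fm / total_f


print(grouped_mean(classes))
```

Because each observation is represented by its class midpoint, the result is an approximation of the mean of the underlying raw data.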

Median of Grouped Data - Example
Class Interval   Frequency   Cumulative Frequency
20-under 30          6               6
30-under 40         18              24
40-under 50         11              35
50-under 60         11              46
60-under 70          3              49
70-under 80          1              50
N = 50
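
The median formula itself is not shown in the text. The Python sketch below assumes the commonly used interpolation formula, median = L + ((N/2 - cf) / f) * W, where L is the lower limit of the median class, cf the cumulative frequency before that class, f its frequency, and W its width. The function name is illustrative.

```python
# (lower limit, upper limit, frequency) for each class, from the table
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]


def grouped_median(classes):
    """Interpolate the median inside the class that contains the N/2-th value."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("no classes given")


print(grouped_median(classes))
```

For this table the median class is 40-under 50, since the cumulative frequency first reaches N/2 = 25 there, so the sketch evaluates 40 + ((25 - 24) / 11) * 10.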

Mode of Grouped Data
Class Interval   Frequency
20-under 30          6
30-under 40         18
40-under 50         11
50-under 60         11
60-under 70          3
70-under 80          1

• Mode : Midpoint of the modal class
• Modal class : the class with greatest frequency
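
Following the two definitions above, a minimal Python sketch for the mode of grouped data (names are illustrative; if two classes tie for the greatest frequency, this sketch simply returns the first one):

```python
# (lower limit, upper limit, frequency) for each class, from the table
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]


def grouped_mode(classes):
    """Midpoint of the class with the greatest frequency."""
    lower, upper, _f = max(classes, key=lambda c: c[2])
    return (lower + upper) / 2


print(grouped_mode(classes))
```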

Variance and Standard Deviation
of Grouped Data
Class Interval     f     M     fM    M - mean   (M - mean)^2   f(M - mean)^2
20-under 30        6    25    150      -18          324            1944
30-under 40       18    35    630       -8           64            1152
40-under 50       11    45    495        2            4              44
50-under 60       11    55    605       12          144            1584
60-under 70        3    65    195       22          484            1452
70-under 80        1    75     75       32         1024            1024
Total             50         2150                                  7200
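
The table columns can be reproduced with the Python sketch below. The text does not show whether the final division uses N (population) or N - 1 (sample), so both are computed here; that choice and the function name are assumptions.

```python
import math

# (lower limit, upper limit, frequency) for each class, from the table
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]


def grouped_sum_of_squares(classes):
    """Return total frequency and the sum of f * (M - mean)^2."""
    n = sum(f for _, _, f in classes)
    mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n
    ss = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in classes)
    return n, ss


n, ss = grouped_sum_of_squares(classes)

population_variance = ss / n      # divide by N
sample_variance = ss / (n - 1)    # divide by N - 1

print(ss, population_variance, math.sqrt(population_variance))
print(sample_variance, math.sqrt(sample_variance))
```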

Measures of Shape - Skewness
• Symmetrical – the right half is a mirror image
of the left half
• Skewed – the distribution lacks symmetry; the data are
sparse at one end and piled up at the other end
– Absence of symmetry
– Extreme values or “tail” in one side of a distribution
– Positively- or right-skewed vs. negatively- or left-skewed
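
The slides give no formula for skewness. As one possible way to put a number on it, the Python sketch below assumes the moment coefficient of skewness (third central moment divided by the 1.5 power of the second central moment). The data sets and function name are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: positive for a right tail, negative for a left tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data sets, chosen only for illustration
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]   # long tail to the right
left_skewed = [-x for x in right_skewed]          # mirror image: tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed), skewness(left_skewed), skewness(symmetric))
```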

[Figure: Measures of Shape - Skewness. Two density curves (y against x): positively- or right-skewed vs. negatively- or left-skewed]

Box-and-Whisker Plot
A graphic representation of the 5-number summary:
• The five numerical values (smallest, first quartile, median, third
quartile, and largest) are located on a scale, either vertical or
horizontal
• The box is used to depict the middle half of the data that lies
between the two quartiles
• The whiskers are line segments used to depict the other half of
the data
• One line segment represents the quarter of the data that is
smaller in value than the first quartile
• The second line segment represents the quarter of the data
that is larger in value than the third quartile

Example: Box-and-Whisker Plot
Example: A random sample of students in a sixth grade class was
selected. Their weights are given in the table below. Find the 5-number summary for this data and construct a boxplot:
63 64 76 76 81 83 85 86 88 89
90 91 92 93 93 93 94 97 99 99
99 101 108 109 112

5-number summary: L = 63, Q1 = 85, median (x̃) = 92, Q3 = 99, H = 112
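
The 5-number summary for this example can be checked with a short Python sketch. The 25 weights are entered as read from the table. Quartile conventions vary; the "inclusive" method of statistics.quantiles is an assumption here, used because it agrees with the values given for this data set.

```python
import statistics

# Sixth-grade weights, as read from the table
weights = [63, 64, 76, 76, 81, 83, 85, 86, 88, 89,
           90, 91, 92, 93, 93, 93, 94, 97, 99, 99,
           99, 101, 108, 109, 112]


def five_number_summary(values):
    """Smallest value, Q1, median, Q3, and largest value."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


low, q1, median, q3, high = five_number_summary(weights)
print(low, q1, median, q3, high)
```

The box of the plot would span Q1 to Q3, and the whiskers would run from the smallest value to Q1 and from Q3 to the largest value.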

Example: Box-and-Whisker Plot
[Figure: boxplot titled "Weights from Sixth Grade Class", with a Weight axis from 60 to 110 and L, Q1, the median (x̃), Q3, and H marked]
