This document discusses various measures of variability and dispersion in statistics, including the range, quartiles, interquartile range, percentiles, and five number summary. It provides definitions and examples of each measure. The range is defined as the difference between the highest and lowest values in a data set. Quartiles split a data set into four equal parts, with the first (Q1) and third (Q3) quartiles used to calculate the interquartile range. Percentiles indicate the percentage of values below a given score. The five number summary encapsulates the minimum, first quartile, median, third quartile, and maximum.
Intro Statistics Built Environ Course
1. Introduction to Statistics for Built
Environment
Course Code: AED 1222
Compiled by
DEPARTMENT OF ARCHITECTURE AND ENVIRONMENTAL DESIGN (AED)
CENTRE FOR FOUNDATION STUDIES (CFS)
INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA
2. Lecture 8
Today’s Lecture:
The range
Quartiles & the Interquartile range.
Percentiles
Percentile Rank
The five number summary
Measures of variability/dispersion
Part I
3. What is/are Measures of Variation/Dispersion?
●If the data values are widely dispersed, the central
location is said to be less representative of the
data as a whole.
●If the data values are closely clustered, the central
location is considered more reliable.
Measures of Variation/Dispersion
●Measures of variation / dispersion give
information on the spread or variability of the data
values.
6. The range
●The range is simply the difference between the
largest and the smallest observed values in a data set.
Thus, range, including any outliers, is the actual
spread of data.
●A great deal of information is ignored when
computing the range, since only the largest and
smallest data values are considered.
Range = difference between highest and lowest
observed values
What is/are Range?
8. ●The range value of a data set is greatly influenced
by the presence of just one unusually large or small
value (outlier).
●The range can be expressed as an interval such as
4–10, where 4 is the lowest value and 10 is highest.
●Often, it is expressed as interval width. For
example, the range of 4–10 can also be expressed
as a range of 6.
The range cont.
10. ●Another disadvantage of the range is that it does not
measure the spread of the majority of values in a data set;
it only measures the spread between the highest and
lowest values.
●As a result, other measures are required in order to give
a better picture of the data spread.
●The range is an informative tool used as a supplement
to other measures such as the standard deviation or
semi-interquartile range, but it should rarely be used as
the only measure of spread.
The range cont.
11. 1, 2, 4, 6, 12, 15, 19, 26
Smallest value = 1, largest value = 26
The range does not take into account how clumped together
the scores are.
Range = 26 - 1 = 25
Question: Is range a good measure of spread/dispersion?
Question: Is value 25 a good representative value?
The range cont.
Example 1:
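The range calculation in Example 1 can be checked with a short Python sketch. The function name `value_range` and the extra value 100 are illustrative assumptions, not part of the lecture; the added value only shows how a single outlier changes the result.

```python
def value_range(data):
    """Return the range: largest observed value minus smallest observed value."""
    return max(data) - min(data)


# Data from Example 1
scores = [1, 2, 4, 6, 12, 15, 19, 26]
print(value_range(scores))        # 25

# Hypothetical outlier (not in the lecture data) to show its effect on the range
with_outlier = scores + [100]
print(value_range(with_outlier))  # 99
```

Only the two extreme values enter the calculation, which is why one unusual observation moves the range so much.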
12. 425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Smallest value = 425, largest value = 615
The range does not take into account how clumped together
the scores are.
Range = 615 – 425 = 190
Question: Is range a good measure of spread/dispersion?
The range cont.
Example 2:
13. The quartiles
●In descriptive statistics, a quartile is any of the three values
which divide the sorted (arrayed) data set into four equal
parts, so that each part represents one fourth of the sampled
population.
●The median divides the data into two equal sets.
●The lower quartile is the value of the middle of the first set,
where 25% of the values are smaller than Q1 and 75% are
larger. This first quartile takes the notation Q1.
●The upper quartile is the value of the middle of the second
set, where 75% of the values are smaller than Q3 and 25% are
larger. This third quartile takes the notation Q3.
What is/are Quartiles?
15. The formula for locating the position of the
observation at a given percentile, y, with n data
points sorted in ascending order is:
L_y = (y/100) × n
•Case 1: If L_y is a whole number, then the value will
be found halfway between positions L_y and L_y + 1.
•Case 2: If L_y is not a whole number, round up to the
nearest whole number (for example, L_y = 1.2 becomes 2).
Locating the position of the quartiles
The quartiles cont.
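The location rule and its two cases can be sketched in Python as follows. This is a minimal illustration, assuming y is a whole-number percentile strictly between 0 and 100; the function name `value_at_percentile` is hypothetical. The data are the monthly sales figures from the exercise in this lecture, sorted in ascending order.

```python
def value_at_percentile(sorted_data, y):
    """Apply the location rule L_y = (y/100) * n to data sorted in ascending order.

    Assumes y is an integer with 0 < y < 100.
    """
    if not 0 < y < 100:
        raise ValueError("y must be strictly between 0 and 100")
    n = len(sorted_data)
    # Integer arithmetic avoids floating-point trouble when testing for a whole number
    whole, remainder = divmod(y * n, 100)
    if remainder == 0:
        # Case 1: L_y is whole, so take the value halfway between positions L_y and L_y + 1
        return (sorted_data[whole - 1] + sorted_data[whole]) / 2
    # Case 2: L_y is not whole, so round up to the next position
    return sorted_data[whole]


ordered = [1, 11, 15, 19, 20, 24, 28, 34, 37, 47, 50, 57]
print(value_at_percentile(ordered, 25))  # 17.0  (L = 3, halfway between 3rd and 4th)
print(value_at_percentile(ordered, 75))  # 42.0  (L = 9, halfway between 9th and 10th)
print(value_at_percentile(ordered, 10))  # 11    (L = 1.2 becomes 2, the 2nd value)
```

Positions in the rule are counted from 1, while Python lists are indexed from 0, hence the offsets in the code.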
17. The Interquartile Range
●The interquartile range is another range used as a
measure of the spread.
●The difference between upper and lower quartiles (Q3–
Q1), which is called the interquartile range, also indicates
the dispersion of a data set.
●The interquartile range spans 50% of a data set, and
eliminates the influence of outliers because, in effect, the
highest and lowest quarters are removed.
What is/are Interquartile Range?
20. An exercise
A year ago, Ali began working at a computer store. His
supervisor asked him to keep a record of the number of
sales he made each month.
The following data set is a list of his sales for the last 12
months: 34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37.
•Use Ali's sales records to find:
the median
the range
the upper and lower quartiles
the interquartile range
21. The values in an ascending array are:
1, 11, 15, 19, 20, 24, 28, 34, 37, 47, 50, 57.
Median position = (12 + 1) ÷ 2
= 6.5th value
Median = (6th + 7th observations) ÷ 2
= (24 + 28) ÷ 2
= 26
Range = difference between the highest and lowest
values = 57 – 1
= 56
Exercise cont.
22. Lower quartile = value of middle of first half of data
Q1 = the median of 1, 11, 15, 19, 20, 24
= (3rd + 4th observations) ÷ 2
= (15 + 19) ÷ 2
= 17
Upper quartile = value of middle of second half of data
Q3 = the median of 28, 34, 37, 47, 50, 57
= (3rd + 4th observations) ÷ 2
= (37 + 47) ÷ 2
= 42
Interquartile range = Q3–Q1 = 42 – 17 = 25
Exercise cont.
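The exercise can be reproduced with a short Python sketch that follows the same "median of each half" approach. The function name `spread_summary` is an assumption for illustration. For a data set with an odd number of values, this sketch leaves the middle value out of both halves, which is one common convention and not something the exercise itself specifies.

```python
from statistics import median


def spread_summary(data):
    """Return median, range, Q1, Q3 and interquartile range using medians of halves."""
    ordered = sorted(data)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # first half of the sorted data
    upper_half = ordered[(n + 1) // 2 :]  # second half (middle value skipped if n is odd)
    q1 = median(lower_half)
    q3 = median(upper_half)
    return {
        "median": median(ordered),
        "range": ordered[-1] - ordered[0],
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


# Ali's monthly sales for the last 12 months
sales = [34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37]
result = spread_summary(sales)
print(result)
# {'median': 26.0, 'range': 56, 'q1': 17.0, 'q3': 42.0, 'iqr': 25.0}
```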
24. Percentiles
●The a-th percentile is a value such that roughly a%
of the data are smaller and (100-a)% of the
data are larger.
●There are three steps for computing a
percentile.
– Organize the data into an ascending array;
– Count the number of values (n);
– Select the a*(n+1)th observation, with a written as a
proportion (for example, 0.50 for the 50th percentile).
What is/are Percentile?
26. You can't always be so lucky to have a*(n+1) be
a nice whole number. Here are some scenarios:
•If a*(n+1) is not a whole number, then go
halfway between the two adjacent numbers.
•If a*(n+1) < 1, select the smallest observation.
•If a*(n+1) > n, select the largest observation.
Percentiles cont.
27. • Arrayed data: 18, 33, 58, 67, 73, 93, 147
• There are 7 observations (n=7).
• Select 0.50*(7+1) = 4th observation.
• Therefore, the 50th percentile equals 67.
-Notice that there are three observations larger than
67 and three observations smaller than 67.
Percentiles cont.
Example:
Compute the 50th percentile for the following data set:
73, 58, 67, 93, 33, 18, 147
28. Suppose we want to compute the 20th percentile…
• Notice that a*(n+1) = 0.20*(7+1) = 1.6. This is not a
whole number, so we select halfway between the 1st and
2nd observations: (18 + 33) ÷ 2 = 25.5.
Suppose we want to compute the 10th percentile…
• Since 0.10*(7+1)=0.8, we should select the smallest
observation which is 18.
Percentile cont.
Example (cont.):
Arrayed data: 18, 33, 58, 67, 73, 93, 147
10th percentile = 18; 20th percentile = 25.5
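The a*(n+1) rule and its three special cases can be sketched in Python as follows. The function name `percentile` is hypothetical, and the percentile is passed as a whole-number percentage (so a = percent/100), an assumption made to keep the whole-number test free of floating-point error.

```python
def percentile(data, percent):
    """Percentile by the a*(n+1) rule, with a = percent/100 and percent an integer."""
    ordered = sorted(data)            # step 1: ascending array
    n = len(ordered)                  # step 2: count the values
    scaled = percent * (n + 1)        # equals 100 * a * (n + 1)
    if scaled < 100:                  # a*(n+1) < 1: select the smallest observation
        return ordered[0]
    if scaled > 100 * n:              # a*(n+1) > n: select the largest observation
        return ordered[-1]
    position, remainder = divmod(scaled, 100)
    if remainder == 0:                # whole number: select that observation
        return ordered[position - 1]
    # Not a whole number: go halfway between the two adjacent observations
    return (ordered[position - 1] + ordered[position]) / 2


data = [73, 58, 67, 93, 33, 18, 147]
print(percentile(data, 50))  # 67    (0.50 * 8 = 4, the 4th observation)
print(percentile(data, 20))  # 25.5  (0.20 * 8 = 1.6, halfway between 18 and 33)
print(percentile(data, 10))  # 18    (0.10 * 8 = 0.8 < 1, the smallest observation)
```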
29. • The percentile rank of a score is the percentage of
scores in its frequency distribution that are
lower than it.
• Percentile ranks are commonly used to clarify
the interpretation of scores on standardized
tests.
• Given formula:
Pr = [(number of values below the score + 0.5) ÷ (total
number of values)] × 100%
Percentile Rank
What is/are Percentile Rank?
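The given formula translates directly into a few lines of Python. This is a minimal sketch: the function name `percentile_rank` is an assumption, and the data set is borrowed from the percentile example in this lecture purely for illustration.

```python
def percentile_rank(scores, score):
    """Percentile rank = (number of values below the score + 0.5) / total * 100."""
    below = sum(1 for s in scores if s < score)
    return (below + 0.5) / len(scores) * 100


scores = [18, 33, 58, 67, 73, 93, 147]
# Three of the seven values lie below 67: (3 + 0.5) / 7 * 100
print(percentile_rank(scores, 67))  # 50.0
```

The 0.5 in the numerator counts the score itself as half below and half above its own position.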
31. The five number summary
A five number summary uses percentiles to
describe a set of data. The five number summary
consists of
•MAX - the maximum value
•75% - the 75th percentile
•50% - the 50th percentile (or the median)
•25% - the 25th percentile
•MIN - the minimum value
The five number summary splits the data into four
regions, each of which contains 25% of the data.
What is/are Five Number Summary?
32. • The minimum value is 1
• The lower half is {1, 3, 4}, and the median of that half (the
25th percentile) is 3
• The median (the 50th percentile) is 5
• The upper half is {6, 7, 9}, and the median of that half (the
75th percentile) is 7
• The maximum value is 9
The five number summary cont.
Example:
Find the five number summary for the data set.
1, 3, 4, 5, 6, 7, 9
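The worked example can be verified with a short Python sketch. The function name `five_number_summary` is hypothetical, and the quartiles are taken as the medians of the lower and upper halves with the middle value excluded when the count is odd, matching the halves {1, 3, 4} and {6, 7, 9} used in the example.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using medians of the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # values below the middle
    upper_half = ordered[(n + 1) // 2 :]  # values above the middle
    return (
        ordered[0],
        median(lower_half),
        median(ordered),
        median(upper_half),
        ordered[-1],
    )


summary = five_number_summary([1, 3, 4, 5, 6, 7, 9])
print(summary)  # (1, 3, 5, 7, 9)
```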
33. Next class…
The following topics will be discussed:
Measures of variability / dispersion (Part II):
The average absolute deviation
The Variance
The Standard deviation
Coefficient of Variation (CV)