This document discusses measures of central tendency (mean, median, mode) and measures of spread (range, variance, standard deviation). It provides formulas and examples to calculate each measure. It also presents two problems, asking to calculate and compare various descriptive statistics for different data sets, such as milk yields from two cow herds and weaning weights of lambs from two breeds. A third problem asks to analyze and compare price data for rice from two markets.
3. A
Measures of Location (Central tendency)
Common measures of location are
1. Mean 2. Median 3. Mode
4. 1. Mean
There are three types of mean:
a. Arithmetic Mean/Average
b. Harmonic Mean
c. Geometric Mean
5. Arithmetic Mean
The most widely utilized measure of central
tendency is the arithmetic mean or average.
The population mean is the sum of the values of
the variables under study divided by the total
number of observations in the population. It is
denoted by μ (‘mu’). Each value is algebraically
denoted by an X with a subscript denotation ‘i’.
For example, a small theoretical population
whose objects had values 1,6,4,5,6,3,8,7 would
be denoted X1 = 1, X2 = 6, X3 = 4, ..., X8 = 7.   (1.1)
6. Mean….
We would denote the population size with a
capital N. In our theoretical population
N = 8. The population mean μ would be

$$\mu = \frac{1 + 6 + 4 + 5 + 6 + 3 + 8 + 7}{8} = 5$$

Formula 1.1: The algebraic shorthand formula
for a population mean is

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
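As a quick check of Formula 1.1, the following minimal Python sketch computes the mean of population 1.1. The variable names are illustrative only and are not part of the original notes.

```python
# Population 1.1 from the text; N is the population size.
population = [1, 6, 4, 5, 6, 3, 8, 7]
N = len(population)

# Formula 1.1: mu = (sum of X_i for i = 1..N) / N
mu = sum(population) / N
print(N, mu)
```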
7. Mean…..
• The Greek letter Σ (sigma) indicates
summation, the subscript i=1 means to start
with the first observation, and the superscript
N means to continue until and including the
Nth observation. For the example above,
$\sum_{i=2}^{5} X_i$ would indicate the
sum of X2+X3+X4+X5 or 6+4+5+6 = 21. To
reduce clutter, if the summation sign is not
indexed, for example $\sum X_i$, it is implied that
the operation of addition begins with the first
observation and continues through the last
observation in a population, that is,

$$\sum_{i=1}^{N} X_i = \sum X_i$$
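The indexed summation can be mirrored in a few lines of Python. This is only a sketch: the helper name `indexed_sum` is hypothetical, and it converts the 1-based indices of the notation to Python's 0-based slices.

```python
# Population 1.1, so X[0] holds X1, X[1] holds X2, and so on.
X = [1, 6, 4, 5, 6, 3, 8, 7]

def indexed_sum(values, start, stop):
    """Sum X_start + ... + X_stop using the 1-based indices of the text."""
    return sum(values[start - 1:stop])

partial = indexed_sum(X, 2, 5)       # X2 + X3 + X4 + X5
total = indexed_sum(X, 1, len(X))    # the unindexed sum over all observations
print(partial, total)
```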
8. Mean…
The sample mean is defined by

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where n is the sample size. The sample
mean is usually reported to one more
decimal place than the data and always
has appropriate units associated with it.
The symbol $\bar{X}$ (X bar) indicates that the
observations of a subset of size n from a
population have been averaged.
9. Mean….
$\bar{X}$ is fundamentally different from μ
because samples from a population can
have different values for their sample
mean, that is, they can vary from sample
to sample within the population. The
population mean, however, is constant
for a given population.
10. Mean…..
Again consider the small theoretical
population 1, 6, 4, 5, 6, 3, 8, 7. A sample of
size 3 may consist of 5, 3, 4 with $\bar{X}$ = 4 or
6, 8, 4 with $\bar{X}$ = 6.
Actually there are 56 possible samples of
size 3 that could be drawn from
population 1.1. Only seven of these samples
(counting the two 6s as separate objects) have a
sample mean the same as the population
mean, i.e. $\bar{X}$ = μ.
12. Mean…
Each sample mean $\bar{X}$ is an unbiased
estimate of μ, but its actual value depends on
the values included in the sample.
We would expect the average of all
possible $\bar{X}$'s to be equal to the population
parameter, μ. This is, in fact, the definition
of an unbiased estimator of the
population mean.
13. Mean…
If you calculate the sample mean for each
of the 56 possible samples with n=3 and
then average these sample means, they will
give an average value of 5 , that is, the
pop. mean, μ. Remember that most real
populations are too large or too difficult to
census completely, so we must rely on using
a single sample to estimate or approximate
the population characteristics.
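The claim about the 56 samples can be checked by brute force. The sketch below uses `itertools.combinations` from the Python standard library, which selects by position, so the two observations equal to 6 are treated as distinct objects. Variable names are illustrative.

```python
from itertools import combinations

population = [1, 6, 4, 5, 6, 3, 8, 7]
mu = sum(population) / len(population)

# Every possible sample of size n = 3 drawn without replacement.
samples = list(combinations(population, 3))
sample_means = [sum(s) / 3 for s in samples]

n_samples = len(samples)
# Average of all the sample means; should agree with mu up to rounding.
mean_of_means = sum(sample_means) / n_samples
# How many samples have a mean exactly equal to mu (sum equal to 3 * mu).
n_equal_to_mu = sum(1 for s in samples if sum(s) == 3 * mu)

print(n_samples, round(mean_of_means, 6), n_equal_to_mu)
```

Exhaustive enumeration like this is only practical for tiny populations, which is exactly why a single sample is used in practice.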
16. 2. Median
The second measure of central tendency is
the MEDIAN. The median is the middlemost
value of an ordered list of
observations. Though the idea is simple
enough, it will prove useful to define it in
terms of an even simpler notion. The depth
of a value is its position relative to the
nearest extreme (end) when the data are
listed in order from smallest to largest.
17. Median: Example 2.1
Table below gives the circumferences at
chest height (CCH) in cm and their
corresponding depths for 15 sugar maples
measured in a forest in Ohio.
CCH (cm)  18  21  22  29  29  36  37  38  56  59  66  70  88  93  120
Depth      1   2   3   4   5   6   7   8   7   6   5   4   3   2    1
No. of obs. = 15 (odd)
The population median M is the observation whose
depth is $d = \frac{N+1}{2}$, where N is the population
size.
18. Median…
A sample median M is the statistic used to
approximate or estimate the population
median. M is defined as the observation
whose depth is $d = \frac{n+1}{2}$, where n is the
sample size. In example 2.1 the sample
size is n = 15, so the depth of the sample
median is d = 8. The sample median is
$X_{\frac{n+1}{2}}$ = X8 = 38 cm.
19. Median: Example 2.2
The table below gives CCH (cm) for 12
cypress pines measured near Brown Lake
on North Stradbroke Island.
CCH (cm)  17  19  31  39  48  56  68  73  73  75  80  122
Depth      1   2   3   4   5   6   6   5   4   3   2    1
No. of observations = 12 (even)
Since n = 12, the depth of the median is $\frac{12+1}{2}$ = 6.5.
Obviously no observation has depth 6.5, so the median is
interpreted as the average of both observations whose
depth is 6 in the list above. So M = $\frac{56+68}{2}$ = 62 cm.
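The depth rule for odd and even sample sizes can be sketched in Python as follows. The function name `median_by_depth` is hypothetical, and the sketch assumes a non-empty list of numbers.

```python
def median_by_depth(values):
    """Return the median located at depth d = (n + 1) / 2 of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    depth = (n + 1) / 2
    if depth.is_integer():
        # Odd n: a single observation sits at the median depth.
        return ordered[int(depth) - 1]
    # Even n: average the two observations whose depth is n / 2.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

# Example 2.1 (sugar maples) and Example 2.2 (cypress pines), CCH in cm.
maples = [18, 21, 22, 29, 29, 36, 37, 38, 56, 59, 66, 70, 88, 93, 120]
pines = [17, 19, 31, 39, 48, 56, 68, 73, 73, 75, 80, 122]

print(median_by_depth(maples), median_by_depth(pines))
```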
20. 3. Mode
The mode is defined as the most frequently
occurring value in a data set. The mode in
example 2.2 would be 73 cm while example 2.1
would have a mode of 29 cm.
More than one mode in a data set is possible. For example, in
2, 3, 4, 1, 1, 2, 3, 4, 5, 1, 4
the modes are 1 and 4 because both appear 3 times
in the data set.
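A data set with more than one mode can be handled by counting frequencies and keeping every value that ties for the highest count. The sketch below uses `collections.Counter` from the Python standard library; the function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return every value that attains the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [2, 3, 4, 1, 1, 2, 3, 4, 5, 1, 4]
print(modes(data))
```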
22. Exercise
Hen egg sizes (ES, g) at 12 weeks of lay were
randomly measured in a layer flock as
follows. Determine the mean, median and mode
of egg size.
Hen No.  01  02  03  04  05  06  07  08  09  10  11  12
ES (g)   44  41  47  50  49  44  46  41  39  38  45  40
23. Measures of Spread (dispersion)
These measure the variability of data. Four
measures are in common use.
1. Range
2. Variance
3. Standard Deviation (SD)
4. Standard Error (SE)
B
24. Range
Range: The simplest measure of dispersion or
spread of data is the RANGE
Formula: The difference between the largest
and smallest observations (two extremes) in
a group of data is called the RANGE.
Sample range = Xn - X1; population range = XN - X1,
where the observations are ordered from smallest to largest.
The values Xn and X1 are called the ‘sample range
limits’.
25. Range: Example
Marks in Biometry for 10 students are as follows
(full marks: 100).
Student ID   Marks obtained   Marks ordered
01           35               80
02           40               75
03           30               70
04           25               60
05           75               40
06           80               40
07           39               39
08           40               35
09           60               30
10           70               25
Here, with the marks ordered from largest to smallest,
Range = X1 - X10 = 80 - 25 = 55.
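The same range can be obtained without sorting by hand. A minimal Python sketch, with illustrative variable names:

```python
# Marks obtained by the 10 students, in student-ID order.
marks = [35, 40, 30, 25, 75, 80, 39, 40, 60, 70]

# Range = largest observation - smallest observation.
marks_range = max(marks) - min(marks)
print(marks_range)
```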
26. Range…
The range is a crude estimator of dispersion
because it uses only two of the data points and
is somewhat dependent on sample size. As
sample size increases, we expect the largest and
smallest observations to become more
extreme and, therefore, the sample range to increase
even though the population range remains
unchanged. It is unlikely that a sample will
include the largest and smallest values from
the population, so the sample range usually
underestimates the population range and
is, therefore, a biased estimator.
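Both effects can be illustrated on a tiny population. The sketch below reuses population 1.1 from the section on the mean (an assumption made here for illustration; the notes do not use it for the range) and averages the sample range over every possible sample of a given size.

```python
from itertools import combinations

population = [1, 6, 4, 5, 6, 3, 8, 7]
population_range = max(population) - min(population)

def average_sample_range(values, n):
    """Average of (largest - smallest) over every possible sample of size n."""
    ranges = [max(s) - min(s) for s in combinations(values, n)]
    return sum(ranges) / len(ranges)

avg_range_n3 = average_sample_range(population, 3)
avg_range_n5 = average_sample_range(population, 5)

# The average sample range grows with n but stays below the population range.
print(population_range, avg_range_n3, avg_range_n5)
```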
27. Variance
Suppose we express each observation as a
distance from the mean, $x_i = X_i - \bar{X}$. These
differences are called deviates and will be
sometimes positive (Xi is above the mean) and
sometimes negative (Xi is below the mean). If
we try to average the deviates, we find that they
always sum to zero. Because the mean is the central
tendency or location, the negative deviates
will exactly cancel out the positive deviates.
28. Variance…
Example
X     Mean   Deviates
2     4      -2
3     4      -1
1     4      -3
8     4       4
6     4       2
Sum           0

$$\sum (X_i - \bar{X}) = 0$$
29. Variance…
• Algebraically one can demonstrate the same result more generally:

$$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \bar{X}$$

Since $\bar{X}$ is a constant for any sample,

$$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - n\bar{X}$$
30. Variance…
Since $\bar{X} = \frac{\sum X_i}{n}$, then $n\bar{X} = \sum X_i$, so

$$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} X_i = 0$$
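The cancellation can be confirmed numerically on the five observations 2, 3, 1, 8, 6 from the deviates example. This is a minimal sketch; with data whose mean is not exactly representable, floating-point rounding may leave a sum that is very close to zero rather than exactly zero.

```python
X = [2, 3, 1, 8, 6]
mean_x = sum(X) / len(X)

# Deviates: each observation expressed as a distance from the mean.
deviates = [x - mean_x for x in X]
print(mean_x, deviates, sum(deviates))
```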
31. Variance…
• To circumvent this unfortunate property,
the widely used measure of dispersion
called the sample variance utilizes the
square of the deviates. The quantity

$$\sum_{i=1}^{n} (X_i - \bar{X})^2$$

is the sum of these squared deviates and
is referred to as the corrected sum of
squares (CSS). Each observation is
corrected or adjusted for its distance
from the mean.
32. Variance…
• Formula: The CSS is utilized in the
formula for the sample variance

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$

The sample variance is usually reported to
two more decimal places than the data
and has units that are the square of the
measurement units.
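As an illustration, the sketch below computes the CSS and the sample variance for the five observations 2, 3, 1, 8, 6 used in the deviates example, and cross-checks the result against `statistics.variance` from the Python standard library, which also divides by n - 1. Variable names are illustrative.

```python
import statistics

X = [2, 3, 1, 8, 6]
n = len(X)
mean_x = sum(X) / n

# Corrected sum of squares: sum of squared deviates from the sample mean.
css = sum((x - mean_x) ** 2 for x in X)

# Sample variance: CSS divided by n - 1.
s2 = css / (n - 1)

# Cross-check against the standard library's sample variance.
print(css, s2, statistics.variance(X))
```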
33. Variance…
Or, in computational form,

$$s^2 = \frac{\sum X_i^2 - (\sum X_i)^2 / n}{n - 1}$$

With a similar derivation the population
variance computational formula can be
shown to be

$$\sigma^2 = \frac{\sum X_i^2 - (\sum X_i)^2 / N}{N}$$
34. Variance…Example(unit Kg)
• Data set: 3.1, 17.0, 9.9, 5.1, 18.0, 3.8,
10.0, 2.9, 21.2

$$\sum X_i = 91, \qquad \sum X_i^2 = 1318.92, \qquad n = 9$$

$$s^2 = \frac{1318.92 - (91)^2 / 9}{9 - 1} = \frac{1318.92 - 920.11}{8} = \frac{398.81}{8} = 49.851 \ \text{kg}^2$$
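The worked example can be reproduced in Python, and the computational formula compared with the definitional one based on squared deviates. This is a sketch with illustrative variable names; small floating-point differences between the two routes are expected.

```python
data = [3.1, 17.0, 9.9, 5.1, 18.0, 3.8, 10.0, 2.9, 21.2]  # kg
n = len(data)

sum_x = sum(data)                   # sum of X_i
sum_x2 = sum(x * x for x in data)   # sum of X_i squared

# Computational formula: (sum X^2 - (sum X)^2 / n) / (n - 1)
s2_computational = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Definitional formula: sum of squared deviates / (n - 1)
mean_x = sum_x / n
s2_definitional = sum((x - mean_x) ** 2 for x in data) / (n - 1)

print(round(sum_x, 2), round(sum_x2, 2), round(s2_computational, 3))
```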
35. Variance…
Remember, the numerator must always
be a positive number because it is a sum
of squared deviations.
The population variance formula is rarely
used since most populations are too
large to census directly.
36. Standard deviation (SD)
• Standard deviation is the positive square
root of the variance:

$$\sigma = \sqrt{\frac{\sum X_i^2 - (\sum X_i)^2 / N}{N}}$$

and

$$s = \sqrt{\frac{\sum X_i^2 - (\sum X_i)^2 / n}{n - 1}}$$
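For illustration, the sketch below applies both formulas to the nine weights from the variance example and cross-checks them against `statistics.stdev` (divides by n - 1) and `statistics.pstdev` (divides by N) from the Python standard library. Treating the nine values as a whole population for σ is an assumption made only to show the second formula.

```python
import math
import statistics

data = [3.1, 17.0, 9.9, 5.1, 18.0, 3.8, 10.0, 2.9, 21.2]  # kg
n = len(data)
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)

# Numerator shared by both formulas: sum X^2 - (sum X)^2 / n
corrected_ss = sum_x2 - sum_x ** 2 / n

s = math.sqrt(corrected_ss / (n - 1))   # sample standard deviation
sigma = math.sqrt(corrected_ss / n)     # population form, if these were all N values

print(s, statistics.stdev(data))
print(sigma, statistics.pstdev(data))
```

Because the population form divides by a larger number, σ computed this way is always slightly smaller than s for the same data.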
38. Exercise 2
Daily milk yields (L) of 12 Jersey cows are
tabulated below. Calculate the mean,
median, mode, variance and standard
error.
Cow no Milk yield Cow no Milk yield
1 23.7 7 21.5
2 12.8 8 25.2
3 28.9 9 21.4
4 21.4 10 25.2
5 14.5 11 19.5
6 28.3 12 19.6
39. Problem 1
• Two herds of cows located apart in
Malaysia gave the following amounts of
milk per day (L). Compute the arithmetic mean,
median, mode, range, variance, SD and
SE of daily milk yield for the cows of the two
herds. Comment on what the two sets of milk
records reveal about the differences
between the herds.
41. Problem 2
• Sex-adjusted weaning weights of lambs in
two different breeds of sheep were
recorded as follows. Compute the mean,
median, range, variance and SE of
weaning weight of lambs in the two breed
groups. Comment on the
differences between the two groups.
43. Problem No 3
In a retail market study, data on the price
(RM) of 10 kg of rice were collected from 2
different markets in Malaysia. Using
descriptive statistics, show how the price
of rice differs between the two
markets.
Pasar 1: 20, 25, 22, 23, 22, 24, 23, 21, 25,
25, 23, 22, 25, 24, 24
Pasar 2: 25, 24, 26, 23, 26, 25, 25, 26, 24,
26, 24, 23, 22, 25, 26, 26, 24