Empirics of standard deviation

In research, there are different methods of measuring the data to be analyzed. One purpose of these methods is to
measure the level of dispersion (Eboh, 2009). Dispersion is the tendency of the values of a variable to scatter away from
the mean or midpoint. Data are summarized mainly with basic statistical tools such as the mean, median and mode. To
measure how widely the values are spread around the calculated mean, the standard deviation is employed; it is one of
the tools for measuring dispersion. To understand it well, it helps to first explain the following terms (mean, median,
mode and variance) and their uses.
MEAN
Panneerselvam (2008) defined the mean as the ratio between the sum of the observations and the number of
observations; in his study, he termed it the arithmetic mean. Eboh (2009) described it as the sum of the observations
divided by the number of observations. Mathematically, the mean is the arithmetic average of a number of scores. To
obtain the mean, add the scores and divide by the number of scores. Simply put, the mean is the sum of all the collated
data to be analyzed, divided by the number of data points. It is generally stated as
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Where $\bar{x}$ is the arithmetic mean; $x_i$, the ith observation; and $n$, the total number of observations.
Example 1.1: Determine the arithmetic mean of the salaries of the employees shown in Table 1.1 below.
Table 1.1
Employee no.             1   2   3   4   5   6   7   8   9
Monthly salary (N'000)  20  27  34  56  34  45  20  29  41
Solution: The number of observations is n = 9. Using the above formula,
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{20000+27000+34000+56000+34000+45000+20000+29000+41000}{9} = \frac{306000}{9} = \text{N}34000$$
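The calculation in Example 1.1 can be sketched in a few lines of Python. This is only an illustration; the function name `arithmetic_mean` and the variable names are assumptions made here, not part of the original text.

```python
# Monthly salaries from Example 1.1, in thousands of naira (N'000).
salaries_thousands = [20, 27, 34, 56, 34, 45, 20, 29, 41]


def arithmetic_mean(values):
    """Return the sum of the observations divided by their number."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Convert the result from thousands of naira back to naira.
mean_salary = arithmetic_mean(salaries_thousands) * 1000
print(f"n = {len(salaries_thousands)}, mean = N{mean_salary:,.0f}")
```

All values are kept in the same unit (thousands of naira) before they are summed, and the conversion to naira is done once at the end.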
It should be noted that before the observations are summed, they must all be in the same unit and on the same scale.
Values measured differently cannot be added together: for example, amounts in naira and amounts in dollars cannot be
summed until they have been converted to a single currency. Consider, as another example, the following data, which
represent the time needed to complete a reading task.

Example 1.2
Times in minutes: 6 3 5 5 2 7 6 4 3 2
Total = 43
The mean is the sum of the scores divided by the number of scores. Mathematically: Mean = ΣX/N = 43/10 = 4.3
PROPERTIES OF THE MEAN
The mean has certain properties that are attributed to it (Eboh, 2009). They include:
1. It has the algebraic property that the sum of the deviations of each observation from the mean is always zero. That
is, when the mean is subtracted from each observation and the resulting differences (some positive, some negative)
are added together, the result must be zero. This is expressed mathematically as:
$$\sum_{i=1}^{N} (x_i - \bar{x}) = 0$$
2. The sum of the squared deviations of each observation from the mean is less than the sum of the squared deviations
about any other number:
$$\sum_{i=1}^{N} (x_i - \bar{x})^2 = \text{minimum}$$
This means that although the deviations from the mean add up to zero, squaring each deviation before adding gives a
total that is smaller when the deviations are taken from the mean than when they are taken from any other value.
3. When the mean is computed from grouped data, which is a special case, every observation in a class interval is
assumed to take the value of the midpoint of that interval. This is illustrated mathematically as
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i m_i}{N}$$
Where $f_i$ = number of cases in the ith category, with $\sum f_i = N$
$m_i$ = midpoint of the ith category
$k$ = number of categories
In other words, the middle point of each class of the grouped data is found and used in the analysis. An example of a
class interval is 1950-2950. Such an interval has its midpoint at (1950 + 2950)/2 = 2,450, and 2,450 is the value that
will be used for the mean analysis.
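The three properties above can be checked numerically. The sketch below uses ten reading-time scores (total 43) as an illustrative data set; the helper names and the list of alternative centres are assumptions chosen for the demonstration. A numerical check like this illustrates the properties but does not prove them.

```python
import math

# Ten reading-time scores in minutes (N = 10, total = 43).
times = [6, 3, 5, 5, 2, 7, 6, 4, 3, 2]
mean = sum(times) / len(times)

# Property 1: the deviations from the mean sum to zero
# (up to floating-point rounding error).
sum_of_deviations = sum(x - mean for x in times)


def sum_squared_deviations(values, centre):
    """Sum of squared deviations of the values about an arbitrary centre."""
    return sum((x - centre) ** 2 for x in values)


# Property 2: the sum of squares about the mean is smaller than the sum
# about any other number; a few arbitrary alternatives are tried here.
about_mean = sum_squared_deviations(times, mean)
candidates = [3.0, 4.0, 4.5, 5.0, 6.0]
about_others = {c: sum_squared_deviations(times, c) for c in candidates}


# Property 3: a class interval is represented by its midpoint.
def midpoint(lower, upper):
    return (lower + upper) / 2


print(math.isclose(sum_of_deviations, 0.0, abs_tol=1e-9))
print(all(about_mean < s for s in about_others.values()))
print(midpoint(1950, 2950))
```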
MEDIAN
According to Panneerselvam (2008), the median is the score found at the exact middle of the set of values. It refers to
the midpoint in a series of numbers. To find the median, the values are arranged in order from smallest to largest. If
there is an odd number of values, the middle value is the median. If there is an even number of values, the average of
the two middle values is the median.

Example 1.3: Find the median of 19, 29, 36, 15, and 20
In order: 15, 19, 20, 29, 36. Since there are 5 values (an odd number), 20 is the median (the middle number).
Example 1.4: Find the median of 67, 28, 92, 37, 81, 75
In order: 28, 37, 67, 75, 81, 92. Since there are 6 values (an even number), we must average the two middle numbers
to get the median value. Average: (67 + 75)/2 = 142/2 = 71 is the median value.
MODE
The mode of a set of values is the value that occurs most often. A set of values may have more than one mode or no
mode.
Example 1.5: Find the mode of 15, 21, 26, 25, 21, 23, 28, 21
The mode is 21 since it occurs three times and the other values occur only once.
Example 1.6: Find the mode of 12, 15, 18, 26, 15, 9, 12, 27
The modes are 12 and 15 since both occur twice.
Example 1.7: Find the mode of 4, 8, 15, 21, 23
There is no mode since all the values occur the same number of times.
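Examples 1.5 to 1.7 can be reproduced with a frequency count. The sketch below adopts the convention of Example 1.7, namely that a data set has no mode when every value occurs equally often; the function name `modes` is an assumption made here. Note that some libraries treat this case differently (for instance, Python's `statistics.multimode` returns all the values).

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or an empty list if there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # Convention from Example 1.7: if all distinct values occur the same
    # number of times, the data set is treated as having no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([15, 21, 26, 25, 21, 23, 28, 21]))  # Example 1.5 -> [21]
print(modes([12, 15, 18, 26, 15, 9, 12, 27]))   # Example 1.6 -> [12, 15]
print(modes([4, 8, 15, 21, 23]))                # Example 1.7 -> []
```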
Since there are three different measures of center, it seems reasonable to ask which is best to use. There are advantages
and disadvantages to each of them, depending on the nature of the data set. These are listed below.
Measure   Advantages                                  Disadvantages
Mean      Easy to compute                             Sensitive to extreme values (outliers)
          Sample means tend to vary less
          Good properties as sample size increases
Median    Resistant to outlying values                Harder to calculate
          Good for skewed data (see below)            Less useful than the mean for inference
Mode      Easy to compute                             Not very useful for quantitative data
          Good for qualitative (categorical) data
Skewness
Using the mean, median, and mode together can help to describe the skewness of a data set. A data set is
considered skewed if the values extend more to one side of the distribution than the other. (Schuetter, 2007)

VARIANCE
The variance ( S2
) is the average squared deviation from the mean. It is also known as the square of the standard
deviation. Both measures are interchangeable. These means that the standard deviation is the square root of the variance.
The defining formula is
S2
=
∑(x−m)2
N−1
Where: x is each individual score making up the distribution
M is the mean of the distribution
N is the number of scores.
This is illustrated below.
Example 1.8: Calculation of a variance
       x     x²    x - m   (x - m)²
       3      9     -2        4
       5     25      0        0
       2      4     -3        9
       7     49      2        4
       9     81      4       16
       4     16     -1        1
∑     30    184              34
m = 30/6 = 5.0
S² = 34/5 = 6.8
(Keronanton, 2004)
STANDARD DEVIATION
Though the variance is frequently used as a measure of spread in certain statistical calculations, it has the
disadvantage of being expressed in units different from those of the summarized data: because the deviations are
squared, the variance is in squared units (for example, minutes squared for times measured in minutes). However, the
variance can easily be converted into a measure of spread expressed in the same unit of measurement as the original
scores: the standard deviation (S). The standard deviation indicates the fluctuation of the values around their mean. To
convert from the variance to the standard deviation, simply find the square root of the variance. It is the most popular
measure of spread. The formula for the standard deviation is given below.
$$S = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N - 1}}$$
Example 1.8 (continued): Calculation of the standard deviation
Find the standard deviation of the data in Example 1.8.
S = √6.8 = 2.61
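The standard deviation formula above works from the totals Σx and Σx² rather than from the individual deviations. The sketch below applies it to the scores of Example 1.8, where Σx = 30 and Σx² = 184; the function name is an assumption made for illustration.

```python
import math


def sample_std_from_sums(values):
    """S = sqrt((sum(x^2) - (sum(x))^2 / N) / (N - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two scores are needed")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


# Scores from Example 1.8: sum(x) = 30, sum(x^2) = 184.
s = sample_std_from_sums([3, 5, 2, 7, 9, 4])
print(round(s, 2))  # 2.61
```

This form gives the same result as taking the square root of the variance, since Σx² − (Σx)²/N equals Σ(x − m)².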
Example 1.9
Find the standard deviation of the following distribution of values.
Time score   6    3    5    5    2    7    6    4    3    2
Mean score   4.3  4.3  4.3  4.3  4.3  4.3  4.3  4.3  4.3  4.3
To get the standard deviation, subtract the mean from each of the scores, square each deviation, and add up the
squared deviations; then divide the total by N − 1 and take the square root. This process is outlined below.
Time score   Mean   Score - mean   (Score - mean)²
    6         4.3       1.7            2.89
    3         4.3      -1.3            1.69
    5         4.3       0.7            0.49
    5         4.3       0.7            0.49
    2         4.3      -2.3            5.29
    7         4.3       2.7            7.29
    6         4.3       1.7            2.89
    4         4.3      -0.3            0.09
    3         4.3      -1.3            1.69
    2         4.3      -2.3            5.29
                              Total = 28.10
Therefore, the standard deviation becomes:
$$S = \sqrt{\frac{28.10}{9}} = \sqrt{3.12} = 1.77$$
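The working of Example 1.9 can be traced step by step in code. This is a sketch with variable names chosen for illustration; it prints one row per score and then the total and the standard deviation.

```python
import math

# Time scores from Example 1.9.
scores = [6, 3, 5, 5, 2, 7, 6, 4, 3, 2]
n = len(scores)
mean = sum(scores) / n  # 4.3

# One row per score: the deviation from the mean and its square.
squared_deviations = []
for x in scores:
    deviation = x - mean
    squared_deviations.append(deviation ** 2)
    print(f"{x:>3} {mean:>5.1f} {deviation:>6.1f} {deviation ** 2:>6.2f}")

total = sum(squared_deviations)
s = math.sqrt(total / (n - 1))  # divide by N - 1, then take the root

print(f"Total = {total:.2f}")
print(f"S = {s:.2f}")
```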
Grouped Data:
Often, data will be reported as grouped observations, and the standard deviation is then obtained with a slightly
different formula, which is easier to apply in this situation. An example is presented below:
Example 1.10
Ages      f
51 - 60   3
41 - 50   10
31 - 40   15
21 - 30   11
11 - 20   5
To calculate the mean and standard deviation of grouped data, you must determine the midpoint of each of
the groups of observations (Panneerselvam, 2008). The midpoint is obtained by adding the upper and lower limits of
each interval and dividing by two. For example, for the first group of data the midpoint would be (51 + 60)/2 = 111/2 =
55.5. I have redrawn the data below with the midpoints inserted. I have also included in the redrawn data a column
headed fxMidpoint, which is simply the midpoint multiplied by the frequency.
Ages f Midpoints fxMidpoint
51 - 60 3 55.5 166.5
41 - 50 10 45.5 455.0
31 - 40 15 35.5 532.5
21 - 30 11 25.5 280.5
11 - 20 5 15.5 77.5
Total = 1512.0
The mean becomes the sum of the values in the fxMidpoint column divided by the sample size. The sample size can
be determined by adding up the f column (N = 44). Therefore, the mean = 1512.0/44 = 34.36 (rounded to 34.4). To
determine the standard deviation, one additional column must be added to the table of calculations above. This
additional column is fxMidpoint², the frequency multiplied by the square of the midpoint, and I have redrawn our
table below with the added column.
Ages      f    Midpoints   fxMidpoint   fxMidpoint²
51 - 60   3    55.5        166.5         9240.75
41 - 50   10   45.5        455.0        20702.50
31 - 40   15   35.5        532.5        18903.75
21 - 30   11   25.5        280.5         7152.75
11 - 20   5    15.5         77.5         1201.25
Total =                   1512.0        57201.00
$$S = \sqrt{\frac{57201.0 - \frac{(1512.0)^2}{44}}{43}} = \sqrt{\frac{57201 - 51957.82}{43}} = \sqrt{121.93} = 11.04$$
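The grouped-data calculation of Example 1.10 can be sketched as follows. Each class is represented by its midpoint, as described above; the variable names and the layout of the class list are assumptions made for this illustration.

```python
import math

# (lower limit, upper limit, frequency) for each age class in Example 1.10.
age_classes = [
    (51, 60, 3),
    (41, 50, 10),
    (31, 40, 15),
    (21, 30, 11),
    (11, 20, 5),
]

n = 0            # sample size: the sum of the f column
sum_fm = 0.0     # total of the f x midpoint column
sum_fm2 = 0.0    # total of the f x midpoint-squared column
for lower, upper, f in age_classes:
    mid = (lower + upper) / 2
    n += f
    sum_fm += f * mid
    sum_fm2 += f * mid ** 2

mean = sum_fm / n
s = math.sqrt((sum_fm2 - sum_fm ** 2 / n) / (n - 1))

print(f"N = {n}, sum f*m = {sum_fm:.1f}, sum f*m^2 = {sum_fm2:.2f}")
print(f"mean = {mean:.2f}, S = {s:.2f}")
```

Because every observation in a class is replaced by the class midpoint, the mean and standard deviation obtained this way are approximations to the values that the ungrouped ages would give.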

Chapter Exercises
1. A magazine is interested in expanding its readership to "yuppies," defined as people between the ages of
30 and 40. Following are the ages of a random selection of the magazine's readership. Is there any reason to be
concerned? Draw the theoretical distribution and contrast the actual score distribution with the theoretical
distribution. Is the sample adequate?
23, 31, 29, 21, 25, 27, 25, 21, 29, 30, 35, 41, 23, 35,
19, 20, 26, 24, 26, 25, 28, 27, 51, 15, 28, 21, 23, 25
2. A more extensive examination of the readership was undertaken after the initiation of a one-year advertising
program, designed to increase the readership age range. Following are the data collected from this more
extensive study. Draw the theoretical distribution of the ages of the readers. What do you conclude? Is the
sample adequate?
Ages Frequency
20 - 24 26
25 - 29 45
30 - 34 87
35 - 39 30
40 - 44 8
45 - 49 4
References
Eboh, E. (2009). Social and Economic Research Principles and Method. Enugu: African Institute for Applied
Method.
Keronanton, A. (2004). Statistics: Median, Mode and Frequency Distribution. In A. Keronanton. Dublin: Dublin
Institution of Technology.
Obadan, M. I. (2012). Research Process, Report Writing and Referencing. Ugbowo, Benin City, Nigeria: Goldmark
Press Limited.
Panneerselvam, R. (2008). Research Method. New Delhi: Prentice Hall of India Limited.
Schuetter, J. (2007). Chapter 1. In J. Schuetter, measures of dispersation (pp. 45-54).
Walonick, D. S. (1993). The Research Process. Minneapolis.
