2. Definition:
• Constructing a frequency distribution of raw data is the first step towards
condensing a large amount of data into a compact form.
• The next step is to condense the data into a single value. Such a single value is
called an average.
3. Definition:
• In most data sets, the average is the value around which the observations
concentrate; therefore, the average is called a measure of central tendency.
• A measure of central tendency is a single value that represents an entire
distribution or data set. It aims to provide an accurate description of the
entire data in the distribution.
4. Properties of a Measure of Central Tendency:
• It should be rigidly defined.
• Its computation should be based on all observations.
• It should lend itself to algebraic treatment.
• It should be least affected by extreme observations.
5. The following are the different measures of central tendency:
1. Average (Arithmetic mean)
2. Median
3. Mode
4. Quartiles
5. Geometric Mean
6. Harmonic Mean
7. Weighted Mean
6. Average/Arithmetic Mean (AM):
• This is the most commonly used measure. The arithmetic mean (AM), or simply the
mean, is the sum of all observations divided by the number of observations.
Computation for ungrouped data:
The mean of n observations X1, X2, …, Xn is given by
$AM = \frac{X_1 + X_2 + \cdots + X_n}{n}$
= Sum of observations / Number of observations
7. Notation form:
• $AM = \bar{x} = \frac{\sum x}{n}$
• The arithmetic mean is denoted by $\bar{x}$. The symbol ∑ is read as "sigma" (sum of) and $\bar{x}$ as "x bar".
8. Merits of AM:
1. It is easy to calculate and understand.
2. It is based on all observations.
3. It is familiar to the common man and rigidly defined.
4. It is capable of further mathematical treatment.
5. It is least affected by sampling fluctuations; hence it is more stable.
9. Demerits of AM:
• It can be used only for quantitative data.
• It is unduly affected by extreme observations.
• It cannot be calculated when the frequency distribution has open-end
classes.
• It cannot be determined graphically.
• The AM may not coincide with any actual observation in the data.
10. Example:
Q. Obtain the arithmetic mean of the marks scored by a student in 8 unit tests in the
II MBBS class.
58 62 67 65 68 70 69 61
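The solution for this example is not included here. As a quick check, the direct method can be written as a short Python sketch; the function name `arithmetic_mean` is our own choice, not part of the original notes.

```python
def arithmetic_mean(values):
    """Return the arithmetic mean: sum of observations / number of observations."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


# Marks of the student in the 8 unit tests
marks = [58, 62, 67, 65, 68, 70, 69, 61]
mean_marks = arithmetic_mean(marks)
print(mean_marks)  # 520 / 8 = 65.0
```

The marks total 520, so the mean is 520/8 = 65 marks.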
12. Short-cut or Assumed Mean Method:
• When the observations in a data set are large numbers, finding the mean directly is
laborious. To avoid this difficulty, the short-cut method is adopted.
• Choose an arbitrary value a, called the assumed mean (usually a value from the data
set that simplifies the calculations), and subtract it from each observation.
• The results are known as differences or deviations.
• Obtain the mean of the deviations by the usual method.
13. Contd….
• Original data (observations): X1, X2, …, Xn
• Differences or deviations: d1 = X1 − a, d2 = X2 − a, …, dn = Xn − a
• where a is the assumed mean (any convenient value, usually taken from the data set).
• Mean of the deviations: $\bar{d} = \frac{\sum d}{n}$. Thus, the mean of the original data is $\bar{X} = a + \bar{d}$.
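The step from the mean of the deviations back to the mean of the original data follows directly from the definition of the mean. A short derivation in the notation of this slide:

```latex
\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}
        = \frac{\sum_{i=1}^{n} (X_i - a)}{n}
        = \frac{\sum_{i=1}^{n} X_i}{n} - \frac{n a}{n}
        = \bar{X} - a
\quad\Longrightarrow\quad
\bar{X} = a + \bar{d}
```

Since a cancels out, the result is the same whatever value is chosen as the assumed mean; a convenient choice only makes the deviations smaller and easier to add.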
14. Example:
• In a series of 10 post-mortems, the following weights (in g) of the liver
were recorded.
• 1420 1405 1425 1410 1415
1435 1430 1415 1445 1430
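The solution for this example is not included here. The sketch below computes the mean both directly and by the short-cut method; the assumed mean a = 1425 and the function name `assumed_mean_method` are illustrative choices of ours.

```python
def assumed_mean_method(values, a):
    """Short-cut method: mean = a + mean of the deviations d = X - a."""
    deviations = [x - a for x in values]
    d_bar = sum(deviations) / len(deviations)
    return a + d_bar


# Liver weights (g) from 10 post-mortems
weights = [1420, 1405, 1425, 1410, 1415, 1435, 1430, 1415, 1445, 1430]

a = 1425  # assumed mean, taken from the data set
direct = sum(weights) / len(weights)
shortcut = assumed_mean_method(weights, a)
print(direct, shortcut)  # 1423.0 1423.0
```

With a = 1425 the deviations sum to -20, so the mean deviation is -2 and the mean weight is 1425 - 2 = 1423 g; any other choice of a gives the same result.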
16. Computation of grouped data
• In statistics, data plays a vital role in estimating different types of parameters. To
draw any conclusions from the given data, we first need to arrange the data in such a
way that suitable statistical analysis can be performed. Data can be grouped in two
ways, namely, discrete and continuous frequency distributions.
17. Discrete frequency distribution:
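• Suppose we have observations x1, x2, …, xn with corresponding
frequencies f1, f2, …, fn. The AM is defined as
• $\bar{x} = \frac{f_1x_1 + f_2x_2 + \cdots + f_nx_n}{f_1 + f_2 + \cdots + f_n}$
• In notation form, we have
• $\bar{x} = \frac{\sum f x}{\sum f} = \frac{\sum f x}{N}$, where $N = \sum f$ is the total frequency,
i.e. Mean = Sum (frequency × observation) / Total frequency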
18. Calculate the average number of children per family from the following data:
No. of children    No. of families
0                  30
1                  52
2                  60
3                  65
4                  18
5                  10
6                  5
19. Solution:
No. of children (X)    No. of families (f)    Total no. of children (f.x)
0                      30                     0×30 = 0
1                      52                     1×52 = 52
2                      60                     2×60 = 120
3                      65                     3×65 = 195
4                      18                     4×18 = 72
5                      10                     5×10 = 50
6                      5                      6×5 = 30
Total                  240                    519
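The final division is not shown in the table. A minimal Python sketch of the discrete-frequency formula ∑f.x / ∑f applied to this table (the function name `discrete_mean` is hypothetical):

```python
def discrete_mean(xs, fs):
    """Mean of a discrete frequency distribution: sum(f*x) / sum(f)."""
    if len(xs) != len(fs):
        raise ValueError("xs and fs must have the same length")
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


children = [0, 1, 2, 3, 4, 5, 6]
families = [30, 52, 60, 65, 18, 10, 5]

total_fx = sum(f * x for x, f in zip(children, families))
mean_children = discrete_mean(children, families)
print(sum(families), total_fx, mean_children)  # 240 519 2.1625
```

So the average is 519/240 ≈ 2.16 children per family; this average is not itself a possible number of children.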
21. Continuous frequency distribution:
• In a continuous frequency distribution, the frequency is not associated with
any single specified value but is spread over the entire class.
• This makes it difficult to choose the values X1, X2, …, Xn. To overcome this
difficulty, we make a reasonable assumption: either the frequency is concentrated at
the mid-value of the class, or the frequency is distributed uniformly over the
entire class.
• Mean $\bar{X} = \frac{\sum f x}{\sum f}$, where x is the mid-value of each class.
22. The following are the steps to calculate the average for a continuous frequency distribution:
• Step 1- Write all class intervals serially in the first column and the corresponding
frequencies in the second column.
• Step 2- Obtain the mid-value X of each class interval by adding its lower and upper
limits and dividing the result by 2; put these values in the third column.
• Step 3- Multiply each f by the corresponding X and write this product in the fourth
column. The total of this column gives ∑f.x.
23. Notation form:
• $\bar{X} = \frac{\text{Sum of fourth column}}{\text{Sum of second column}} = \frac{\sum f x}{\sum f}$
24. Example:
• Find the average age (in years) at the time of death in city A.
Age interval (years)    No. of deaths
0-10                    16
10-20                   9
20-30                   20
30-40                   11
40-50                   7
50-60                   12
60-70                   9
70-80                   4
80-90                   2
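The solution for this example is not included here. Following Steps 1-3 above, a minimal sketch that takes each class's mid-value as representative of the class (the function name `grouped_mean` is our own):

```python
def grouped_mean(intervals, fs):
    """Mean of a continuous distribution, assuming each class's frequency
    is concentrated at its mid-value: sum(f*x) / sum(f)."""
    mids = [(lower + upper) / 2 for lower, upper in intervals]  # Step 2
    fx = [f * x for f, x in zip(fs, mids)]                       # Step 3
    return sum(fx) / sum(fs)


age_intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50),
                 (50, 60), (60, 70), (70, 80), (80, 90)]
deaths = [16, 9, 20, 11, 7, 12, 9, 4, 2]

mean_age = grouped_mean(age_intervals, deaths)
print(sum(deaths), mean_age)  # 90 deaths; 3130 / 90 ≈ 34.78 years
```

The total frequency is 90 and ∑f.x = 3130, so under the mid-value assumption the average age at death is 3130/90 ≈ 34.78 years.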
27. 2. MEDIAN
• The mean is unduly affected by extreme observations, and it cannot be
calculated for distributions with open-end classes or for qualitative variables like
honesty, sex, religion etc.
• To overcome these drawbacks, we use other measures of central tendency,
such as the median.
28. Definition:
• When all the observations of a variable are arranged in either ascending or
descending order, the middle observation is known as median. It divides the
whole data into two equal portions.
• In other words, 50% of the observations will be smaller than the median
while 50% of the observations will be larger than it.
29. Merits:
• It is easy to understand and easy to calculate.
• It can be computed for a distribution with open-end classes.
• It is not affected by extreme observations.
• It is applicable to quantitative data as well as to qualitative data that can be
arranged in order.
• It can be determined graphically.
30. Demerits:
• It is not based on all the observations; hence it is not a proper representative.
• It is not as rigidly defined as the arithmetic mean.
• It is not capable of further mathematical treatment.
31. Computation of Median:
Ungrouped Data:
• As discussed above, the median is one of the measures of central tendency,
which gives the middle value of the given data set.
• To find the median of ungrouped data, first arrange the given data
in ascending order, and then find the median value.
32. • If the total number of observations (n) is odd, the median is the ((n+1)/2)th
observation.
• If the total number of observations (n) is even, the median is the average of the
(n/2)th and the ((n/2)+1)th observations.
33. Example:
Consider the data set 6, 4, 7, 3 and 2.
• To find the median of the given data set, arrange it in ascending order.
• The ordered data set is 2, 3, 4, 6 and 7.
• Here the number of observations is odd, i.e. n = 5.
• Hence, median = ((n+1)/2)th observation.
• Median = (5+1)/2 = 6/2 = 3rd observation.
• Therefore, the median of the given data set is 4.
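The odd and even rules above can be written as a small Python function. `median` here is our own helper (Python's standard library also offers `statistics.median`); the second call uses a hypothetical six-value data set to show the even case.

```python
def median(values):
    """Median of ungrouped data: sort, then take the middle observation
    (n odd) or the average of the two middle observations (n even)."""
    if not values:
        raise ValueError("need at least one observation")
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        # ((n+1)/2)th observation is at 0-based index n // 2
        return data[mid]
    # average of the (n/2)th and ((n/2)+1)th observations: indices mid-1 and mid
    return (data[mid - 1] + data[mid]) / 2


print(median([6, 4, 7, 3, 2]))     # 4
print(median([6, 4, 7, 3, 2, 8]))  # (4 + 6) / 2 = 5.0
```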
34. Calculation for grouped data
• In grouped data, the median cannot be read directly from the cumulative frequencies,
because the middle value lies somewhere inside a class interval. So, it is necessary to
find the value inside that class interval which divides the whole distribution into
two halves.
• First, we have to find the median class.
• To find the median class, compute the cumulative frequencies of all the classes
and n/2. Then locate the first class whose cumulative frequency is greater than n/2.
This class is called the median class.
• The median is then obtained as $\text{Median} = l + \left(\frac{n/2 - cf}{f}\right) \times h$, where l is the lower limit of the median class, n is the total frequency, cf is the cumulative frequency of the class preceding the median class, f is the frequency of the median class and h is the class size.
37. Solution:
• To find the median height, we first need the class intervals and their corresponding frequencies.
• The given distribution is of the "less than" type: 140, 145, 150, … and 165 are the upper class limits. Thus,
the classes are below 140, 140-145, 145-150, 150-155, 155-160 and 160-165.
• From the given distribution, it is observed that:
• 4 girls are below 140 cm. Therefore, the frequency of the class below 140 is 4.
• 11 girls have heights less than 145 cm, and 4 girls have heights less than 140 cm.
• Hence, the frequency of the class interval 140-145 = 11 - 4 = 7
• Likewise, the frequency of 145-150 = 29 - 11 = 18
• Frequency of 150-155 = 40 - 29 = 11
• Frequency of 155-160 = 46 - 40 = 6
• Frequency of 160-165 = 51 - 46 = 5
38. Therefore, the frequency distribution table along with the cumulative frequencies is given below:
Class interval (cm)    Frequency    Cumulative frequency
Below 140              4            4
140-145                7            11
145-150                18           29
150-155                11           40
155-160                6            46
160-165                5            51
39. Contd….
• Here, n = 51.
• Therefore, n/2 = 51/2 = 25.5
• The first class whose cumulative frequency (29) exceeds 25.5 is 145-150, so the
25.5th observation lies in this class, which is called the median class.
• Therefore,
• Lower limit of the median class, l = 145
• Class size, h = 5
• Frequency of the median class, f = 18
• Cumulative frequency of the class preceding the median class, cf = 11
40. • Now, substituting the values in the formula, we get
• Median = 145 + ((25.5 − 11)/18) × 5
• Median = 145 + (72.5/18)
• Median = 145 + 4.03
• Median = 149.03.
• Therefore, the median height for the given data is 149.03 cm.
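A sketch of the same interpolation in Python, working directly from the "less than" cumulative frequencies. The function name `grouped_median` and the use of `None` for the lower limit of the open-end class "below 140" are our own assumptions; the open class causes no problem here because it is not the median class.

```python
def grouped_median(lowers, h, cumulative):
    """Median of grouped data by interpolation:
    Median = l + ((n/2 - cf) / f) * h

    lowers[i]     -- lower limit of class i (None for an open-end first class)
    h             -- class size
    cumulative[i] -- "less than" cumulative frequency up to class i
    """
    n = cumulative[-1]
    half = n / 2
    # Median class: first class whose cumulative frequency exceeds n/2
    k = next(i for i, c in enumerate(cumulative) if c > half)
    if lowers[k] is None:
        raise ValueError("median falls in an open-end class")
    cf = cumulative[k - 1] if k > 0 else 0  # cumulative frequency before it
    f = cumulative[k] - cf                  # frequency of the median class
    return lowers[k] + (half - cf) / f * h


# Heights (cm): below 140, 140-145, 145-150, 150-155, 155-160, 160-165
lowers = [None, 140, 145, 150, 155, 160]
cumulative = [4, 11, 29, 40, 46, 51]

med = grouped_median(lowers, 5, cumulative)
print(round(med, 2))  # 149.03
```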
42. MODE:
• In statistics, the mode is the value that occurs most often in a given data set.
In other words, the value in a data set that has the highest frequency is called the
mode or modal value. Along with the mean and median, it is one of the three most
commonly used measures of central tendency.
For example, the mode of the set {3, 7, 8, 8, 9} is 8. For a finite
number of observations, the mode can easily be found by counting. A set of values may
have one mode, more than one mode, or no mode at all.
43. Definition:
• A mode is defined as the value that has the highest frequency in a given set of
values, i.e. the value that appears the greatest number of times.
• Example: In the data set 2, 4, 5, 5, 6, 7, the mode is 5, since it appears
twice and every other value appears only once.
44. Bimodal, Trimodal & Multimodal (More than one mode)
• When there are two modes in a data set, the set is called bimodal.
• For example, the modes of Set A = {2,2,2,3,4,4,5,5,5} are 2 and 5, because
both 2 and 5 are repeated three times in the given set.
• When there are three modes in a data set, the set is called trimodal.
• For example, the modes of Set B = {2,2,2,3,4,4,5,5,5,7,8,8,8} are 2, 5 and 8.
• When there are four or more modes in a data set, the set is
called multimodal.
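A minimal Python sketch that returns all modes of a data set, using `collections.Counter` from the standard library. Treating a data set in which no value repeats as having no mode (an empty list) is our own convention for this sketch.

```python
from collections import Counter


def modes(values):
    """Return every value with the highest frequency, in ascending order.

    Returns [] when no value repeats (treated here as "no mode")."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode under this convention
    return sorted(v for v, c in counts.items() if c == top)


print(modes([3, 7, 8, 8, 9]))                          # [8]
print(modes([2, 2, 2, 3, 4, 4, 5, 5, 5]))              # [2, 5] -> bimodal
print(modes([2, 2, 2, 3, 4, 4, 5, 5, 5, 7, 8, 8, 8]))  # [2, 5, 8] -> trimodal
print(modes([1, 2, 3]))                                # [] -> no mode
```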
45. Solution:
• The value occurring most frequently in a set of observations is its mode. In other words, the
mode of data is the observation having the highest frequency in a set of data. There is a
possibility that more than one observation has the same frequency, i.e. a data set could have
more than one mode. In such a case, the set of data is said to be multimodal.
• Let us look into an example to get a better insight.
• Example: The following table represents the number of wickets taken by a bowler in 10
matches. Find the mode of the given set of data.
• It can be seen that the bowler took 2 wickets most frequently across the matches.
Hence, the mode of the given data is 2.
46. Mode Formula For Grouped Data:
• In a grouped frequency distribution, the mode cannot be found just by
looking at the frequencies. To determine the mode of such data, we first find
the modal class (the class with the highest frequency); the mode lies inside the
modal class. The mode of the data is given by the formula:
$\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$
47. • Where,
• l = lower limit of the modal class
• h = size of the class interval
• f1 = frequency of the modal class
• f0 = frequency of the class preceding the modal class
• f2 = frequency of the class succeeding the modal class
48. Solution:
• Let us see how to find the mode of a given data set with the help of examples.
Example 1: Find the mode of the data set 3, 3, 6, 9, 15, 15, 15, 27, 27, 37, 48.
Solution: In the list of numbers
3, 3, 6, 9, 15, 15, 15, 27, 27, 37, 48,
15 is the mode, since it appears more often (three times) than any other number.
Example 2: Find the mode of the data set 4, 4, 4, 9, 15, 15, 15, 27, 37, 48.
Solution: Given: 4, 4, 4, 9, 15, 15, 15, 27, 37, 48 is the data set.
A data set can have more than one mode when two or more values share the
highest frequency.
Hence, both 4 and 15 are modes of this set, since each occurs three times.
49. Example :
• In a class of 30 students, the marks obtained in mathematics (out of 50)
are tabulated below. Calculate the mode of the data.
50. Solution:
• The maximum class frequency is 12 and the class interval corresponding to this
frequency is 20 – 30. Thus, the modal class is 20 – 30.
• Lower limit of the modal class (l) = 20
• Size of the class interval (h) = 10
• Frequency of the modal class (f1) = 12
• Frequency of the class preceding the modal class (f0) = 5
• Frequency of the class succeeding the modal class (f2)= 8
• Substituting these values in the formula, we get:
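The substituted result is not shown in these notes. This sketch evaluates the standard grouped-data mode formula, Mode = l + ((f1 - f0)/(2f1 - f0 - f2)) × h, using the symbols defined earlier (l, h, f1, f0, f2); the function name `grouped_mode` is hypothetical.

```python
def grouped_mode(l, h, f1, f0, f2):
    """Mode of grouped data: l + (f1 - f0) / (2*f1 - f0 - f2) * h."""
    denom = 2 * f1 - f0 - f2
    if denom == 0:
        raise ValueError("formula undefined: 2*f1 - f0 - f2 is zero")
    return l + (f1 - f0) / denom * h


# Values read off the marks table: modal class 20-30
mode_marks = grouped_mode(l=20, h=10, f1=12, f0=5, f2=8)
print(round(mode_marks, 2))  # 20 + (7 / 11) * 10 ≈ 26.36
```

This gives Mode = 20 + (7/11) × 10 ≈ 26.36 marks.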
52. Standard Deviation
• The standard deviation measures the spread (dispersion) of statistical data, i.e.
how far the data deviate from their mean or average position. The degree of
dispersion is computed from the deviations of the data points from the mean.
It is denoted by the symbol σ.
• The standard deviation is defined as the positive square root of the
arithmetic mean of the squares of the deviations taken from the arithmetic
mean: $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$.
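A minimal sketch of this definition in Python, dividing by n as the definition states (the population standard deviation). As an illustration we reuse the 8 unit-test marks from the arithmetic mean example; the function name `standard_deviation` is our own.

```python
import math


def standard_deviation(values):
    """Population standard deviation: positive square root of the mean of
    the squared deviations from the arithmetic mean."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(values) / n
    squared_devs = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_devs) / n)


marks = [58, 62, 67, 65, 68, 70, 69, 61]
print(standard_deviation(marks))  # mean 65, squared deviations sum to 128, sqrt(128/8) = 4.0
```

Dividing by n - 1 instead gives the sample standard deviation, often used when the data are a sample from a larger population; Python's `statistics.pstdev` and `statistics.stdev` compute the two versions.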
53. Merits of Standard Deviation
• It is rigidly defined
• It is based on all observations
• It does not ignore the algebraic signs of deviations
• It is capable of further mathematical treatment
• It is not much affected by sampling fluctuations.
54. Demerits of Standard Deviation
• It is comparatively difficult to understand and calculate.
• It cannot be calculated for qualitative data or for distributions with open-end
classes.
• It is unduly affected by extreme deviations.