DR. Md. Khurshid Alam
CONTENTS
1. Bio-statistics: Introduction (p. 2)
2. Scale (p. 5)
3. Data, Questionnaire (p. 7)
4. Measurement of central tendencies (p. 11)
5. Weighted average, Measure of position (p. 20)
6. Measures of dispersion (p. 22)
7. Standard deviation (p. 24)
8. Graphic presentation of frequency distribution (p. 27)
9. Probability (p. 32)
10. Estimation (p. 38)
11. Research methodology (p. 39)
12. Research problem (p. 43)
13. Concept of variable (p. 46)
14. Research design (p. 47)
15. Types of control (p. 52)
16. Blinding (p. 53)
17. Clinical research (p. 54)
18. Inductive and deductive approaches (p. 58)
19. Sampling (p. 59)
20. Hypothesis (p. 64)
21. Z-test (p. 67)
22. T-test (p. 72)
23. Chi-square (χ²) test (p. 76)
24. ANOVA (p. 80)
25. Mann–Whitney U test (p. 82)
26. Kruskal–Wallis test (p. 85)
27. Mood's median test (p. 89)
28. Correlation (p. 90)
29. Regression analysis (p. 93)
30. Minitab (p. 95)
31. SPSS (p. 96)
32. Book, Journal, Compendium (p. 99)
33. Research report: protocols and report format (p. 100)
34. Trend and possibilities of research in Unani (p. 103)
35. ICMR statement (p. 105)
36. WHO: Researchers' responsibilities (p. 108)
37. WMA Declaration of Helsinki (p. 110)
BIO-STATISTICS: Introduction
❖ The word STATISTICS is derived from the Latin word STATUS, meaning
STATE – a political state.
❖ BIO-STATISTICS is the application of statistical tools and methods in
biology and medicine.
❖ John Graunt is the father of health statistics. In 1662 he published
"Natural and Political Observations Made upon the Bills of Mortality".
Definition
❖ Statistics is the science of data that enables us to become
proficient data producers and efficient data users.
❖ In the plural form, it stands for numerical facts (facts expressed in
numbers) pertaining to a collection of objects.
❖ In the singular form, it stands for the science of the collection,
organization, analysis and interpretation of numerical facts.
Prof. Horace Secrist defines:
❖ By statistics we mean an aggregate of facts affected to a marked
extent by a multiplicity of causes, numerically expressed,
enumerated or estimated according to reasonable standards of
accuracy, collected in a systematic manner for a pre-determined
purpose and placed in relation to each other.
Branches of statistics
❖ Two branches –
1) Statistical methods
2) Applied statistics
The main branches of applied statistics are Biometry, Demography,
Econometrics, Statistical Quality Control, Psychometry, etc.
Scope and applications of statistics
Statistics is considered to be a distinct branch of study applicable to
investigations in many branches of science. Statistical methods are
applied to specific problems in biology, medicine, agriculture,
commerce, business, economics, industry, sociology, etc.
Functions of statistics
❖ It simplifies the complexity of the data.
❖ It reduces the bulk of the data.
❖ It adds precision to thinking.
❖ It helps in comparing different sets of figures.
❖ It guides in the formulation of policies and helps in planning.
❖ It indicates trends and tendencies.
❖ It helps in studying relationships between different factors.
Limitations of statistics
❖ Statistics does not deal with qualitative data.
❖ Statistics does not deal with individual facts.
❖ Statistical inferences (conclusions) are not exact.
❖ Statistics can be misused.
❖ Statistics cannot be handled properly by the common man.
Basic Notions
❖ Units or Individuals – the objects whose characteristics are
studied.
❖ Population or Universe – the totality (collection) of units under
consideration.
❖ Finite population – a population containing a finite number of units.
❖ Infinite population – a population containing an infinite number of
units. E.g. heights of plants.
❖ Census – if each and every unit is studied, the study is
called complete enumeration or census.
❖ Quantitative characteristic – a characteristic which is
numerically measurable.
❖ Qualitative characteristic – a characteristic which is not
numerically measurable.
❖ Variable – a quantitative characteristic which varies from unit
to unit. E.g. height.
❖ Attribute – a qualitative characteristic which varies from unit
to unit. E.g. sex.
❖ Discrete variable – a variable which assumes only some specified
values in a given range. E.g. number of children per family.
❖ Continuous variable – a variable which can assume all the values in
a range. E.g. height of persons.
❖ Statistical survey or investigation – the study of variables which
show statistical (stochastic, non-mathematical) variation.
❖ Investigator – the person who conducts the statistical survey.
❖ Informants (Respondents) – the persons who supply information.
❖ Enumerators – agents who collect and hand over information
to the investigator.
❖ Sample – a representative portion of the population.
❖ Census enumeration – a survey in which the whole population
is made use of.
Collection and classification of data
A statistician is concerned with the study of variables which
show statistical (stochastic, non-mathematical) variation. Such a
study is called a statistical investigation (statistical survey).
Investigator: the person who conducts the statistical survey is
called the investigator. The investigator plans the survey, collects the
required data, analyses them and finally draws conclusions.
Stages of statistical investigation
Mainly two stages –
I. Planning and preparation.
II. Execution of the survey.
Execution has four steps, namely
1) Collection of data.
2) Scrutiny, editing and presentation of data.
3) Analysis of data.
4) Interpretation of analyzed data.
Quality of data
❖ "GIGO" – garbage in, garbage out. This means the researcher
must ensure high quality of data at every step.
Scale of measurement: a scheme by which the magnitude of anything
can be measured. Basically, it is of two types –
(1) Crude – it provides a rough idea of the magnitude. E.g. "tall".
(2) Precise – it provides the exact value of the magnitude. E.g. 2 cm.
• Nominal scale: also called a classificatory scale. On the basis of a
common (shared) property or character it divides data into
subgroups. For example, if we divide 10 people by their income
into high income, average income and low income, the order has no
importance: low income may be written at the top or at the
bottom.
• Ordinal scale: it has all the properties of the nominal scale. It also
divides the parameter under study into subgroups, but in order,
ascending or descending. So, first divide the objects on the
nominal scale and then arrange them on the ordinal scale.
• Interval scale: it has all the properties of the ordinal and nominal
scales; in addition it places the subgroups (ranked subgroups) at
definite intervals.
The space between the starting and terminating points is called the
interval. Because its zero point is arbitrary, ratios of values on this
scale are not meaningful in mathematical calculations.
• Ratio scale: it has all the properties of the nominal, ordinal and
interval scales; in addition it always starts with zero.
Measurements on this scale are subject to mathematical calculation.
Every division is a definite measurement. It is an absolute scale; its
zero is fixed. It is the most precise scale.
Primary data are specially collected for a particular purpose. They are
reliable, complete and fresh.
Methods of collection of primary data: -
1. Direct personal interview – the investigator personally comes
in contact with the units.
2. Indirect personal interview – the investigator does not
contact the units directly but contacts persons who
are in close association with the units. These persons
(informants) supply information to the investigator.
3. Information through correspondents – the investigator
appoints agents called correspondents at different places.
These correspondents collect the required data in their areas
and hand them over to the investigator.
4. Method of questionnaire (mail inquiry) – a questionnaire is
a list of questions, the answers to which are filled in by the
informants; these answers are the required information for
the investigation. It is cheap and consumes less time and labour.
5. Method of schedule (collection through enumerators) – a
schedule is a list of items on which the enumerators have
to collect and record information. It is filled in by the
enumerators. These data are reliable and accurate. But in
this method, there is scope for bias.
General principles in drafting a questionnaire (schedule)
1. The number of questions should be as few as possible.
2. Questions should be short and simple.
3. If a lengthy question is unavoidable, it should be divided into two
or more parts.
4. Questions should be such that the answers to them are short.
E.g. "Are you married?"
5. As far as possible, questions regarding personal matters should be
avoided.
6. Questions should be so framed that they do not hurt the feelings of
the informants.
7. Questions should not be ambiguous.
8. Questions should be logically arranged.
9. Any clarification, if necessary, regarding any of the questions
should be provided.
10. Questions should be so framed that the validity of the information
supplied by the informants can be cross-checked.
A covering letter introducing the investigator and indicating the
purpose of the survey should be attached to the questionnaire. It should
supply the necessary instructions to the informants regarding its return.
SECONDARY DATA
❖ Primarily collected for some other purpose.
❖ They may not contain all the required information.
❖ Sources of secondary data are –
1) published sources, e.g. Govt. reports
2) unpublished sources, e.g. records of a Govt. office
Classification
❖ Units having common characteristics are grouped together.
❖ Each of these groups is called a class.
❖ Simple or one-way classification – classification of units on the
basis of a single characteristic.
❖ Manifold classification – simultaneous classification of units on
the basis of two or more characteristics.
❖ Dichotomy – classification of units on the basis of a characteristic
into two classes. E.g. married and unmarried.
Functions of classification
❖ Reduces the bulk of data.
❖ Simplifies the data.
❖ Facilitates comparison of characteristics.
❖ Renders the data ready for statistical analysis.
TABULATION is a systematic arrangement of classified data in rows and
columns of a table.
CONTINGENCY table – a table showing manifold classified data.
Types of classification – four types
1) Quantitative classification – classification with regard to a variable.
2) Qualitative classification – classification with regard to an attribute.
3) Spatial classification (geographical classification).
4) Temporal classification (chronological classification) –
classification with regard to time.
Frequency table
❖ A systematic presentation of the values taken by a variable and the
corresponding frequencies is called the frequency distribution of
that variable.
❖ A tabular presentation of a frequency distribution is called a
frequency table.
A frequency distribution in which class intervals are considered is
a continuous (grouped) frequency distribution. If class intervals are
not considered, it is a discrete (ungrouped) frequency
distribution.
Terms –
❖ Class interval: the range between the upper and lower class limits.
It is the width of the class.
❖ Lower class limit: the smallest value of the class.
❖ Upper class limit: the highest value of the class.
❖ Class mark or class mid-value: the central (middle-most)
value of a class interval.
❖ Inclusive class interval: a class interval in which the lower as
well as the upper class limit is included in the same class
interval. Usually the inclusive type of class interval is adopted when
the variable is discrete.
❖ Exclusive class interval: a class interval in which the lower
class limit is included in the same class interval, whereas the
upper class limit is included in the succeeding class interval.
❖ Open-end class interval: sometimes in a frequency distribution
the class intervals at the extremities may not have one of the
limits. Such a class interval is called an open-end class interval.
E.g. "more than 100".
❖ Frequency: - the number of observations in any class.
Bivariate and multivariate frequency distribution
Frequency distribution of a single variable is called univariate
frequency distribution. Frequency distribution of more than one
variable is called multivariate frequency distribution.
Bivariate = on two variables
Measurement of central tendencies
Central tendency
❖ Generally, in frequency distribution, the values cluster around a
central value.
❖ The property of concentration of the values around a central
value is called central tendency.
❖ The central value around which there is concentration is called a
measure of central tendency (measure of location, average).
Five important measures of central tendencies-
1. Arithmetic Mean (A.M)
2. Median
3. Mode
4. Geometric Mean (G.M)
5. Harmonic mean (H.M)
Desired Qualities of an ideal measure of central tendencies –
1) It should be easy to understand.
2) Its computation procedure should be simple.
3) It should be rigidly defined.
4) It should be based on all the values.
5) It should not be affected too much by abnormal extreme values.
6) It should be capable of further algebraic treatment so that it
could be used in further analysis of the data.
7) It should be stable. That is, the measure should be such that
sampling variation in the value of the measure should be least.
Arithmetic Mean (Mean)
❖ The arithmetic mean of a set of values is obtained by dividing the
sum of the values by the number of values in the set.
❖ The arithmetic mean of the values x1, x2, …, xn is –
x̄ = (x1 + x2 + … + xn) / n = Σx / n
❖ If the observations x1, x2, …, xn have frequencies f1, f2, …, fn,
the arithmetic mean is
x̄ = (f1x1 + f2x2 + … + fnxn) / (f1 + f2 + … + fn) = Σfx / N
(for a discrete frequency distribution)
where N = Σf is the total frequency.
For raw data, the arithmetic mean is x̄ = Σx / n.
For tabulated data (discrete or continuous), it is x̄ = Σfx / N.
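The two mean formulas above can be sketched in Python (a minimal illustration; the function names and sample data are mine, not from the text):

```python
def mean_raw(values):
    # x̄ = Σx / n  (raw data)
    return sum(values) / len(values)

def mean_freq(xs, fs):
    # x̄ = Σfx / N, where N = Σf  (discrete frequency distribution)
    return sum(x * f for x, f in zip(xs, fs)) / sum(fs)

# Raw data: mean of 2, 4, 6, 8
print(mean_raw([2, 4, 6, 8]))           # 5.0
# Values 1, 2, 3 with frequencies 2, 3, 5 → (2 + 6 + 15) / 10
print(mean_freq([1, 2, 3], [2, 3, 5]))  # 2.3
```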
Change of origin and scale
❖ Let x1, x2, …, xn be n values, and let 'a' be a constant.
Then x1 − a, x2 − a, …, xn − a are the values of x1, x2, …, xn with
the origin shifted to 'a'.
If 'c' is a positive constant,
(x1 − a)/c, (x2 − a)/c, …, (xn − a)/c
are the values x1, x2, …, xn with the origin shifted to a and the
scale changed by c.
Thus, u = (x − a)/c, and therefore x = a + cu.
And so, x̄ = a + c·ū = a + c(Σfu)/N.
However, if c = 1, x̄ = a + ū = a + (Σu)/n.
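The relation x̄ = a + c·ū can be verified numerically; the data and the choices of a and c below are my own illustration:

```python
values = [52, 57, 62, 67, 72]
a, c = 62, 5                        # shift origin to 62, change scale by 5
u = [(x - a) / c for x in values]   # u = (x − a) / c
mean_x = sum(values) / len(values)
mean_u = sum(u) / len(u)
# x̄ equals a + c·ū
print(mean_x, a + c * mean_u)       # 62.0 62.0
```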
Properties of the arithmetic mean
1) The algebraic sum of the deviations of a set of values from their
arithmetic mean is zero.
That is, Σ(x − x̄) = 0.
2) The sum of the squared deviations of a set of values is a minimum
when the deviations are taken around the arithmetic mean.
Let x̄1 be the arithmetic mean of a set of n1 values, and let x̄2
be the arithmetic mean of another set of n2 values. Then the
arithmetic mean of the two sets of values put together is
x̄ = (n1·x̄1 + n2·x̄2) / (n1 + n2)
(combined arithmetic mean)
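The combined arithmetic mean can be sketched as follows (the group sizes and means are invented for illustration):

```python
def combined_mean(n1, m1, n2, m2):
    # x̄ = (n1·x̄1 + n2·x̄2) / (n1 + n2)
    return (n1 * m1 + n2 * m2) / (n1 + n2)

# Group 1: 30 values with mean 50; Group 2: 20 values with mean 60
print(combined_mean(30, 50, 20, 60))  # 54.0
```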
Merits of arithmetic mean
❖ It is rigidly defined.
❖ It can be easily computed.
❖ Logic behind its computation can be easily understood.
❖ It can be easily adapted for further statistical analysis.
❖ It is based on all the values.
❖ It is more stable than any other average.
❖ It can be calculated even when some of the values are equal to
zero or negative.
Demerits of arithmetic mean
❖ It is highly affected by abnormal extreme values.
❖ Since it is based on all the values, even if one of the values is
missing, it cannot be calculated.
❖ Sometimes, the arithmetic mean may be a value which is not
assumed by the variable.
Median
❖ Median of a set of values is the middle most value when they are
arranged in the ascending order of magnitude. (such an
arrangement is called an array).
❖ It is a value that is greater than half of the values and lesser than
the remaining half.
❖ The median is denoted by M.
❖ In the case of raw data, and also of a discrete frequency distribution,
the median is –
M = {(n + 1)/2}th value in the arrayed series.
In the case of a continuous frequency distribution, the median is –
M = l + [(N/2 − m) × c] / f
where l: lower limit of the median class.
c: width of the median class.
f: frequency of the median class.
m: less-than cumulative frequency up to l.
N: total frequency.
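Both median formulas can be sketched in Python. The helper names and the grouped-distribution numbers are my assumptions (a hypothetical median class with l = 79.5, c = 10, f = 25, m = 30, N = 80):

```python
def median_raw(values):
    # M = {(n + 1) / 2}th value in the arrayed (sorted) series
    arr = sorted(values)
    pos = (len(arr) + 1) / 2
    if pos == int(pos):
        return arr[int(pos) - 1]
    # for an even count, average the two middle values
    return (arr[int(pos) - 1] + arr[int(pos)]) / 2

def median_grouped(l, c, f, m, n_total):
    # M = l + [(N/2 − m) × c] / f
    return l + (n_total / 2 - m) * c / f

print(median_raw([7, 3, 5, 9, 1]))           # 5
print(median_grouped(79.5, 10, 25, 30, 80))  # 83.5
```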
Merits of median
❖ The logic behind its computation is easily understood.
❖ It can be easily computed.
❖ Even if some extreme value is missing, it can be computed.
❖ It is not affected by abnormal extreme value.
❖ It can be used for the study of qualitative data also.
Demerits of median
❖ It is not based on all the values.
❖ It cannot be used in deep statistical analysis.
Mode
❖ The mode is the value which has the highest frequency.
❖ It is the most frequently occurring value.
❖ It is denoted by Z.
❖ In the case of raw data, and also of a discrete frequency
distribution, the mode is the value with the highest frequency.
❖ In the case of a continuous frequency distribution, the mode is –
Z = l + [(f − f1) × c] / (2f − f1 − f2)
Where –
l: lower limit of the modal class.
f: frequency of the modal class.
c: width of the modal class.
f1: frequency of the class preceding the modal class.
f2: frequency of the class succeeding the modal class.
❖ The modal class is the class which contains the mode.
❖ Generally, the modal class will be the class with the highest
frequency. But sometimes it may be a class other than the class with
the highest frequency. In such a situation, the mode is obtained by
using the formula –
Z = l + [c·f2 / (f1 + f2)]
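The grouped-mode formula can be sketched as follows (the modal-class numbers are illustrative assumptions, not from the text):

```python
def mode_grouped(l, f, c, f1, f2):
    # Z = l + [(f − f1) × c] / (2f − f1 − f2)
    return l + (f - f1) * c / (2 * f - f1 - f2)

# Assumed modal class 80–89: l = 79.5, f = 25, c = 10, f1 = 15, f2 = 13
print(mode_grouped(79.5, 25, 10, 15, 13))  # ≈ 84.05
```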
❖ Unimodal – most frequency distributions have only one
value with the highest frequency; such a frequency distribution is
unimodal. It has only one mode.
❖ Multimodal – if there is more than one value with the highest
frequency in a frequency distribution, it will have more than one
mode.
❖ Bimodal – if there are two modes.
❖ A distribution which has more than one mode is said to be ill defined.
Merits and demerits of mode
❖ Merits and demerits of mode are the same as merits and demerits
of median
❖ One additional demerit is –
For some frequency distribution, mode is ill defined.
Geometric mean (GM): the geometric mean of n values is the nth
root of the product of the values. It is denoted by G.
The geometric mean of n values x1, x2, x3, …, xn is
G = (x1 × x2 × … × xn)^(1/n)
If logarithms are used,
G = antilog[Σ log x / n] for raw data
and G = antilog[Σ f log x / N] for tabulated data.
The geometric mean is the appropriate measure for averaging rates
of growth. This is the reason why the geometric-mean index number
is considered the best.
When any of the values is equal to zero, the geometric mean is not
defined. It is also not defined when some of the values are
negative. In general, it is defined only when all the values are
positive.
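The logarithmic form can be sketched directly; math.exp of the mean log plays the role of "antilog" here (the data are invented):

```python
import math

def geometric_mean(values):
    # G = antilog[Σ log x / n], i.e. the nth root of the product
    return math.exp(sum(math.log(x) for x in values) / len(values))

print(round(geometric_mean([2, 8]), 6))     # 4.0  (square root of 16)
print(round(geometric_mean([1, 3, 9]), 6))  # 3.0  (cube root of 27)
```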
Harmonic mean (H.M): the harmonic mean of n values is the
reciprocal of the arithmetic mean of the reciprocals of the given
values. It is denoted by H.
Thus, the harmonic mean of the values x1, x2, x3, …, xn is –
H = n / Σ(1/x)
In the case of tabulated data, the H.M is –
H = N / Σ(f/x)
Uses of different averages: the appropriate situations in which the
various averages can be used.
Arithmetic mean –
1. The average is required for deep statistical analysis.
2. The variable is continuous.
3. The average is additive in nature.
Median –
1. The variable is discrete.
2. Some of the extreme values are missing.
3. There are abnormal extreme values.
4. The mode is ill defined.
5. The characteristic under study is qualitative.
Mode –
1. The modal value has a very high frequency compared with the
other frequencies.
2. Some of the extreme values are missing.
3. The variable is discrete.
4. There are abnormal extreme values.
5. The characteristic under study is qualitative.
Geometric mean – the variable is multiplicative in nature; average
rates and ratios have to be found.
Harmonic mean – the reciprocal of the variable is additive in nature.
Empirical relation between mean, median and mode: in a slightly
skewed distribution, the mean, median and mode show a rough
relation among themselves. It is –
Mean − Mode = 3(Mean − Median)
Mode = 3 Median − 2 Mean
That is, Z = 3M − 2x̄
Weighted average: sometimes in the data, some of the items may
be more important than the other items. For example, in postgraduate
study of pathology in Unani, the marks of ilmul alamat and ilmul
asbab carry greater importance than the marks of histology, cytology,
etc. Thus, in such a situation an appropriate average with varying
weightage assigned to the values is necessary. Such an average is
called a weighted average. The weighted averages in common use are
the weighted arithmetic mean and the weighted geometric mean.
Let the values x1, x2, x3, …, xn be assigned the weights w1, w2, w3, …, wn
respectively. Then –
The weighted arithmetic mean is –
x̄w = ΣWX / ΣW
The weighted geometric mean is –
Gw = antilog[ΣW log X / ΣW]
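The weighted arithmetic mean can be sketched as follows (the marks and weights are invented for illustration):

```python
def weighted_mean(xs, ws):
    # x̄w = ΣWX / ΣW
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)

# Marks 80, 70, 90 with weights 3, 1, 1
print(weighted_mean([80, 70, 90], [3, 1, 1]))  # 80.0
```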
Measures of position (partition values): the values which divide
the frequency distribution in definite ratios are called partition values.
E.g. median, quartiles, deciles, percentiles.
Quartiles: the quartiles divide the distribution into four quarters. For a
frequency distribution there are three quartiles.
Q1 – the first quartile, also called the lower quartile. It is the value
which is greater than one quarter of the observations and less than the
remaining three quarters.
Q2 – it divides the values into two equal halves. It is the same as the
median.
Q3 – the third quartile or upper quartile. The value is greater than
three quarters of the observations and less than the remaining one
quarter.
For raw data, and for a discrete distribution, the rth quartile is –
Qr = {r(n + 1)/4}th value in the arrayed series.
For a continuous frequency distribution, the rth quartile is –
Qr = l + [(rN/4 − m) × c] / f
Deciles: there are nine deciles for a frequency distribution, which are
denoted by D1, D2, D3, …, D9. They divide the frequency distribution
into ten equal parts.
For a continuous distribution –
Dr = l + [(rN/10 − m) × c] / f
Percentiles: the percentiles divide the frequency distribution into a
hundred equal parts. One percent of the values lie between two
consecutive percentiles. They are denoted by P1 to P99. There are
ninety-nine percentiles for a frequency distribution.
For a continuous distribution –
Pr = l + [(rN/100 − m) × c] / f
Median: -The median divides the frequency distribution into two
halves.
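The three grouped formulas share one shape, l + [(rN/k − m) × c]/f with k = 4, 10 or 100, so a single helper covers quartiles, deciles and percentiles (the class parameters below are assumed for illustration):

```python
def partition_value(r, k, l, c, f, m, n_total):
    # l + [(rN/k − m) × c] / f ; k = 4 (quartile), 10 (decile), 100 (percentile)
    return l + (r * n_total / k - m) * c / f

# Q1 for N = 80 in an assumed class with l = 69.5, c = 10, f = 15, m = 15
print(partition_value(1, 4, 69.5, 10, 15, 15, 80))     # ≈ 72.83
# P50 (= the median) in an assumed class with l = 79.5, c = 10, f = 25, m = 30
print(partition_value(50, 100, 79.5, 10, 25, 30, 80))  # 83.5
```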
Measures of dispersion
(Range, Quartile deviation, Mean deviation and standard deviation)
Dispersion (variation)
In a frequency distribution, though the values cluster around an
average, most of them differ from it. In some distributions the
difference may be small, whereas in others it may be large. This
property of deviation of the values from the average is called variation
or dispersion.
Measures of dispersion
1) Range
2) Quartile deviation (semi- interquartile range, QD)
3) Mean deviation (M.D)
4) Standard deviation (S.D)
Among these four measures, standard deviation is the most
commonly used measure.
Essentials of good measure of variation
❖ It should be easy to understand.
❖ Its computation procedure should be simple.
❖ It should be rigidly defined.
❖ It should be based on all the values.
❖ It should not be affected too much by abnormal extreme values.
❖ It should be capable of further algebraic treatment.
❖ It should be stable.
Range
❖ Range is the difference between the highest and lowest values in the
data.
❖ If H is the highest value and L is the lowest value in the data, the
range of variation is –
R = H − L
Coefficient of range
A relative measure of variation which is used for comparison of
frequency distributions is the coefficient of range. It is –
Coefft. of R = (H − L) / (H + L)
Range is easy to compute and very simple to understand.
Demerits of range
➢ Since range is based only on the extreme values, it shows
too much fluctuation.
➢ It is highly affected by abnormal extreme values.
➢ If data has abnormal extreme values, range should not be
adopted for study.
Quartile deviation
(semi- interquartile range)
➢ The quartile deviation is obtained by dividing the range
between the lower and upper quartiles by 2.
➢ If Q1 and Q3 are the lower and upper quartiles, the quartile
deviation is –
Q.D = (Q3 − Q1) / 2
Coefft. of Q.D
The relative measure of variation based on the quartiles is the
coefficient of quartile deviation. It is –
Coefft. of Q.D = (Q3 − Q1) / (Q3 + Q1)
Features of Q.D
➢ It is based only on the lower and upper quartiles.
➢ It can be easily computed.
➢ It is not affected much by extreme values.
➢ It is not based on all the values.
➢ It is not convenient for mathematical treatment.
Standard deviation
The standard deviation of a set of values is the positive square root of
the mean of the squared deviations of the values from their arithmetic
mean. It is denoted by sigma (σ).
The range is based only on the lowest and highest values. The quartile
deviation is based only on the quartiles. But these measures are not
based on all the values. And so we consider the standard deviation,
which is based on all the values.
The standard deviation of the values x1, x2, x3, …, xn is –
σ = √[Σ(x − x̄)² / n]
In the case of tabulated data (both continuous and discrete) –
σ = √[Σf(x − x̄)² / N]
Variance: -
The square of the standard deviation is called the variance.
The variance of x1, x2, x3, …, xn is –
Var(x) = σ² = Σ(x − x̄)² / n
(and Σf(x − x̄)² / N for tabulated data).
It is the mean of the squared deviations of the values from their
arithmetic mean.
Computation of standard deviation for raw data –
σ = √[Σx²/n − (Σx/n)²]
Computation of standard deviation for tabulated data –
σ = √[Σfx²/N − (Σfx/N)²]
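The definitional and computational forms of σ should agree; a quick check on invented data:

```python
import math

def sd_definition(values):
    # σ = √[Σ(x − x̄)² / n]
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

def sd_computational(values):
    # σ = √[Σx²/n − (Σx/n)²]
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sd_definition(data), sd_computational(data))  # 2.0 2.0
```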
If the origin of the values is shifted to a and the scale is changed by c,
that is, if u = (x − a)/c,
then it can be shown that –
S.D.(x) = c × S.D.(u)
Also, Var(x) = c² × Var(u)
Properties of standard deviation: -
S.D. is independent of the origin of measurement, but not of the scale.
S.D. is the least of all root-mean-square deviations.
The combined standard deviation of sets of n1 and n2 values is –
σ = √[(n1(σ1² + d1²) + n2(σ2² + d2²)) / (n1 + n2)]
where d1 = x̄1 − x̄, d2 = x̄2 − x̄ and x̄ = (n1x̄1 + n2x̄2) / (n1 + n2)
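The pooled-σ formula can be sanity-checked by comparing it against a direct computation on the merged data (the two sets are my illustration):

```python
import math

def combined_sd(n1, m1, s1, n2, m2, s2):
    # σ = √[(n1(σ1² + d1²) + n2(σ2² + d2²)) / (n1 + n2)]
    m = (n1 * m1 + n2 * m2) / (n1 + n2)  # combined mean
    d1, d2 = m1 - m, m2 - m
    return math.sqrt((n1 * (s1**2 + d1**2) + n2 * (s2**2 + d2**2)) / (n1 + n2))

# Sets [1,2,3] and [4,5,6]: each has σ = √(2/3); means 2 and 5
pooled = combined_sd(3, 2, math.sqrt(2 / 3), 3, 5, math.sqrt(2 / 3))
vals = [1, 2, 3, 4, 5, 6]
mv = sum(vals) / len(vals)
direct = math.sqrt(sum((x - mv) ** 2 for x in vals) / len(vals))
print(abs(pooled - direct) < 1e-9)  # True
```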
Coefficient of variation: -
The coefficient of variation is a relative measure of variation. It is used
for comparing the variation in frequency distributions. It is the
standard deviation expressed as a percentage of the mean.
Thus, the coefficient of variation is –
C.V = (standard deviation / arithmetic mean) × 100 = (σ / x̄) × 100
➢ A high value of the coefficient of variation indicates a high degree of
variation, and
➢ a low value indicates a low degree of variation.
➢ The coefficient of variation is independent of the unit of
measurement of the values.
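A coefficient-of-variation sketch (the two series are invented); note that a series can have the smaller σ yet the larger relative variation:

```python
def coefficient_of_variation(sd, mean):
    # C.V. = (σ / x̄) × 100
    return sd / mean * 100

# Series A: mean 50, σ 5 ; Series B: mean 80, σ 6
print(coefficient_of_variation(5, 50))  # 10.0
print(coefficient_of_variation(6, 80))  # 7.5
```

Series A is relatively more variable (C.V. 10%) even though its σ is smaller than that of Series B.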
Mean deviation
The mean deviation is the mean of the absolute deviations of the
values from a central value.
Thus, the mean deviation of the set of values x1, x2, x3, …, xn from
their arithmetic mean is –
M.D.(x̄) = Σ|x − x̄| / n
In the case of tabulated data, the M.D. from the A.M. is –
M.D.(x̄) = Σf|x − x̄| / N
The mean deviation of the values from the median M is –
M.D.(M) = Σ|x − M| / n
In the case of tabulated data, the M.D. from the median M is –
M.D.(M) = Σf|x − M| / N
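One helper covers the mean deviation about any centre, since only |x − centre| changes (the data are invented):

```python
def mean_deviation(values, center):
    # M.D.(center) = Σ|x − center| / n
    return sum(abs(x - center) for x in values) / len(values)

data = [2, 4, 6, 8, 10]
center = sum(data) / len(data)       # 6.0, which is also the median here
print(mean_deviation(data, center))  # 2.4
```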
❖ Here, |x1 − x̄|, |x2 − x̄|, |x3 − x̄|, …, |xn − x̄| are the
deviations with the signs ignored. The signs are ignored because if
the deviations are added algebraically, the sum reduces to zero
(property 1 of the A.M.).
❖ The mean deviation may be calculated around any average – mean,
median, mode, etc.
Minimal property of the median – the fact that the mean deviation is
least when it is measured from the median is called the minimal
property of the median.
The coefficient of mean deviation from the arithmetic mean is –
Coefft. of M.D.(x̄) = M.D.(x̄) / x̄
The coefficient of mean deviation from the median is –
Coefft. of M.D.(M) = M.D.(M) / M
Graphic presentation of frequency distribution
A) Tabulation (simple and frequency distribution tables)
B) Charts and diagrams
a) Histogram
b) Frequency polygon
c) Frequency curve
d) Ogive (cumulative frequency curve)
e) Bar diagram
f) Pie diagram
g) Map diagram
h) Pictogram
Histogram – Drawing procedure
❖ A histogram is the simplest form of graphical presentation.
❖ Histogram for equal class intervals –
The horizontal axis, which need not necessarily start from zero, is
divided by putting dots into equal parts numbering two or three more
than the number of class intervals. Starting from the left, each dot is
then labeled with the lower class limits of the successive classes,
leaving a space of the size of one class interval at each end of the
X-axis. Sometimes the horizontal axis instead measures the mid-points
of the successive class intervals.
❖ The vertical axis which always begins
with zero at the meeting point of the
two axes, is appropriately scaled to
measure class frequencies along it.
❖ Rectangular bars are then
constructed for successive class
intervals with their base on the X-axis,
such that the base is equal in width and
the height (on the Y-axis) equal to the
corresponding class frequency.
❖ The area of the bars so drawn
corresponding to each class interval is
given by its class frequency f multiplied
by the width of the class interval C.
Histogram - example
❖ Weekly income of 80 salesmen, as constructed in the table –
Income    No. of salesmen
50-59     6
60-69     9
70-79     15
80-89     25
90-99     13
100-109   7
110-119   5
Histogram for unequal class intervals
❖ The procedure for drawing a histogram for unequal class intervals is
slightly different.
❖ A minor adjustment is required in the spacing of the dots
marked on the X-axis. For example, if one class interval has a width
of 15 points and the rest have 5 points, the space on the X-axis for
the 15-point class interval should be three times longer than that
for a 5-point interval.
❖ The vertical axis for such class intervals measures the frequency
density and not the original class frequency.
❖ The frequency density for a class interval wider than the others is
the actual frequency of that class divided by the number of times its
width exceeds the standard width. For example, if the frequency of
the 15-point class is 69, its frequency density is 69 divided by 3,
that is, 23.
❖ For drawing a histogram for an open-ended distribution, follow the
usual procedure after leaving out the open-ended classes.
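The frequency-density adjustment from the text (a 15-point class against a 5-point standard width) can be sketched as:

```python
def frequency_density(freq, width, standard_width):
    # density = class frequency ÷ (how many standard widths the class spans)
    return freq / (width / standard_width)

# 15-point-wide class with frequency 69, standard width 5 points
print(frequency_density(69, 15, 5))  # 23.0
```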
Frequency polygon
❖ Put a dot at the mid-point of the top horizontal line of each bar and
then join these dots by straight lines.
❖ Close the polygon at each end by drawing straight lines from the
mid-points of the top bases of the first and the last rectangles to
the mid-points, falling on the horizontal axis, of the next outlying
intervals with zero frequency.
Drawing a frequency polygon does not necessarily require
constructing a histogram first.
OGIVE – Cumulative frequency curve
❖ The cumulative frequency curve is popularly known as the ogive.
❖ The first step in drawing an ogive is to add another column of
cumulative frequencies, denoted fc. This may be done by
finding cumulative frequencies on either a "less than" or a
"more than" basis.
❖ The "less than" cumulative frequency is obtained by adding
successive class frequencies from top to bottom.
❖ The "more than" cumulative frequency is obtained by adding
successive class frequencies from bottom to top.
Ogive drawing procedure
❖ Once the cumulative frequencies are obtained, the procedure is as
usual, the only difference being that the Y-axis now has to be scaled
so that it accommodates the total frequency. The X-axis is labeled
with the upper class limits in the case of the "less than" ogive, and
the lower class limits in the case of the "more than" ogive.
❖ The advantage of ogives is that these curves lend themselves to
quick interpretation.
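The two cumulation directions can be sketched using the salesman-income frequencies from the histogram example earlier:

```python
def less_than_cumulative(freqs):
    # add successive class frequencies from top to bottom
    out, running = [], 0
    for f in freqs:
        running += f
        out.append(running)
    return out

def more_than_cumulative(freqs):
    # add successive class frequencies from bottom to top
    return list(reversed(less_than_cumulative(list(reversed(freqs)))))

freqs = [6, 9, 15, 25, 13, 7, 5]
print(less_than_cumulative(freqs))  # [6, 15, 30, 55, 68, 75, 80]
print(more_than_cumulative(freqs))  # [80, 74, 65, 50, 25, 12, 5]
```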
Pie diagram
The total angle of a circle (pie) is 360 degrees, and the total area of
the circle is 100%.
Hence, each percent corresponds to 360/100 degrees:
1% = 3.6 degrees
The angle under the pie for a class is therefore:
(class frequency / total observations) × 360
Example- According to NCAER, New Delhi, the forms of tobacco
consumption, estimated by weight, are: bidis 55%, cigarettes 16% and
others 29%. This is shown in a pie diagram.
[Pie chart: tobacco consumption — bidis 55%, cigarettes 16%, others 29%]
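The slice angles can be computed directly with the 3.6-degrees-per-percent rule; the percentages below are the NCAER figures from the example above.

```python
# Degrees for each pie slice: (class frequency / total observations) * 360.
# Since each percent corresponds to 3.6 degrees: 55% -> 198 deg,
# 16% -> 57.6 deg, 29% -> 104.4 deg.
shares = {"Bidis": 55, "Cigarettes": 16, "Others": 29}  # percentages from the example
total = sum(shares.values())  # 100

angles = {name: pct / total * 360 for name, pct in shares.items()}
print({name: round(a, 1) for name, a in angles.items()})
```

The angles always sum to 360 degrees, which is a quick sanity check before drawing the diagram.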
Probability
Probability is the chance that something will happen. In many
instances we have some knowledge about the possible outcomes of a
decision, but in research we are unable to forecast the future with
complete certainty. The need to cope with uncertainty therefore
leads us to the study and use of probability theory. Probability is part
of our everyday lives.
Probability is expressed as a fraction (1/5, 1/6, 1/15) or as a decimal
(0.454, 0.475, 0.5669) between zero and one (0–1).
A probability of zero means that something will never happen. A
probability of 1 (one) indicates that something will definitely happen.
Any number between 0 (zero) and 1 (one) is a probability; it lies in the
region between impossibility and certainty. The value of a probability
cannot be less than 0 or greater than 1.
Event: - in probability theory, an event is described as one or more of
the possible outcomes of doing something. E.g. In a coin toss
experiment, getting a tail would be an event, and getting a head would
be another event.
Experiment: - the activity that produces an event. E.g. a coin toss.
Sample space: - the set of all possible outcomes of an experiment is
called the sample space. It is written as-
S = {𝒉𝒆𝒂𝒅, 𝒕𝒂𝒊𝒍} in the coin toss experiment.
Mutually exclusive events: - two or more events are mutually
exclusive if they cannot occur at the same time; that is, one and only
one of the events can take place at a time. E.g. in a coin toss
experiment either head or tail may turn up, but not both.
Collectively exhaustive list of events: - a list of events which includes
every possible outcome.
Dependent events: - the probability of occurrence of one event is
dependent on, or affected in some way by, the occurrence of another
event.
Independent events: - the probability of occurrence of one event has
no effect on the occurrence of another event.
Types (approaches) of probability: - there are three basic types of
approach-
1. Classical approach
2. Relative frequency approach
3. Subjective approach.
Classical approach of probability: - classical probability is also called
a priori probability. It is-
Probability of an event = (number of outcomes where the event
occurs) / (total number of possible outcomes)
In the coin toss experiment,
P(Head) = 1 (Head) / 2 (Head + Tail) = 0.5
P(Tail) = 1 (Tail) / 2 (Head + Tail) = 0.5
Relative frequency approach of probability: - this method uses the
relative frequencies of past occurrences as probabilities. We
determine how often something has happened in the past and use
that figure to predict the probability that it will happen again in the
future. Life insurance companies use this approach.
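The two approaches can be compared in a short sketch: the classical probability of a head is computed a priori, and the relative-frequency estimate comes from a simulation. The number of tosses and the random seed are illustrative choices, not from the text.

```python
# Classical (a priori) probability vs. relative-frequency probability,
# illustrated with a fair coin.
import random

def classical_probability(favourable, total):
    # number of outcomes where the event occurs / total possible outcomes
    return favourable / total

p_head = classical_probability(1, 2)  # 1 (Head) / 2 (Head + Tail)

random.seed(42)  # fixed seed so the simulation is reproducible
tosses = [random.choice(["head", "tail"]) for _ in range(10_000)]
relative_freq = tosses.count("head") / len(tosses)

print(p_head)         # 0.5
print(relative_freq)  # close to 0.5 for a large number of tosses
```

As the number of tosses grows, the relative frequency settles near the classical value, which is the intuition behind using past frequencies as probabilities.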
Subjective approach: - it is based on the belief of the person making
the probability assessment. In 1926 Frank Ramsey, in his essay later
collected in The Foundations of Mathematics and Other Logical
Essays, introduced the concept of the subjective approach to
probability.
Laws of probability
i. Addition rule for mutually exclusive events: - the probability of
either event A or event B happening is written as-
P(A or B) = P(A) + P(B)
ii. Addition rule for NOT mutually exclusive events: -
If A and B are not mutually exclusive events, then-
P(A or B) = P(A) + P(B) – P(AB)
where P(AB) is the probability that events A and B both occur
at the same time.
iii. Multiplication law of probability: - this is applied when two or
more events occur together but are independent of each
other.
Probability under conditions of statistical independence- when two
events happen, the occurrence of one event has no effect on the
probability of the occurrence of the other event.
There are three types of probability under statistical independence-
1. Marginal probability
2. Joint probability – P(AB) = P(A) × P(B)
3. Conditional probability –
A) Conditional probability under statistical independence-
P(B/A) = P(B) or P(A/B) = P(A)
B) Conditional probability under statistical dependence- it is also of
three types: conditional, joint and marginal.
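The addition and multiplication rules can be verified by brute-force enumeration; the two-dice example below is illustrative, not from the text, and uses exact fractions so the identities hold without floating-point error.

```python
# Checking the addition and multiplication rules by enumerating
# all 36 equally likely outcomes of rolling two dice.
from itertools import product
from fractions import Fraction

sample_space = list(product(range(1, 7), repeat=2))
N = len(sample_space)  # 36

def prob(event):
    # classical probability: favourable outcomes / total outcomes, as an exact fraction
    return Fraction(sum(1 for o in sample_space if event(o)), N)

A = lambda o: o[0] == 6  # event A: first die shows 6
B = lambda o: o[1] == 6  # event B: second die shows 6

# Not mutually exclusive: P(A or B) = P(A) + P(B) - P(AB)
p_or = prob(lambda o: A(o) or B(o))
assert p_or == prob(A) + prob(B) - prob(lambda o: A(o) and B(o))

# Independent events: P(AB) = P(A) x P(B)
assert prob(lambda o: A(o) and B(o)) == prob(A) * prob(B)

print(p_or)  # 11/36
```

Had the addition rule for mutually exclusive events been applied here (1/6 + 1/6 = 12/36), it would double-count the outcome (6, 6); the −P(AB) term removes that double count.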
Probability distribution: -probability distribution is classified as either
continuous or discrete.
Continuous probability distribution: - the variable under
consideration can take any value within a given range, so we cannot
list all the possible values. E.g. the height of children.
Discrete probability distribution: - a discrete variable can take only
a limited number of values in a given range, and these can be listed.
E.g. the number of children in a family.
Bernoulli distribution: - it was given by Jacob Bernoulli, a Swiss
mathematician. The Bernoulli distribution describes discrete, not
continuous, data resulting from an experiment known as a Bernoulli
process.
Properties of a Bernoulli process-
1. Each experiment (trial) has a fixed number of possible outcomes.
In a coin toss experiment the outcomes are fixed (only two):
head or tail. E.g. success or failure; yes or no.
2. The probability of the outcome of any trial remains fixed over
time. E.g. in a fair coin toss experiment, the probability of a head
remains 0.5 for each toss regardless of the number of times the
coin is tossed.
3. The outcome of one experiment (trial) does not affect the
outcome of any other experiment; the trials are statistically
independent.
The probability of r successes in n trials is given by:
P(r) = nCr pʳ qⁿ⁻ʳ = [n! / (r!(n − r)!)] pʳ qⁿ⁻ʳ
The mean of the binomial distribution is given as 𝜇 = 𝑛𝑝
The standard deviation of the binomial distribution is 𝜎 = √𝑛𝑝𝑞
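The binomial formula and its mean can be checked numerically; the choice n = 10, p = 0.5 below is illustrative.

```python
# Binomial probability of r successes in n trials: nCr * p^r * q^(n-r),
# with mean n*p and standard deviation sqrt(n*p*q).
from math import comb, sqrt

def binomial_pmf(r, n, p):
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

n, p = 10, 0.5  # illustrative parameters: 10 fair coin tosses
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

mean = n * p                 # mu = np
sd = sqrt(n * p * (1 - p))   # sigma = sqrt(npq)

print(sum(pmf))                                               # 1.0 (probabilities sum to 1)
print(sum(r * binomial_pmf(r, n, p) for r in range(n + 1)))   # 5.0, matching mu = np
```

With p = 0.5 the distribution is symmetrical about the mean of 5, as stated in the bullets below.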
➢ When p = 0.5, the binomial distribution is symmetrical.
➢ When p > 0.5, the binomial distribution is skewed to the left.
➢ As p increases toward 0.5 (e.g. 0.3), the skewness becomes less
noticeable.
➢ When p is small (e.g. 0.1), the binomial distribution is skewed to
the right.
➢ The probabilities for p = 0.3 are the same as those for p = 0.7,
except that the values of p and q are reversed.
The Poisson distribution: it is a discrete probability distribution,
often based on past data.
The Poisson probability formula is-
P(x) = λˣ e⁻ᵡ́ / x!   (with λ the mean number of occurrences)
The Poisson distribution can be used instead of the binomial
distribution, to avoid the tedious binomial calculations, when n is
large and p is small, that is, when the number of trials is large and the
binomial probability of success is small. It gives a good approximation
of the binomial when n is greater than or equal to 20 and p is less
than or equal to 0.05.
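The approximation can be demonstrated directly; the parameters n = 100, p = 0.02 below are illustrative choices that satisfy the stated rule of thumb (n ≥ 20, p ≤ 0.05), with λ = np = 2.

```python
# Poisson P(x) = lambda^x * e^(-lambda) / x!, used as an approximation
# to the binomial when n is large and p is small.
from math import exp, factorial, comb

def poisson_pmf(x, lam):
    return lam**x * exp(-lam) / factorial(x)

def binomial_pmf(r, n, p):
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 100, 0.02
lam = n * p  # Poisson parameter lambda = np = 2.0

for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, lam), 4))
# the two columns stay close for each x, e.g. near 0.133 vs 0.135 at x = 0
```

The Poisson formula needs only λ, which is why it is so much less tedious than evaluating binomial coefficients for large n.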
The normal probability distribution (Gaussian distribution)- a
continuous probability distribution: it was postulated by Carl
Friedrich Gauss in the 18th century.
[Bell-shaped normal curve; horizontal axis marked −2, −1, 0, 1, 2
standard deviations from the mean]
➢ The curve is bell shaped; it has a single peak. It is unimodal.
➢ The mean of a normally distributed population lies at the center of
its normal curve.
➢ Because of the symmetry of the normal probability distribution,
the mean, median and mode have the same value, at the center of
the curve.
➢ The two tails of the normal probability distribution extend
indefinitely and never touch the horizontal axis.
➢ The total area under the curve is 1.00 (the total probability).
➢ In a normally distributed population-
❖ Approximately 68% of all the values lie within ±1
standard deviation of the mean.
❖ Approximately 95.5% of all the values lie within ±2
standard deviations of the mean.
❖ Approximately 99.7% of all the values lie within ±3
standard deviations of the mean.
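The 68 / 95.5 / 99.7 rule can be verified from the standard identity P(|X − μ| ≤ kσ) = erf(k/√2) for a normal variable, using only the standard library.

```python
# Proportion of a normally distributed population within k standard
# deviations of the mean: P(|X - mu| <= k*sigma) = erf(k / sqrt(2)).
from math import erf, sqrt

def within_k_sd(k):
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sd(k) * 100, 2))
# prints approximately 68.27, 95.45 and 99.73 percent,
# matching the 68 / 95.5 / 99.7 rule above
```

These proportions hold for any normal distribution, whatever its mean and standard deviation, because k is measured in units of σ.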
Estimation
Statistical inference is based on estimation and hypothesis testing. In
both estimation and hypothesis testing, we make inferences about
characteristics of populations from information contained in a
sample. There are two types of estimates of a population parameter-
1) A point estimate- a single number that is used to estimate an
unknown population parameter. A point estimate is more useful
if it is accompanied by an estimate of the error that might be
involved.
x̄ = Σx / n
Thus, by using the sample mean x̄ as the estimator, we have a point
estimate of the population mean 𝜇. Similarly, we can use the sample
variance s² to estimate the population variance 𝜎², where the
sample variance s² is given by the formula-
s² = Σ(x − x̄)² / (n − 1)
2) An interval estimate- it is a range of values used to estimate a
population parameter. It indicates the error in two ways: by the
extent of its range, and by the probability of the true population
parameter lying within that range. An interval estimate is, in
effect, a range of values within which a population parameter is
likely to lie.
❖ Interval estimate and confidence level: - the probability that
we associate with an interval estimate is called the
confidence level. A higher probability means more
confidence. In estimation, the most commonly used
confidence levels are 90%, 95% and 99%, but we are free
to apply any confidence level. The confidence interval is the
range of the estimate we are making.
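Both kinds of estimate can be illustrated together; the sample data below are hypothetical, and the sketch uses the large-sample normal formula x̄ ± z·s/√n with z = 1.96 for a 95% confidence level.

```python
# Point estimate (sample mean) and a 95% interval estimate for the
# population mean. Data and z-value are illustrative.
from math import sqrt

sample = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7, 12.4, 12.1]
n = len(sample)

x_bar = sum(sample) / n                               # point estimate of mu
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)  # sample variance s^2
s = sqrt(s2)                                          # sample standard deviation

z = 1.96  # z-value for a 95% confidence level
margin = z * s / sqrt(n)
ci = (x_bar - margin, x_bar + margin)  # interval estimate

print(round(x_bar, 3), tuple(round(v, 3) for v in ci))
```

Raising the confidence level (say to 99%, z ≈ 2.58) widens the interval: more confidence is bought at the price of a less precise range, which is the trade-off described above.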
Criteria for a good estimator
❖ Unbiasedness, efficiency, consistency and sufficiency.
Research Methodology
Research definition-
Research simply means a search for facts, answers to questions and
solutions to problems. It is a purposive investigation. It is an organised
enquiry. It seeks to find explanations for unexplained phenomena, to
clarify doubtful facts and to correct misconceived facts.
There are two types of methods for searching for facts-----
1) Arbitrary method (unscientific method) – it is based on
opinion, imagination, belief, impression etc. Its findings vary
from person to person.
2) Scientific method – it is a systematic and rational approach to
seeking facts.
Aim of research-
➢ Discover new facts
➢ Verify or test old facts.
➢ Develop new scientific tools, concepts and theories.
Research is a scientific endeavour. It involves the scientific method.
The scientific method is based on certain articles of faith. These are-
• Reliance on empirical evidence: - truth is established on the basis
of evidence.
• Use of relevant concepts: - use concepts with specific meanings.
• Commitment to objectivity: - objectivity is the hallmark of the
scientific method.
• Ethical neutrality
• Generalization
• Verifiability
• Logical reasoning process
Characteristics of research
➢ It is a systematic and critical investigation into a phenomenon.
➢ It is a purposive investigation aiming at describing, interpreting
and explaining phenomena.
➢ It adopts scientific methods.
➢ It is objective and logical.
➢ It is based upon observable experience or empirical evidence.
➢ It is directed towards finding answers.
➢ It emphasizes the development of generalisations, principles and
theories.
➢ It also stands up to testing and criticism.
Purpose of research
✓ Research extends human knowledge.
✓ It explains undiscovered phenomena.
✓ It verifies and tests existing facts and theories.
✓ It develops general laws.
✓ It analyses inter-relationships between variables.
✓ It derives causal explanations.
✓ Applied research aims at finding solutions to real-life problems.
✓ It also develops new tools, concepts and theories.
✓ It contributes to human development.
Types of research: -
A)According to intent—
1) Pure research/ basic/ fundamental research
2) Applied research
3) Exploratory research
4) Descriptive research
5) Diagnostic studies
6) Evaluation studies
7) Action research- it is a type of evaluation studies.
B)According to method of studies-
1) Experimental research
2) Clinical research
3) Analytical studies
4) Historical research – it studies the past record and data
5) Survey
Research approaches (inquiry mode)- two types-
❖ Quantitative approach
❖ Qualitative approach
On the basis of application- two types
❖ Pure/ fundamental/Basic research
❖ Applied research
On the basis of objective – four types-
❖ Descriptive research
❖ Exploratory research
❖ Explanatory research
❖ Correlational research
1. Pure research/basic/fundamental research- it aims at the extension
of knowledge. It need not be problem oriented. It may lead either to
the discovery of a new theory or to the refinement of an existing
theory. It lays the foundation for applied research. E.g. the humoral
theory of Hippocrates, Einstein's theory of relativity etc.
2. Applied research- it is real-life problem-oriented and action-
directed research. It seeks an immediate and practical result.
3. Descriptive research- it is the simplest type of research. It is a fact-
finding investigation with adequate interpretation. It describes
systematically a situation, phenomenon, problem, service or
program, or an attitude towards an issue.
4. Exploratory research- it is also known as formulative research. This
type of study is undertaken either to explore an area where little is
known or to investigate the possibility of undertaking a particular
research study.
5. Explanatory research- it clarifies the relationship between two
aspects of a situation or phenomenon.
6. Correlational research- it discovers or establishes the existence of a
relationship/association/interdependence between two or more
aspects of a situation.
❖ Quantitative approach- it is a structured/rigid/predetermined
methodology to quantify the extent of variation in a phenomenon,
situation, issue, etc. It has reliability and objectivity.
❖ Qualitative approach- it is an unstructured/flexible/open
methodology to describe variation in a phenomenon, situation,
issue, etc. The emphasis is on the description of variables.
Hence, research is a scientific undertaking which, by means of logical
and systematic techniques, aims to discover new facts or verify and
test old facts, analyse their sequence, interrelationships and causal
explanations, and develop new scientific tools, concepts and theories
which facilitate reliable and valid study of human behaviour. It also
stands up to testing and criticism.
Steps of research
➢ Formulating a research problem
➢ Research design conceptualisation
➢ Instrument construction for data collection
➢ Sampling
➢ Research proposal writing
➢ Data collection
➢ Data processing
➢ Research report writing
Research problem
It is a difficulty or problem demanding a solution within the subject
area of the researcher's discipline. It is the first step in a scientific
enquiry.
There are five components of a problem—
➢ Research consumer
➢ Research consumer's objectives
➢ Alternative means to meet the objectives
➢ Doubt in regard to the selection of alternatives
➢ One or more environments to which the difficulty or problem
pertains.
Selection of a problem: - it is the first step in research. Selection is
itself a problem. One with a critical, curious and imaginative mind,
who is sensitive to practical problems, can easily identify a problem
for study.
Sources for Selection of a problem: -
• Review of literature
• Academic experience
• Daily experience
• Exposure to field situations
• Consultations and brainstorming
• Research
• Institution
Formulating the problem – it requires the following criteria-
I. Internal criteria – these consist of-
a) Researcher's interest
b) Researcher's competence
c) Researcher's own resources.
II. External criteria –
a) Researchability of the problem
b) Importance and urgency
c) Novelty of the problem
d) Feasibility
e) Facilities
f) Usefulness and social relevance
g) Research personnel
Objective of formulating a problem – A problem well put is half
solved; formulation serves this purpose. A clear and accurate
statement of the problem, the development of a conceptual model,
the definition of the objectives of the study, the setting of
investigative questions, the formulation of hypotheses to be tested,
the operational definition of concepts and the delimitation of the
study together determine the exact data needs of the study.
Formulation prevents wastage of time and energy, provides direction
to the study and determines the methods to be adopted.
Techniques involved in formulating a problem- these include-
I. Developing the title- it indicates the core of the study and
reflects the real intention of the researcher. The title should be
long enough to cover the subject and short enough to retain
interest.
II. Building a conceptual model- a conceptual model gives an exact
idea of the research problem and shows the various properties
and variables to be studied.
III. Defining the objectives of the study- these indicate what we are
trying to achieve through the study.
Criteria of a good research problem-
1. Verifiable evidence: - other observer can see or check.
2. Accuracy: - it means truth or correctness of a statement.
3. Precision: - that is making it as exact as necessary.
4. Systematisation: - data should be collected in a systematic
and organised way.
5. Objectivity: - that is, being free from all biases and vested
interests.
6. Recording: - that is writing down complete detail as
quickly as possible.
7. Controlling condition: - controlling all variable except
one.
8. Training investigators: - that is imparting necessary
knowledge to investigators.
Types of research question: - a research study can ask three types of
question—
▪ Descriptive questions – describe the phenomena or characteristics
of a particular group of subjects being studied.
▪ Relationship questions – investigate the degree to which two or
more variables are associated with each other.
▪ Difference questions – make comparisons between or within
groups of interest.
A research question must identify –
▪ Variable under study
▪ Population being studied
▪ Testability of question
Concept of variable
A variable is a characteristic, trait or attribute of a subject.
❖ Variable – a quantitative characteristic which varies from unit
to unit. E.g. height
❖ Attribute – a qualitative characteristic which varies from unit
to unit. E.g. sex
❖ Discrete variable – a variable which can take only some specified
values in a given range. E.g. the number of children per family.
❖ Continuous variable – a variable which can assume any value in
a range. E.g. the height of persons
Types of variable –
▪ Independent – any variable which is adopted for bringing about a
change (effect) is called an independent variable.
▪ Dependent – the variable that changes under the effect of another
variable is called the dependent variable.
▪ Extraneous – an independent variable which is unwanted for the
purpose of the study but may affect the dependent variable is
called an extraneous variable/factor.
▪ Chance variable – it is also an independent and unwanted variable,
which may affect the dependent variable by chance.
Research design
It is a logical and systematic plan prepared for directing a research
study. It specifies the objectives of the study and the methodology
and techniques to be adopted for achieving them; it constitutes the
blueprint for the collection, measurement and analysis of data. A
research design is a programme that guides the investigator in the
process of collecting, analysing and interpreting observations.
According to Cook – a research design is the arrangement of
conditions for the collection and analysis of data in a manner that
aims to combine relevance to the research purpose with economy
of procedure.
Components of research design
1. Dependent and independent variables: -
Phenomena that can assume different values quantitatively, even
to decimal points, are known as continuous variables, and those
whose values can be expressed only in integers are called
non-continuous variables.
2. Extraneous factors- independent variables which are not directly
associated with the purpose of the study but affect the dependent
variable.
3. Control – the term control is used in experimental research to
reflect the restraint on experimental conditions. It is used to
minimise the effect of extraneous independent variables.
4. Confounded relationship – the relationship between the dependent
and independent variables is said to be confounded by an
extraneous variable when the dependent variable is not free from
its effect.
❖ Research hypothesis- the predictive statement which relates a
dependent variable and an independent variable.
❖ Control group- in experimental research, the group which is exposed
to the usual conditions is known as the control group.
❖ Experimental group- the group which receives or is exposed to the
intervention is called the experimental group.
❖ Treatment- refers to the different conditions to which the
experimental and control groups are subjected.
Functions of research design
❖ It relates to the identification and development of the procedures
and logistical arrangements required for the study.
❖ It emphasises the quality of the adopted procedures to ensure
validity, objectivity and accuracy.
A research design should contain the following information-
❖ Who will constitute the research population?
❖ The method of identification of the research population.
❖ Whether the whole population will be studied or not; if not, the
selection of the sample and the method of sampling.
❖ The method of data collection, with justification.
❖ How ethical issues will be addressed.
Different research designs- (studies are of different types; hence a
single research design is not suitable for all studies.)
A) On the basis of the number of contacts with the study population,
research designs are of three types-
1. Cross-sectional study or prevalence study (one contact only): this
study is cross-sectional with respect to both the study population and
the time of investigation. It is an extremely simple design used to
study the prevalence of a phenomenon, situation or issue. It is easy
and cheap, but change cannot be measured by this study.
2. Before-and-after study (two contacts): - it is also known as a pre-
test/post-test design. It can be described as two sets of cross-sectional
data collected from the same population to find the change in the
phenomenon or variable between two points of time. It can measure
change in a situation, phenomenon or issue, and is an appropriate
design for measuring the impact or effectiveness of a program. It may
be either experimental or non-experimental. It is more difficult, more
expensive and requires a longer time to complete. It measures total
change (change produced by both the independent variable and
extraneous variables). The findings of this design may be
contaminated by maturation effects, reactive effects and regression
effects.
3. Longitudinal study (more than two contacts): - in this design the
study population is visited a number of times at regular intervals
to collect the required information. The interval varies from study
to study; it may be days, weeks, months or years, depending on
the study. Patterns of change can be studied by this method, but
maturation effects, reactive effects, regression effects and
conditioning effects can produce error in the data.
B) On the basis of the reference period, study designs are of three
types-
1. Retrospective study design: - this study focuses on a problem or
phenomenon which has happened in the past. Data are collected
on the basis of recall of the situation. It is always
non-experimental.
2. Prospective study design: - in this design the study is carried out
in the future. It may be experimental, non-experimental or
semi-experimental.
3. Retro-prospective design: - this study focuses on past trends in a
phenomenon and continues studying it into the future. It is a
combination of retrospective and prospective studies.
C) On the basis of nature of investigation, study design is of three
types-
1. Non-experimental study
2. Experimental study
3. Semi experimental study
1. Non-experimental study- this is a cause-tracing study; that is, it
starts from the effect to trace the cause. The environment is not
controlled in this study.
2. Experimental study- it is a cause-and-effect relationship study; it
starts from the cause to establish the effect. An experimental study
can be carried out in either a controlled or a natural environment.
Prof. Fisher enumerated three principles of experimental design-
a) The principle of replication: - the experiment should be
repeated more than once; thus, each treatment is applied to many
experimental units instead of one. By doing so, the statistical
accuracy of the experiment is increased.
b) The principle of randomisation: - randomisation provides
protection against the effect of extraneous factors when we
conduct an experiment.
c) The principle of local control: - this means that we should plan
the experiment in such a manner that we can perform a two-way
analysis of variance, in which the total variability of the data is
divided into three components attributed to treatment, the
extraneous factor and experimental error.
Types of experimental study design- there are many types of
experimental study design; some of those used in medical science
and public health are-
❖ The after only design
❖ The before and after design
❖ The control group design
❖ The double control designs
❖ The comparative design
❖ The matched control experimental design
❖ The placebo designs
The after-only design- in this design the baseline data (pre-test or
before observation) are constructed on the basis of the respondents'
recall of the situation before the intervention, or from information
available in existing records. Only one set of data, after the
intervention, is collected. The change in the dependent variable is
measured by the difference between the baseline and
after-intervention data. This study measures total change, including
change attributable to extraneous variables; hence the net effect of
the intervention cannot be identified by this design. Because the
baseline data are not properly constructed, the two sets of data are
not strictly comparable. So it is a faulty design for measuring the
impact of an intervention.
The before-and-after design- in this design baseline data are collected
before the intervention and another set of data is collected after the
intervention from the same population. Hence the data are
comparable. This design also measures the total change:
Effect (change in dependent variable) = after-intervention data −
baseline data = intervention effect + extraneous effect + chance
effect.
The control group design- in experimental research, the term control
refers to the restraint on experimental conditions. This design is
intended to minimise the effect of extraneous independent variables.
In this study a test article is compared with a treatment that has a
known effect. The control may receive no treatment, a standard
treatment or a placebo. The chief objective of the control group is to
quantify the impact of extraneous variables. With this design the net
effect of the intervention is measured:
Net effect of intervention = (intervention effect + extraneous effect +
chance effect, the change in the experimental group) − (extraneous
effect + chance effect, the change in the control group).
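The arithmetic of the control-group design can be sketched with hypothetical before/after scores for the two groups: the control group's change estimates the extraneous-plus-chance effect, which is subtracted from the experimental group's total change.

```python
# Net effect of intervention in a control-group design: the change in the
# experimental group minus the change in the control group (which captures
# extraneous and chance effects). All numbers are illustrative.
experimental = {"baseline": 40.0, "after": 62.0}
control = {"baseline": 41.0, "after": 48.0}

total_change = experimental["after"] - experimental["baseline"]  # 22.0 (intervention + extraneous + chance)
extraneous_change = control["after"] - control["baseline"]       # 7.0 (extraneous + chance only)
net_effect = total_change - extraneous_change                    # 15.0 attributable to the intervention

print(net_effect)  # 15.0
```

A before-and-after design without a control group would have reported the full 22-point change as the treatment effect, overstating it by the 7 points of extraneous change.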
Purpose of a control study
➢ It helps in differentiating the results or outcomes produced by the
test treatment from results caused by other (extraneous) factors.
➢ It helps the investigator know what would happen to the patients
if they had not received the treatment.
➢ It provides sufficient evidence to prove the effectiveness of the
use of a Unani medicine/procedure in prevention, diagnosis or
treatment.
A control may consist of-
❖ No treatment (plain control)
❖ Placebo treatment
❖ Well-established treatment
❖ Different dose of the same treatment
❖ Full-scale treatment
❖ Minimal treatment
❖ Alternative treatment
In experimental hypothesis-testing research, the group exposed to
the usual conditions is termed the control group, and the group
exposed to some novel/special condition is termed the experimental
group.
Types of control
a) Plain control (no treatment in the control group)- it is always
open; that is, neither the patient nor the investigator is blinded.
E.g. the effect of Hijamah on hair fall.
b) Placebo control-
A placebo is a pharmacologically inert substance used in a clinical
trial. Placebo treatment is also called dummy treatment.
• Single-blind control
• Double-blind control (ideal)
• Triple-blind control
c) Standard control
d) Dose-response control
e) External control – subjects receiving treatment are compared
with a group of patients external to the study.
f) Multiple control- more than one type of control group.
Disadvantages-
➢ Ethical concerns.
➢ In certain conditions it may not be wise to withdraw the patient
from medication (e.g. hypertension), and doing so may pose a
serious threat to the patient's wellbeing.
Matching: - it is a method for forming comparable control and
experimental groups to minimise the effect of extraneous variables.
Matching on a minimum number of parameters is better. Matching
may be –
A) Topographic matching
B) Same-habit matching
C) Same socioeconomic status
D) Dietary habit matching
E) Professional matching
F) Age matching
G) Sex matching
Blinding
Blinding is a method of controlled experimentation in which the
subjects or the researchers, or both, are not informed about the
treatment given.
According to the level of blinding, trials can be divided into the
following four types-
1. Unblinded study/open study- in this type of study both the patients
and the investigators (everybody involved in the trial) are aware of
the identity of the treatment given.
2. Single-blind trials – in this type of study the patients are not aware
of the trial treatment being given to them, but their physician does
know about it.
3. Double-blind trials – in this type of study neither the patients nor
the investigators know which treatment an individual patient is
given.
4. Triple-blind trials – a double-blind design in which the response is
also monitored by a blinded committee. In this type, the patients,
the investigators and the data analysts do not know which
treatment is being given, since the treatments are coded. When
the trial reaches a predefined point, the code is broken and the
trial is unblinded. This design has an advantage over the
double-blind study because the monitoring committee can
evaluate the response in a more objective fashion.
Trial with Zelen's design- in this design, eligible individuals are
randomised, before they have given consent to participate in the
trial, to receive either a standard treatment or an experimental
intervention. Those allocated to the standard treatment are given the
standard treatment and are not told that they are part of a trial,
whereas those allocated to the experimental intervention are offered
the experimental intervention and told that they are part of a trial. If
they refuse to participate in the trial, they are given the standard
intervention but are analysed as if they had received the experimental
intervention.
Clinical research/ trial
Clinical research/trial is the systematic study of a pharmaceutical
product or procedure on human subjects. A clinical trial is a
prospective study comparing the outcome of a certain intervention
against a control in human subjects. It may proceed from cause to
effect or from effect to cause.
Objectives-
➢ To evaluate the safety and efficacy of Unani drugs whose effects
are already claimed by Unani physicians.
➢ To develop new Unani drugs.
➢ To develop economical, easily available Unani drugs.
➢ To develop new indications for Unani drugs, or to change the
dosage form or route of administration.
➢ Objectives may be oriented to disease, drugs, procedures or the
fundamentals of the science.
➢ Disease-oriented objectives are-
❖ To study the aetiology.
❖ To study the pathogenesis.
❖ To study the clinical methods.
❖ To study the principles and methods of treatment.
❖ To study the prognosis of disease.
❖ To study the complications of disease.
➢ Drug-oriented objectives are-
❖ To study the safety and efficacy of Unani drugs.
❖ Clinical studies: therapeutic trials of single and compound
drugs in different diseases.
➢ Clinico-pharmacological studies-
❖ To validate various regimens (e.g. cupping).
❖ To validate the fundamentals of the Unani system of medicine.
❖ To develop parameters for mizaj assessment.
❖ To develop diagnostic tools based on Unani fundamentals.
❖ To validate asbabe sitta zarooria and gair zarooria.
➢ Clinical trials may also be concerned with ilaz bil tadbeer, ilaz bil
gheza, surgical procedures, radiotherapy, or other alternative
approaches.
There are three elements of experimental design-
A) Control
B) Randomisation
C) Blinding
Phases of clinical trials: - there are four phases of clinical trials,
each proceeding from the previous one.
❖ Phase I (Pharmacological phase)
• Always preceded by preclinical data on the safety and efficacy
of the test drug from studies on animal subjects.
• Human subjects (healthy volunteers) are exposed to the test
drug for the first time.
• The purpose of the trial is to find out toxicity, to calculate
the maximum tolerated dose in humans, and to study the
pharmacokinetics of the drug (metabolism and distribution).
❖ Phase II (Exploratory phase)
• It proceeds after successful phase I trials.
• It is conducted on patients.
• The purpose of the trial is to assess safety and efficacy in
patients,
• and to study the pharmacodynamics and pharmacokinetics of the
test drug.
• Informed consent from the subjects is mandatory.
❖ Phase III (Confirmatory phase)
• It proceeds after successful phase II trials.
• The purpose of the trial is to confirm the safety and efficacy
of the test drug. Long-term tolerance, different doses and
regimens, and drug interactions are studied.
• Subjects are patients, hence informed consent from the subjects
is mandatory.
❖ Phase IV (Post-marketing surveillance)
• It is not a clinical trial as such; rather it is feedback
(ADR reports) from the market on patients.
• Delayed and rare effects may be reported from the field.
• Effects on special populations or conditions may be
reported.
• For this phase, voluntary reporting and cohort study
methods are adopted.
Case study methods
Herbert Spencer was the first social philosopher who used case studies
in comparative studies of different cultures. Several case studies were
described by Zakaria Razi in his writings.
The case study is a method of exploring and analysing the life of a social
unit or entity, be it a person, a family, an institution or a community.
The aim of the case study method is to locate or identify the factors that
account for the behaviour patterns of a given unit, and its relationship
with the environment. A case study is conducted for understanding,
exploring and interpreting an issue about which little is known.
Advantage of the case study method- it provides an opportunity for the
intensive analysis of many specific details often overlooked by other
methods.
Disadvantage of the case study method- the case documents hardly fulfil
the criteria of reliability, adequacy and representativeness, and a case
may be extremely typical or atypical.
Case control study: - It is a retrospective study. This is the first
approach to test a causal hypothesis.
a) Both exposure and outcome (disease) have occurred
before the start of the study.
b) The study proceeds from effect to cause (backward direction).
c) It uses a control or comparison group to support or refute
the inference.
There are four basic steps in conducting a case control study-
1. Selection of cases and controls
2. Matching
3. Measurement of exposure
4. Analysis and interpretation
Trend studies- this is the most appropriate method of investigation to
map change over a period.
Cohort study
A cohort is defined as a group of people (units) who share a common
characteristic or experience within a definite time period (age,
occupation, pregnancy etc.).
A cohort study is a type of analytical (observational) study which is
usually undertaken to obtain additional evidence to refute or support
the existence of an association between a suspected cause and a disease.
It is a longitudinal, incidence study.
Action research: - it is carried out to identify areas of concern,
develop and test alternatives, and experiment with new approaches.
There are two traditions of action research-
(1) The British tradition (2) The American tradition
Inductive and deductive approaches to research
(Inductive=Zuj se qul; Deductive= qul se zuj)
The main difference between inductive and deductive approaches to
research is that whilst a deductive approach is aimed at testing theory,
an inductive approach is concerned with the generation of new theory
emerging from the data.
A deductive approach usually begins with a hypothesis, whilst an inductive
approach will usually use research questions to narrow the scope of the study.
For deductive approaches the emphasis is generally on causality, whilst
for inductive approaches the aim is usually focused on exploring new
phenomena or looking at previously researched phenomena from a
different perspective.
Inductive approaches are generally associated with qualitative
research, whilst deductive approaches are more commonly associated
with quantitative research. However, there are no set rules and some
qualitative studies may have a deductive orientation.
One specific inductive approach that is frequently referred to in
research literature is grounded theory, pioneered by Glaser and Strauss.
This approach necessitates the researcher beginning with a completely
open mind without any preconceived ideas of what will be found. The
aim is to generate a new theory based on the data.
Once the data analysis has been completed the researcher must examine existing
theories in order to position their new theory within the discipline.
Grounded theory is not an approach to be used lightly. It requires
extensive and repeated sifting through the data, analysing and re-
analysing multiple times in order to identify new theory. It is an
approach best suited to research projects where the phenomena to
be investigated have not been previously explored.
The most important point to bear in mind when considering whether to
use an inductive or deductive approach is firstly the purpose of your
research; and secondly the methods that are best suited to either test a
hypothesis, explore a new or emerging area within the discipline, or to
answer specific research questions.
Sampling
A part of the population is known as a sample. The method consisting
of selecting for study a portion of the universe (population), with
a view to drawing conclusions about the universe (population), is known
as sampling.
Sampling saves time and cost.
A statistic is a characteristic of a sample and is denoted by a lower-
case Roman letter, e.g. sample size is denoted by n and sample standard
deviation by s. A parameter is a characteristic of a population and is
denoted by a Greek or capital letter, e.g. population size is denoted
by N and population standard deviation by σ.
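The distinction between a statistic (n, s, x̄) and a parameter (N, σ, μ) can be illustrated in a short sketch; the data values here are invented purely for illustration:

```python
import math

population = [4, 8, 6, 5, 3, 7, 9, 5, 6, 7]   # the whole universe (N units)
sample = population[:4]                        # a part of the population

N = len(population)                            # parameter: population size
n = len(sample)                                # statistic: sample size

mu = sum(population) / N                       # parameter: population mean
x_bar = sum(sample) / n                        # statistic: sample mean

# population SD (sigma) divides by N; the sample SD (s) divides by n - 1
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / N)
s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))

print(N, n, mu, x_bar)                         # 10 4 6.0 5.75
```

Note the different divisors: σ is computed over the whole population, while s uses n − 1 so that it estimates σ without bias.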
Advantages of sampling
➢ It limits the number of units for study. (Unit– the object whose
characteristics are studied.)
➢ It makes the study feasible in respect of budget, time and logistics.
Characteristics of a good sample
➢ Representativeness: a sample must be representative of the
population.
➢ Accuracy: an accurate sample is one which exactly represents the
population.
➢ Precision: precision is measured by the standard error.
➢ Size: a good sample must be adequate in size in order to be
reliable.
Types of sampling: - there are two generic types-
A) Random or probability sampling
B) Non-random or non-probability sampling
A) Random or probability sampling: - It is based on the theory of
probability. It provides a known non-zero chance of selection for
each population element. There are four methods of random
sampling-
1. Simple random sampling
2. Systematic random sampling
3. Stratified random sampling
4. Cluster sampling
1. Simple random sampling: - This sampling technique gives each
element an equal and independent chance of being selected.
➢ Sample numbers may be drawn by (a) the lottery method, (b) a table
of random numbers or (c) computer.
➢ This type of sampling is suited to a small, homogeneous
population.
➢ It is one of the easiest methods.
2. Systematic random sampling: - In this sampling, elements are
selected from the population at a uniform interval measured in
time, order, or space. It is simpler than simple random sampling,
but once the random starting point is fixed, the elements lying
between the selected intervals have no chance of selection.
3. Stratified random sampling: - In this method we divide the
population into relatively homogeneous groups, called strata. Then
we use one of two approaches: either we select at random from
each stratum a specified number of elements proportional to that
stratum's share of the population as a whole, or we draw an
equal number of elements from each stratum and weight the
results according to the stratum's proportion of the total
population. (Hence there are two methods of allocation – 1. equal
allocation and 2. proportional allocation.)
4. Cluster sampling: - In this method we divide the population into
groups, or clusters, and then select a random sample of these clusters.
We assume that these individual clusters are representative of the
population as a whole. A well-designed cluster sampling procedure
can produce a more precise sample at considerably less cost than that
of simple random sampling.
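The four probability sampling methods above can be sketched in a few lines; the numbered population, the strata and the cluster boundaries are all invented for illustration:

```python
import random

random.seed(1)
population = list(range(1, 101))          # 100 numbered units

# 1. Simple random sampling: equal, independent chance for every element
simple = random.sample(population, 10)

# 2. Systematic sampling: random start, then every k-th element
k = len(population) // 10                 # sampling interval
start = random.randrange(k)
systematic = population[start::k]

# 3. Stratified sampling (proportional allocation): sample within strata
strata = {"low": population[:40], "high": population[40:]}
stratified = [u for name, s in strata.items()
              for u in random.sample(s, len(s) // 10)]

# 4. Cluster sampling: split into clusters, then pick whole clusters
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [u for c in random.sample(clusters, 2) for u in c]

print(len(simple), len(systematic), len(stratified), len(cluster_sample))
# 10 10 10 20
```

Note that cluster sampling yields every unit inside the chosen clusters, which is why its sample here is larger than the others.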
➢ Need for randomisation- The process of assigning the study
subjects randomly to either the treatment or the control group is
called randomisation.
❖ It is essential to control various known and even unknown
biases at the beginning of the trial and during its course,
and it is very helpful in achieving this objective.
❖ Randomisation removes bias that would otherwise influence
the result.
❖ Randomisation allows valid statistical interpretation of
the raw data.
❖ It eliminates selection bias.
❖ It avoids systematic differences between groups.
❖ It produces comparable groups.
B) Non-random or non-probability sampling: - It is not based on the
theory of probability. This sampling does not provide a known chance
of selection to each population element. This method of sampling is
simple, convenient and low cost. It may be classified into-
1. Convenience or accidental sampling: - It means selecting
sample units in a "hit and miss" fashion. It is the cheapest
and simplest, and does not require any statistical expertise,
but it is highly biased because of the researcher's subjectivity.
2. Purposive or judgemental sampling: - this method means
deliberate selection of sample units that conform to some pre-
determined criteria. The sample may not be truly representative
of its parent population.
3. Quota sampling: - This is a form of convenience sampling
involving selection of quota groups of accessible sampling
units by traits such as sex, age, social class, etc.
4. Snow-ball sampling: - This is a method of building up a list or
a sample of a special population by using an initial set of its
members as informants. It is useful for small populations for
which no sampling frame is readily available.
Sampling distribution
➢ If we take several samples from a population, the statistic we
would compute for each sample need not be the same and most
probably would vary from sample to sample.
➢ A probability distribution of all the possible means of the samples
is a distribution of the sample means. Statisticians call this a
sampling distribution of the mean.
➢ The standard deviation of the distribution of sample means is
called the standard error of the mean.
➢ The standard deviation of the distribution of sample proportions
is called the standard error of the proportion.
➢ The standard deviation of the distribution of any sample statistic
is known as the standard error of that statistic.
➢ The sampling distribution has a mean equal to the population
mean: μx̄ = μ.
➢ The sampling distribution has a standard deviation (a standard
error) equal to the population standard deviation divided by the
square root of the sample size: σx̄ = σ/√n.
Therefore, the standard error of the mean for an infinite
population is given by σx̄ = σ/√n,
where σ is the population standard deviation and n = sample size.
If the sample mean is standardised and is taken from a
normal population, then the standardised sample mean is
given by: -
Z = (x̄ − μ) / (σ/√n)
The central limit theorem: - First, the mean of the sampling
distribution will equal the population mean regardless of the sample
size, even if the population is not normal. Second, as the sample size
increases, the sampling distribution of the mean will approach
normality, regardless of the shape of the population distribution. This
relationship between the shape of the population distribution and the
shape of the sampling distribution of the mean is called the central
limit theorem.
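Both claims of the central limit theorem can be checked empirically with a small simulation; the exponential population and the sample sizes below are arbitrary choices for illustration:

```python
import random
import statistics

random.seed(42)

def sample_mean(n):
    """Mean of one random sample of size n from an exponential
    population with mean 1 and standard deviation 1 (non-normal)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# Build the sampling distribution of the mean for n = 30
means = [sample_mean(30) for _ in range(5000)]

# 1) Its mean is close to the population mean (1.0),
#    whatever the shape of the population
print(statistics.fmean(means))

# 2) Its standard deviation is close to sigma / sqrt(n) = 1 / sqrt(30)
print(statistics.stdev(means))
```

Increasing n makes the histogram of `means` look more and more normal, even though the underlying exponential population is strongly skewed.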
The relationship between sample size and standard error: -
The use of the finite population multiplier in calculating the standard
error: if the population size N is known, and
n/N > 0.05,
then we use the following formula to calculate the standard error of
the mean-
σx̄ = (σ/√n) × √((N − n)/(N − 1))
Here N = size of the population and n = size of the sample;
the term √((N − n)/(N − 1)) in the above equation is the finite
population multiplier.
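As a sketch, the rule above can be wrapped in one small helper function; the numbers in the usage lines are invented:

```python
import math

def standard_error_of_mean(sigma, n, N=None):
    """Standard error of the mean. Applies the finite population
    multiplier sqrt((N - n) / (N - 1)) only when n/N > 0.05."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# sigma = 10, n = 25: infinite population gives 10/5 = 2.0;
# with N = 100 the multiplier shrinks the standard error
print(standard_error_of_mean(10, 25))                  # 2.0
print(round(standard_error_of_mean(10, 25, N=100), 3)) # 1.741
```

When the sample is a small fraction of the population (n/N ≤ 0.05) the multiplier is close to 1 and is conventionally omitted, which is what the function does.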
Hypothesis = Hypo + thesis
(Hypo means under; thesis means a research theory.) A hypothesis is an
assumption about the relation between variables. It is a tentative
explanation of the research problem, or a guess about the research
outcome.
Importance of a hypothesis-
▪ It gives direction to research.
▪ It suggests new experiments and observations.
▪ It enables collecting relevant data and organising them
effectively.
▪ It prevents indiscriminate gathering of data.
Sources of hypothesis –
▪ Existing theories.
▪ Findings of previous studies.
▪ Personal experience.
▪ Analogy.
Criteria for hypothesis construction – a hypothesis is never formulated
in the form of a question. The following criteria should be followed in
hypothesis construction—
➢ It should be empirically testable, whether it is right or wrong.
➢ It should be specific and precise.
➢ The statement of the hypothesis should not be contradictory.
➢ It should specify the variables.
➢ It should describe only one issue.
➢ It should take account of the experience of other researchers.
Need for a hypothesis –
✓ It provides a definite point for the investigation.
✓ It guides the direction of the study.
✓ It specifies the sources of data and determines the data needs.
✓ It determines the most appropriate technique for analysis.
✓ It contributes to the development of theory.
Characteristics of a good hypothesis
1. Conceptual clarity
2. Specificity
3. Empirically testable
4. Availability of techniques
5. Theoretical relevance
6. Consistency
7. Objectivity
8. Simplicity
Types of hypothesis
➢ Null hypothesis (H0) and alternative hypothesis (Ha)
➢ One-tailed or two-tailed hypotheses (directional vs non-directional)
The alternative hypothesis is usually the one we wish to prove, and the
null hypothesis the one we wish to disprove. The null hypothesis
represents the hypothesis we are trying to reject; the alternative
hypothesis represents all other possibilities. The null hypothesis
should always be a specific hypothesis, i.e. it should not state "about"
or "approximately" a certain value.
Concepts in hypothesis testing: -
A) The level of significance- this is a very important concept in the
context of hypothesis testing. It is always some percentage
(usually 5%), which should be chosen with great care, thought and
reason. A 5% level of significance means the researcher is willing
to take as much as a 5% risk of rejecting the null hypothesis when
it happens to be true.
Type I and type II errors –
Type I error- we reject H0 when H0 is true. A type I error means
rejection of a hypothesis which should have been accepted. Its
probability is also called the level of significance of the test and is
denoted by α (alpha).
Type II error – we accept H0 when it is not true, i.e. we accept a
hypothesis which should have been rejected. Its probability is denoted
by β (beta).
Two-tailed or one-tailed tests
A one-tailed test should be used when we are to test whether the
population mean is specifically lower than (or specifically higher
than) some hypothesised value.
A two-tailed test rejects the null hypothesis if the sample mean is
significantly higher or significantly lower than the hypothesised value
of the population mean.
(When we accept a null hypothesis on the basis of sample
information, we are really saying that there is no statistical evidence
to reject it. We are not saying that the null hypothesis is true. The only
way to prove a null hypothesis is to know the population parameter,
and that is not possible with sampling. Thus, we accept the null
hypothesis and behave as if it is true simply because we can find no
evidence to reject it.)
The steps in using a standardised scale in hypothesis testing: -
1. Decide whether it is a one-tailed or a two-tailed test.
2. State the hypothesis.
3. Select a level of significance appropriate for the decision.
4. Decide which distribution (t or z) is appropriate and find the
critical values for the chosen level of significance from the
appropriate table.
5. Calculate the standard error of the sample statistic.
6. Use the standard error to convert the observed value of the sample
to the standardised value.
7. Sketch the distribution and mark the position of the standardised
sample value and the critical values of the test.
8. Compare the value of the standardised sample statistic with the
critical values for this test and interpret the result.
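The eight steps above can be sketched end to end for a two-tailed Z-test; all the sample figures below are invented for illustration:

```python
import math

# Steps 1-3: two-tailed test of H0: mu = 50 at the 5% level
mu0, alpha = 50.0, 0.05

# Step 4: population sigma known and n large, so use the z distribution;
# the critical value for 5% two-tailed is 1.96 (from the z table)
z_critical = 1.96

# Invented sample figures
x_bar, sigma, n = 52.5, 8.0, 64

# Step 5: standard error of the sample mean
se = sigma / math.sqrt(n)        # 8 / 8 = 1.0

# Step 6: convert the observed sample mean to a standardised value
z = (x_bar - mu0) / se           # 2.5

# Steps 7-8: compare with the critical values and interpret
reject_h0 = abs(z) > z_critical
print(z, reject_h0)              # 2.5 True
```

Here the standardised sample value (2.5) lies beyond the critical value (1.96), so H0 is rejected at the 5% level; sketching the curve (step 7) would show 2.5 in the right-hand rejection region.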
STATISTICAL TESTS
Parametric: - t-test, Z-test, F-test (ANOVA)
Non-parametric: - chi-square test, Fisher's exact test, Mann–Whitney U test
Z-test
A Z-test is any statistical test for which the distribution of the test
statistic under the null hypothesis can be approximated by a normal
distribution. Because of the central limit theorem, many test statistics
are approximately normally distributed for large samples. For each
significance level, the Z-test has a single critical value (for example,
1.96 for 5% two tailed) which makes it more convenient than the
Student's t-test which has separate critical values for each sample size.
Therefore, many statistical tests can be conveniently performed as
approximate Z-tests if the sample size is large or the population
variance is known. If the population variance is unknown (and
therefore has to be estimated from the sample itself) and the sample
size is not large (n < 30), the Student's t-test may be more appropriate.
If T is a statistic that is approximately normally distributed under the
null hypothesis, the next step in performing a Z-test is to estimate the
expected value θ of T under the null hypothesis, and then obtain an
estimate s of the standard deviation of T. After that the standard score
Z = (T − θ) / s is calculated, from which one-tailed and two-tailed p-
values can be calculated as Φ(−Z) (for upper-tailed tests), Φ(Z) (for
lower-tailed tests) and 2Φ(−|Z|) (for two-tailed tests) where Φ is the
standard normal cumulative distribution function.
Use in location testing
The term "Z-test" is often used to refer specifically to the one-sample
location test comparing the mean of a set of measurements to a given
constant when the sample variance is known. If the observed data X1,
..., Xn are (i) independent, (ii) have a common mean μ, and (iii) have a
common variance σ², then the sample average X̄ has mean μ and
variance σ²/n.
The null hypothesis is that the mean value of X is a given number μ0.
We can use X̄ as a test statistic, rejecting the null hypothesis if
X̄ − μ0 is large.
To calculate the standardized statistic Z = (X̄ − μ0) / s, we need
either to know or to have an approximate value for σ², from which we
can calculate s² = σ²/n. In some applications σ² is known, but this is
uncommon.
If the sample size is moderate or large, we can substitute the sample
variance for σ², giving a plug-in test. The resulting test will not be an
exact Z-test, since the uncertainty in the sample variance is not
accounted for; however, it will be a good approximation unless the
sample size is small.
A t-test can be used to account for the uncertainty in the sample
variance when the data are exactly normal.
There is no universal constant at which the sample size is generally
considered large enough to justify use of the plug-in test. Typical rules
of thumb: the sample size should be 50 observations or more.
For large sample sizes, the t-test procedure gives almost identical p-
values as the Z-test procedure.
Other location tests that can be performed as Z-tests are the two-sample
location test and the paired difference test.
Conditions
For the Z-test to be applicable, certain conditions must be met.
• Nuisance parameters should be known, or estimated with high
accuracy (an example of a nuisance parameter would be the
standard deviation in a one-sample location test). Z-tests focus on
a single parameter, and treat all other unknown parameters as
being fixed at their true values. In practice, due to Slutsky's
theorem, "plugging in" consistent estimates of nuisance
parameters can be justified. However, if the sample size is not
large enough for these estimates to be reasonably accurate, the Z-
test may not perform well.
• The test statistic should follow a normal distribution. Generally,
one appeals to the central limit theorem to justify assuming that a
test statistic varies normally. There is a great deal of statistical
research on the question of when a test statistic varies
approximately normally. If the variation of the test statistic is
strongly non-normal, a Z-test should not be used.
If estimates of nuisance parameters are plugged in as discussed above,
it is important to use estimates appropriate for the way the data were
sampled. In the special case of Z-tests for the one or two sample
location problem, the usual sample standard deviation is only
appropriate if the data were collected as an independent sample.
In some situations, it is possible to devise a test that properly accounts
for the variation in plug-in estimates of nuisance parameters. In the case
of one and two sample location problems, a t-test does this.
Example
Suppose that in a particular geographic region, the mean and standard
deviation of scores on a reading test are 100 points, and 12 points,
respectively. Our interest is in the scores of 55 students in a particular
school who received a mean score of 96. We can ask whether this mean
score is significantly lower than the regional mean—that is, are the
students in this school comparable to a simple random sample of 55
students from the region as a whole, or are their scores surprisingly
low?
First calculate the standard error of the mean:
SE = σ/√n = 12/√55 ≈ 1.62
where σ is the population standard deviation and n is the sample size.
Next calculate the z-score, which is the distance from the sample mean
to the population mean in units of the standard error:
z = (x̄ − μ)/SE = (96 − 100)/1.62 ≈ −2.47
In this example, we treat the population mean and variance as
known, which would be appropriate if all students in the region
were tested. When population parameters are unknown, a t test
should be conducted instead.
The classroom mean score is 96, which is −2.47 standard error units
from the population mean of 100. Looking up the z-score in a table of
the standard normal distribution, we find that the probability of
observing a standard normal value below −2.47 is approximately 0.5 −
0.4932 = 0.0068. This is the one-sided p-value for the null hypothesis
that the 55 students are comparable to a simple random sample from
the population of all test-takers. The two-sided p-value is
approximately 0.014 (twice the one-sided p-value).
Another way of stating things is that with probability
1 − 0.014 = 0.986, a simple random sample of 55 students would have
a mean test score within 4 units of the population mean. We could also
say that with 98.6% confidence we reject the null hypothesis that the
55 test takers are comparable to a simple random sample from the
population of test-takers.
The Z-test tells us that the 55 students of interest have an unusually low
mean test score compared to most simple random samples of similar
size from the population of test-takers. A deficiency of this analysis is
that it does not consider whether the effect size of 4 points is
meaningful. If instead of a classroom, we considered a subregion
containing 900 students whose mean score was 99, nearly the same z-
score and p-value would be observed. This shows that if the sample
size is large enough, very small differences from the null value can be
highly statistically significant. See statistical hypothesis testing for
further discussion of this issue.
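The worked example above can be reproduced in a few lines, using the complementary error function for the standard normal CDF (the small differences from the text come from table rounding):

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

mu, sigma = 100.0, 12.0      # regional mean and standard deviation
x_bar, n = 96.0, 55          # observed school mean and sample size

se = sigma / math.sqrt(n)            # standard error of the mean
z = (x_bar - mu) / se                # about -2.47
p_one_sided = phi(z)                 # about 0.007
p_two_sided = 2 * phi(-abs(z))       # about 0.013

print(round(z, 2), round(p_one_sided, 3), round(p_two_sided, 3))
# -2.47 0.007 0.013
```

Replacing `x_bar, n = 96.0, 55` with, say, a mean of 99 over 900 students reproduces the subregion comparison discussed above: a nearly identical z and p despite a much smaller effect.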
Z-tests other than location tests
Location tests are the most familiar Z-tests. Another class of Z-tests
arises in maximum likelihood estimation of the parameters in a
parametric statistical model. Maximum likelihood estimates are
approximately normal under certain conditions, and their asymptotic
variance can be calculated in terms of the Fisher information. The
maximum likelihood estimate divided by its standard error can be used
as a test statistic for the null hypothesis that the population value of the
parameter equals zero. More generally, if θ̂ is the maximum
likelihood estimate of a parameter θ, and θ0 is the value of θ under the
null hypothesis, then
Z = (θ̂ − θ0) / SE(θ̂)
can be used as a Z-test statistic.
When using a Z-test for maximum likelihood estimates, it is important
to be aware that the normal approximation may be poor if the sample
size is not sufficiently large. Although there is no simple, universal rule
stating how large the sample size must be to use a Z-test, simulation
can give a good idea as to whether a Z-test is appropriate in a given
situation.
Z-tests are employed whenever it can be argued that a test statistic
follows a normal distribution under the null hypothesis of interest.
Many non-parametric test statistics, such as U statistics, are
approximately normal for large enough sample sizes, and hence are
often performed as Z-tests.
t-test
t-test: - in 1908, theoretical work on the t distribution was done by
W. S. Gosset. Gosset was an employee of the Guinness Brewery in Dublin,
Ireland, which did not permit employees to publish research findings
under their own names, so Gosset adopted the pen name "Student" and
published under that name. The t distribution is therefore commonly
called Student's t distribution, or simply Student's distribution.
Conditions for use of the t-test
➢ Sample size ≤ 30.
➢ Population standard deviation must be unknown.
➢ Population distribution should be normal or approximately normal.
➢ Random sample.
➢ Quantitative data.
Degrees of freedom- there is a different t distribution for each
possible number of degrees of freedom. When we select a t distribution
to estimate a population mean, we use n − 1 degrees of freedom, where
n is the sample size.
For example, if we use a sample of 22 to estimate a population mean,
we use (n − 1) = 21 degrees of freedom in order to select the
appropriate t distribution.
t-test: calculation of the standard error of the difference between
means (small samples, uncorrelated data)
1st step- calculate the combined (pooled) variance SD²:
SD² = [Σ(X1 − X̄1)² of gr. 1 + Σ(X2 − X̄2)² of gr. 2] / (N1 + N2 − 2)
Writing the deviations as x1 = X1 − X̄1 and x2 = X2 − X̄2, so that
(X1 − X̄1)² = x1² and (X2 − X̄2)² = x2², this becomes
SD² = (Σx1² + Σx2²) / (N1 + N2 − 2)
SD = √[(Σx1² + Σx2²) / (N1 + N2 − 2)]
2nd step- calculate the standard error of the difference:
SED = √(SD1²/N1 + SD2²/N2), or, using the pooled SD,
SED = SD × √(1/N1 + 1/N2)
3rd step- calculate t:
t = observed difference / SED = (X̄1 − X̄2) / SED
4th step- degrees of freedom: d.f. = N1 + N2 − 2
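The four steps can be carried out directly; the two groups below are invented toy data:

```python
import math
import statistics

group1 = [12.0, 14.0, 13.0, 15.0, 16.0]   # invented toy data
group2 = [10.0, 11.0, 9.0, 12.0, 10.0]

n1, n2 = len(group1), len(group2)
m1, m2 = statistics.fmean(group1), statistics.fmean(group2)

# 1st step: pooled SD from the sums of squared deviations
ss1 = sum((x - m1) ** 2 for x in group1)
ss2 = sum((x - m2) ** 2 for x in group2)
sd_pooled = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))

# 2nd step: standard error of the difference between means
sed = sd_pooled * math.sqrt(1 / n1 + 1 / n2)

# 3rd step: t = observed difference / SED
t = (m1 - m2) / sed

# 4th step: degrees of freedom
df = n1 + n2 - 2

print(round(t, 2), df)                    # 4.13 8
```

The computed t is then compared with the critical value for 8 degrees of freedom in the t table at the chosen level of significance.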
Using the t-distribution table
Comparison between the t and z tables
The table of t distribution values differs in construction from the z
tables. The t table is more compact, showing areas and t values for
only a few percentages (10, 5, 2, and 1 percent). Because there is a
different t distribution for each number of degrees of freedom, a more
complete table would be quite lengthy; although we can conceive of
the need for one, in practice these few values suffice.
A second difference in the t table is that it does not focus on the
chance that the population parameter being estimated will fall within
our confidence interval. Instead, it measures the chance that the
population parameter we are estimating will not be within our
confidence interval (that is, that it will lie outside it).
If we are making an estimate at the 90% confidence level, we would
look in the t table under the 0.10 column
(100 percent − 90 𝑝𝑒𝑟𝑐𝑒𝑛𝑡 = 10 𝑝𝑒𝑟𝑐𝑒𝑛𝑡)
This 0.10 chance of error is symbolised by the Greek letter α (alpha).
We would find the appropriate t values for confidence intervals of 95%,
98%, and 99% under the columns headed 0.05, 0.02, and 0.01
respectively.
A third difference in using the t table is that we must specify the
degrees of freedom with which we are dealing. Suppose we make an
estimate at the 90% confidence level with a sample size of 14, which
gives 13 degrees of freedom. Look under the 0.10 column until you
encounter the row labelled 13. Like a z value, the t value there of 1.771
shows that if we mark off plus and minus 1.771
σ̂x̄ (the estimated standard error of x̄) on either side of the mean,
the area under the curve between these two limits will be 90%, and
the area outside these limits (the chance of error) will be 10 percent.
Remember that in any estimation in which the sample size is 30 or
less, the standard deviation of the population is unknown, and the
underlying population can be assumed to be normal or approximately
normal, we use the t distribution.
Determining the sample size (n) in estimation: - in all the above
examples the sample size was known; now we try to estimate the sample
size n. If it is too small, we may fail to achieve the objective; if it
is too large, we waste resources. Let us examine some methods that are
useful in determining what sample size is necessary for a specified
level of precision.
Comparison of two ways of expressing the same confidence limits:
   Lower confidence limit   Upper confidence limit
A. x̄ − 500                 x̄ + 500
B. x̄ − z σx̄               x̄ + z σx̄
C. x̄ − t σx̄               x̄ + t σx̄
Example: - The Department of TST, NIUM Bangalore, wants to conduct a
survey of the annual earnings of its M.D. graduates. Calculate the
appropriate sample size for this study in order to estimate the mean
annual earnings of last year's class within 500, at a 95% level of
confidence.
Solution: - the problem states a variation of 500 on either side
of the population mean, i.e. z σx̄ = 500.
At the 95% level of confidence we know from the z table that z = 1.96.
Therefore 1.96 σx̄ = 500, which means σx̄ = 500/1.96 ≈ 255.
Now, since the standard error of the mean is σx̄ = σ/√n = 255, and
σ = 1500, we can find n:
1500/√n = 255; therefore n = (1500/255)² ≈ 34.6
So n should be at least 34.6, i.e. 35, if NIUM wants to achieve the
stated precision in its survey.
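This calculation generalises to a small helper function; the worked figures above (σ = 1500, half-width 500, 95% confidence) are used as the input:

```python
import math

def sample_size(sigma, error, z=1.96):
    """Minimum n such that z * sigma / sqrt(n) <= error, where
    `error` is the allowed half-width of the confidence interval."""
    return math.ceil((z * sigma / error) ** 2)

# sigma = 1500, desired half-width = 500, 95% confidence (z = 1.96)
print(sample_size(1500, 500))        # 35
```

Note that n grows with the square of the precision demanded: halving the allowed error to 250 would quadruple the required sample size.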
Chi-square (χ²) Test
The chi-square (χ²) test enables us to test whether more than two
populations can be considered equal (the t-test and z-test are
applicable to only one or two samples).
The chi-square test also allows us to do much more than just test for
the equality of several populations. Suppose we classify a population
into several categories with respect to two attributes (such as age and
job performance); we can then use a chi-square test to determine
whether the two attributes are independent of each other.
Characteristics of the chi-square (χ²) test: -
➢ The chi-square test is based on frequencies and not on
parameters.
➢ It is a non-parametric test: no rigid assumptions about the
parent population(s) are required.
➢ The chi-square statistic has an additive property.
➢ The chi-square test is useful for testing hypotheses about the
independence of attributes.
➢ The chi-square test can be used with complex contingency tables.
➢ The chi-square test has very wide use.
Degrees of freedom: - The number of degrees of freedom for n
observations is n − k, usually denoted by ν, where k is the number
of independent linear constraints imposed upon them.
Suppose someone tells me to write any four numbers; then all the
numbers are of my choice. But if a restriction is imposed on the choice,
say that the sum of these numbers should be 50, then the freedom of
choice is reduced to three only, and so the number of degrees of
freedom is now 3. If a chi-square (χ²) statistic is defined as the sum
of the squares of n independent standardised normal variates, and the
condition of satisfying k linear relations is imposed upon them
(such as the estimation of some population parameter, etc.), then
the effect of these n constraints of