Unit-4
Tests of Significance
Once sample data has been gathered through an observational study or experiment, statistical
inference allows analysts to assess evidence in favor of some claim about the population from
which the sample has been drawn. The methods of inference used to support or reject claims
based on sample data are known as tests of significance.
Every test of significance begins with a null hypothesis H0. H0 represents a theory that has been
put forward, either because it is believed to be true or because it is to be used as a basis for
argument, but has not been proved.
For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is
not better, on average, than the current drug. We would write H0: there is no difference
between the two drugs on average.
The alternative hypothesis, Ha, is a statement of what a statistical hypothesis test is set up to
establish.
For example, in a clinical trial of a new drug, the alternative hypothesis might be that the new
drug has a different effect, on average, compared to that of the current drug. We would write
Ha: the two drugs have different effects, on average. The alternative hypothesis might also be
that the new drug is better, on average, than the current drug. In this case we would write Ha:
the new drug is better than the current drug, on average.
The final conclusion once the test has been carried out is always given in terms of the null
hypothesis. We either "reject H0 in favor of Ha" or "do not reject H0"; we never conclude "reject
Ha", or even "accept Ha".
If we conclude "do not reject H0", this does not necessarily mean that the null hypothesis is
true; it only suggests that there is not sufficient evidence against H0 in favor of Ha. Rejecting the
null hypothesis, then, suggests that the alternative hypothesis may be true.
(Definitions taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
Hypotheses are always stated in terms of a population parameter, such as the mean μ. An
alternative hypothesis may be one-sided or two-sided. A one-sided hypothesis claims that a
parameter is either larger or smaller than the value given by the null hypothesis. A two-sided
hypothesis claims that a parameter is simply not equal to the value given by the null hypothesis
-- the direction does not matter.
Hypotheses for a one-sided test for a population mean take the following form:
H0: 𝜇 = k Ha: 𝜇 > k or H0: 𝜇 = k Ha: 𝜇 < k.
Hypotheses for a two-sided test for a population mean take the following form:
H0: 𝜇 = k
Ha: μ ≠ k.
A confidence interval gives an estimated range of values which is likely to include an unknown
population parameter, the estimated range being calculated from a given set of sample data.
(Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
Example
Suppose a test has been given to all high school students in a certain state. The mean test score
for the entire state is 70, with standard deviation equal to 10. Members of the school board
suspect that female students have a higher mean score on the test than male students, because
the mean score 𝑥̅ from a random sample of 64 female students is equal to 73. Does this provide
strong evidence that the overall mean for female students is higher?
The null hypothesis H0 claims that there is no difference between the mean score for female
students and the mean for the entire population, so that 𝜇 = 70. The alternative hypothesis
claims that the mean for female students is higher than the entire student population's mean,
so that 𝜇 > 70.
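The example stops at the hypotheses, so here is a minimal sketch of how the test statistic and a one-sided p-value could be computed for it. It assumes the standard deviation of female students' scores equals the state-wide value of 10 (the text does not say so explicitly); the function name `one_sided_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, p_value) for H0: mu = pop_mean against Ha: mu > pop_mean."""
    # Standard error of the mean when the population SD is treated as known.
    standard_error = pop_sd / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Upper-tail area of the standard normal distribution beyond z.
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


# Values from the example; pop_sd = 10 for the female subgroup is an assumption.
z, p = one_sided_z_test(sample_mean=73, pop_mean=70, pop_sd=10, n=64)
print(f"z = {z:.2f}, one-sided p-value = {p:.4f}")
```

Under these assumptions z works out to 2.4 and the p-value is a little under 0.01, which would count as strong evidence against H0 at the usual 5% and 1% levels.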
Types of errors:-
When a statistical hypothesis is tested, four outcomes are possible:
1. The hypothesis is true but our test rejects it. (Type-I error)
2. The hypothesis is false but our test accepts it. (Type-II error)
3. The hypothesis is true and our test accepts it. (Correct decision)
4. The hypothesis is false and our test rejects it. (Correct decision)
The first two possibilities are errors, so there are two types of error in hypothesis testing.
In a statistical hypothesis test, a Type-I error is committed by rejecting the null
hypothesis when it is true. The probability of committing a Type-I error is denoted by α
(pronounced alpha), where
α = Prob. (Type-I error)
= Prob. (Rejecting H0 | H0 is true)
On the other hand, a Type-II error is committed by not rejecting (i.e. accepting) the null
hypothesis when it is false. The probability of committing a Type-II error is denoted by β
(pronounced beta), where
β = Prob. (Type-II error)
= Prob. (Not rejecting H0 | H0 is false)
The distinction between these two types of error can be illustrated by an example.
Assume that the difference between the two population means is actually zero. If our test of
significance, when applied to the sample means, is significant, we make a Type-I error.
On the other hand, suppose there is a true difference between the two population means. If
our test of significance leads to the judgment “not significant”, we commit a Type-II error. We
thus find ourselves in the situation described by the following table:
Decision            H0 is true          H0 is false
Reject H0           Type-I error        Correct decision
Do not reject H0    Correct decision    Type-II error
Hypothesis test
Often we cannot survey or test every person or object in a population; therefore, we have to
take a sample. From the analysis of the sample data, we can draw conclusions about
the population. Some questions that one may want to answer are
1. Are unmarried workers more likely to be absent from work than married workers?
2. In Fall 1996, did students in Math 163-01 score the same on the exam as students in
Math 163-02?
3. Is there any difference between the strengths of steel wire produced by the XY
Company and Bob’s Wire Company?
4. A hospital spokesperson claims that the average daily room charge for a specific
procedure is $622. Can we reject this claim?
Hypothesis testing is a procedure, based on sample evidence and probability theory, used to
determine whether the hypothesis is a reasonable statement and should not be rejected, or is
unreasonable and should be rejected.
Hypothesis test:- A statistical hypothesis test is a method of statistical inference used for
testing a statistical hypothesis. A test result is called statistically significant if it has been
predicted as unlikely to have occurred by chance alone, according to a threshold probability—
the significance level.
Steps in the hypothesis testing procedure
1. State the null hypothesis and the alternate hypothesis.
Null Hypothesis – statement about the value of a population parameter.
Alternate Hypothesis – statement that is accepted if the sample evidence leads us to reject the null
hypothesis.
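2. Select the appropriate test statistic and level of significance. When testing a hypothesis about a
proportion, we use the z-statistic or z-test and the formula
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$
where p̂ is the sample proportion, p is the hypothesized population proportion, q = 1 − p, and n is the sample size.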
When testing a hypothesis of a mean, we use the z-statistic or we use the t-statistic according
to the following conditions.
If the population standard deviation, σ, is known and either the data is normally distributed or
the sample size n > 30, we use the normal distribution (z-statistic).
When the population standard deviation, σ, is unknown and either the data is normally
distributed or the sample size is greater than 30 (n > 30), we use the t-distribution (t-statistic).
A traditional guideline for choosing the level of significance is as follows: (a) the 0.10 level for
political polling, (b) the 0.05 level for consumer research projects, and (c) the 0.01 level for
quality assurance work.
3. State the decision rules. The decision rules state the conditions under which the null
hypothesis will be accepted or rejected. The critical value for the test-statistic is determined by
the level of significance. The critical value is the value that divides the non-reject region from
the reject region.
4. Compute the appropriate test statistic and make the decision. When we use the z-statistic,
we use the formula
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
When we use the t-statistic, we use the formula
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Compare the computed test statistic with the critical value. If the computed value is within the
rejection region(s), we reject the null hypothesis; otherwise, we do not reject the null
hypothesis.
5. Interpret the decision. Based on the decision in Step 4, we state a conclusion in the context
of the original problem.
 The average test score for an entire school is 75 with a standard deviation of 10. What is
the probability that the mean score of a random sample of 5 students is above 80?
Conditions for using t-test:
1. σ is unknown
2. n < 30
Here μ = 75, σ = 10, n = 5, x̄ = 80
The first condition is not satisfied, so in this problem we will use the Z-test.
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{80 - 75}{10/\sqrt{5}} = \frac{5}{10/2.236} = \frac{5}{4.472} = 1.118$$
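The question asks for a probability, while the working stops at z. As a rough sketch, the tail area beyond z can be read from the standard normal distribution; this assumes the test scores are normally distributed, so that the mean of only 5 scores is itself normal.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the exercise above.
mu, sigma, n, x_bar = 75, 10, 5, 80

# z-score of the sample mean.
z = (x_bar - mu) / (sigma / sqrt(n))

# Probability that the sample mean exceeds x_bar (upper-tail area).
prob_above = 1.0 - NormalDist().cdf(z)

print(f"z = {z:.3f}, P(sample mean > {x_bar}) = {prob_above:.3f}")
```

Under that normality assumption the probability comes out at roughly 0.13.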
 The average test score for an entire school is 75. The standard deviation of a random
sample of 40 students is 10. What is the probability that the mean score of the sample is above
80?
Conditions for using t-test:
1. σ is unknown
2. n < 30
Here μ = 75, S = 10, n = 40, x̄ = 80
The second condition is not satisfied, so in this problem we will use the Z-test.
$$z = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$
 The average test score for an entire school is 75. The standard deviation of a random
sample of 9 students is 10. What is the probability the average test score for the sample
is above 80 ?
Conditions for using t-test:
1. 𝜎 is unknown
2. 𝑛 < 30
Here 𝜇 = 75, 𝑆 = 10, 𝑛 = 9, 𝑥̅ = 80
Here both conditions for the t-test are satisfied, so we will use the t-test.
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$
Example:-
The average score of all sixth graders in school District A on a math aptitude exam is 75 with a
standard deviation of 8.1. A random sample of 100 students in one school was taken. The
mean score of these 100 students was 71. Does this indicate that the students of this school are
significantly less skilled in their mathematical abilities than the average student in the district?
(Use a 5% level of significance.)
Solution:-
Here
Mean = 𝜇 = 75 , Standard deviation= 𝜎 = 8.1 , 𝑛 = 100, 𝑥̅ = 71
Conditions for using t-test:
1. 𝜎 is unknown
2. 𝑛 < 30
Since σ is known and 𝑛 > 30, we use the z-test that is based on the normal curve or normal
distribution.
Step 1:-
State the null hypothesis (contains =, ≥, or ≤) and alternate hypothesis (usually contains “not”).
Think of the statement “Does this indicate that the students of this school are significantly less
skilled in their mathematical abilities than the average student in the district?” From
“...students of this school are significantly less skilled...,” we write the alternate hypothesis
as H1: μ < 75. The hypotheses are therefore
H0: μ ≥ 75, H1: μ < 75
Step 2:- Select a level of significance. It is stated in the problem as 5%, or α = 0.05.
Step 3:- Identify the statistical test to use. Use the z-test because σ is known and the sample
(n = 100) is a large sample (n > 30).
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
Recall that in the normal curve, Z=0 corresponds to the mean. Z=1, 2, 3 represent 1, 2, and 3
standard deviations above the mean; the negatives are below the mean.
Step 4:- Formulate a decision rule.
Since the alternate hypothesis states μ < 75, this is a one-tailed test to the left. For α = 0.05, we
find the value of Z in the normal curve table that leaves a probability of 0.05 to its left. The table
gives the area between the mean and Z, so we look up the table value 0.5 − 0.05 = 0.45. Because
0.4500 is exactly halfway between the table entries 0.4495 and 0.4505, we take the value halfway
between 1.64 and 1.65, namely 1.645. Since the rejection region lies to the left of the mean, the
critical value is z = −1.645; that is, P(Z < −1.645) = 0.05.
Thus, we reject the null hypothesis if z < −1.645 and accept the alternate hypothesis that the
students in the school sampled are less skilled in math aptitude than those in District A.
Step 5:- Take a sample; arrive at a decision.
The sample of 100 students has been tested and their mean score was found to be 71. Using
the z-test identified in Step 3, we compute the test statistic:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{71 - 75}{8.1/\sqrt{100}} = -4.938$$
Since the computed z = −4.938 < −1.645 (the critical z value), we reject the null hypothesis
that the students in the school are not less skilled in mathematical ability. Thus, we conclude
that the sixth graders in the school are less skilled in mathematical ability than the sixth graders
in District A.
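The five steps of this example can be sketched in a few lines. This is only an illustration: the function name `left_tailed_z_test` is hypothetical, and the critical value is obtained from the inverse normal CDF rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def left_tailed_z_test(sample_mean, mu0, sigma, n, alpha):
    """Return (z, z_critical, reject) for H0: mu >= mu0 against H1: mu < mu0."""
    # Test statistic from Step 3.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Critical value: the point with area alpha to its left (Step 4).
    z_critical = NormalDist().inv_cdf(alpha)
    # Decision rule: reject H0 when z falls in the left tail (Step 5).
    return z, z_critical, z < z_critical


z, z_crit, reject = left_tailed_z_test(
    sample_mean=71, mu0=75, sigma=8.1, n=100, alpha=0.05
)
print(f"z = {z:.3f}, critical z = {z_crit:.3f}, reject H0: {reject}")
```

The computed statistic and critical value agree with the hand calculation (about −4.938 and −1.645), so H0 is rejected.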
The following problem is presented for students to work:
A sample of 250 married workers showed 22 missed more than 5 days last year for any reason.
A sample of 300 unmarried workers showed 35 missed more than 5 days. Use the 5% level of
significance to test and answer the question: Are unmarried workers more likely to be absent
from work than married workers?
Test of significance for Large samples:-
If the size of the sample exceeds 30, we use the tests of significance for large samples.
The assumptions made while dealing with problems relating to large samples are:
a) The random sampling distribution of a statistic is approximately normal, and
b) Values given by the samples are sufficiently close to the population value and
can be used in its place for calculating the standard error of the estimate.
Standard error of Mean:-
a) When the standard deviation of the population is known
$$\mathrm{S.E.}\,\bar{X} = \frac{\sigma_p}{\sqrt{n}}$$
where S.E. X̄ refers to the standard error of the mean,
σp = standard deviation of the population,
n = number of observations in the sample.
b) When the standard deviation of the population is not known, we have to use the standard
deviation of the sample in calculating the standard error of the mean.
The formula for calculating the standard error is
$$\mathrm{S.E.}\,\bar{X} = \frac{\sigma_{\text{sample}}}{\sqrt{n}}$$
where σ(sample) denotes the standard deviation of the sample.
Note: If the standard deviations of both the sample and the population are available, then
the standard deviation of the population is preferred in calculating the standard error of the mean.
Example:- Calculate the standard error of mean from the following data showing the amount
paid by 100 firms in Calcutta on the occasion of Durga Puja:
Mid value (Rs.) 39 49 59 69 79 89 99
No. of firms 2 3 11 20 32 25 7
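Solution:-
$$\mathrm{S.E.}\,\bar{X} = \frac{\sigma}{\sqrt{n}}$$
CALCULATION OF STANDARD DEVIATION
Mid value m    f      d = (m − 69)/10    fd      fd²
39             2      −3                 −6      18
49             3      −2                 −6      12
59             11     −1                 −11     11
69             20     0                  0       0
79             32     +1                 +32     32
89             25     +2                 +50     100
99             7      +3                 +21     63
Total          N = 100                   ∑fd = 80    ∑fd² = 236
With class interval i = 10:
$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i = \sqrt{\frac{236}{100} - \left(\frac{80}{100}\right)^2} \times 10 = \sqrt{2.36 - 0.64} \times 10 = 1.311 \times 10 = 13.11$$
$$\mathrm{S.E.}\,\bar{X} = \frac{\sigma}{\sqrt{n}} = \frac{13.11}{\sqrt{100}} = \frac{13.11}{10} = 1.311$$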
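The step-deviation calculation above can be checked with a short script. This is a sketch that mirrors the hand method (assumed mean 69, class interval 10); the variable names are illustrative only.

```python
from math import sqrt

# Grouped data from the example: class mid values and number of firms.
mid_values = [39, 49, 59, 69, 79, 89, 99]
frequencies = [2, 3, 11, 20, 32, 25, 7]
assumed_mean, class_width = 69, 10

n = sum(frequencies)
# Step deviations d = (m - assumed mean) / class width.
d = [(m - assumed_mean) / class_width for m in mid_values]
sum_fd = sum(f * di for f, di in zip(frequencies, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(frequencies, d))

# Standard deviation by the step-deviation formula, then the standard error.
sigma = sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * class_width
standard_error = sigma / sqrt(n)

print(f"N = {n}, sum fd = {sum_fd:.0f}, sum fd^2 = {sum_fd2:.0f}")
print(f"sigma = {sigma:.2f}, S.E. = {standard_error:.3f}")
```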
Two-tailed test for the Difference between the Means of two Samples:-
i. If two independent random samples of sizes n1 and n2 (both greater than 30) are drawn
from the same population with standard deviation σ, the standard error of the
difference between the sample means is given by the formula:
$$\text{S.E. of the difference between sample means} = \sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
If σ is unknown, the standard deviation of the combined sample must be substituted.
ii. If two random samples with X̄1, σ1, n1 and X̄2, σ2, n2 respectively are drawn from
different populations, then the S.E. of the difference between the means is given by
the formula:
$$\text{S.E. of the difference between the means} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
and, where σ1 and σ2 are unknown,
$$\text{S.E. of the difference between the means} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
where S1 and S2 represent the standard deviations of the two samples.
The null hypothesis to be tested is that there is no significant difference in the means of
the two samples, i.e.,
H0: μ1 = μ2 ← Null hypothesis, there is no difference
Ha: μ1 ≠ μ2 ← Alternative hypothesis, a difference exists.
Example-1:-
Intelligence test on two groups of boys and girls gave the following results:
Mean S.D N
Girls 75 15 150
Boys 70 20 250
Is there a significant difference in the mean scores obtained by boys and girls ?
Solution:-
Let us take the hypothesis that there is no significant difference in the mean scores obtained by
boys and girls.
$$\mathrm{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
σ1 = 15, σ2 = 20, n1 = 150, n2 = 250
Substituting these values
$$\mathrm{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{(15)^2}{150} + \frac{(20)^2}{250}} = \sqrt{1.5 + 1.6} = 1.761$$
$$\frac{\text{Difference}}{\text{S.E.}} = \frac{75 - 70}{1.761} = 2.84$$
Since the difference is more than 2.58 S.E. (1% level of significance), the hypothesis is rejected.
There seems to be a significant difference in the mean scores obtained by boys and girls.
Example-2:-
A man buys 50 electric bulbs of ‘Philips’ and 50 electric bulbs of ‘HMT’. He finds that ‘Philips’
bulbs give an average life of 1500 hours with a standard deviation of 60 hours and ‘HMT’ bulbs
give an average life of 1512 hours with a standard deviation of 80 hours. Is there a significant
difference in the mean life of the two makes of bulbs?
Solution:-
Let us take the hypothesis that there is no significant difference in the mean life of the two
makes of bulbs. Calculating the standard error of the difference of means
$$\mathrm{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
σ1 = 60, σ2 = 80, n1 = 50, n2 = 50
Substituting these values
$$\mathrm{S.E.}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{(60)^2}{50} + \frac{(80)^2}{50}} = \sqrt{\frac{3600 + 6400}{50}} = \sqrt{200} = 14.14$$
Observed difference between the means = 1512 − 1500 = 12
$$\frac{\text{Difference}}{\text{S.E.}} = \frac{12}{14.14} = 0.849$$
Since the difference is less than 2.58 S.E. (1% level of significance), it could have arisen due to
fluctuation of sampling. Hence the difference in the mean life of the two makes is not significant.
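Both large-sample examples follow the same recipe, which can be sketched as a small helper. The function name `difference_ratio` is hypothetical; the 2.58 threshold is the two-tailed 1% value used in the text.

```python
from math import sqrt


def difference_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """Return (standard error, |difference| / standard error) for two sample means."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return se, abs(mean1 - mean2) / se


CRITICAL_1_PERCENT = 2.58  # two-tailed normal value at the 1% level

examples = [
    ("girls vs boys", (75, 15, 150, 70, 20, 250)),
    ("Philips vs HMT", (1500, 60, 50, 1512, 80, 50)),
]
for label, args in examples:
    se, ratio = difference_ratio(*args)
    verdict = "significant" if ratio > CRITICAL_1_PERCENT else "not significant"
    print(f"{label}: S.E. = {se:.3f}, ratio = {ratio:.3f} -> {verdict}")
```

The ratios agree with the hand calculations (about 2.84 and 0.849), giving a significant difference in the first example and none in the second.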
Test of significance for small samples:-
When the sample size is small (less than 30), the tests for large samples do not work well. So
special tests are used for small samples, such as the t-test and the F-test.
Student t-distribution
Theoretical work on the t-distribution was done by W.S. Gosset (1876-1937) in the early 1900s. Gosset
was employed by Guinness & Son, a Dublin brewery in Ireland, which did not permit employees
to publish research findings under their own names. So Gosset adopted the pen name
“Student” and published his findings under this name. Therefore, the t-distribution is commonly
called Student's t-distribution.
The t-distribution is used when the sample size is 30 or less and the population standard
deviation is unknown. The t-statistic is defined as:
$$t = \frac{\bar{x} - \mu}{S} \times \sqrt{n}, \quad \text{where } S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Test the significance of the mean of a Random Sample:-
In determining whether the mean of a sample drawn from a normal population deviates
significantly from a stated value (the hypothetical value of the population mean) when the
variance of the population is unknown, we calculate the statistic:
$$t = \frac{\bar{x} - \mu}{S} \times \sqrt{n}$$
x̄ = the mean of the sample
μ = the actual or hypothetical mean of the population
n = the sample size
S = the standard deviation of the sample
$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \quad \text{or} \quad S = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n - 1}} = \sqrt{\frac{1}{n - 1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]}$$
where d = deviation from the assumed mean.
If the calculated value of |t| exceeds t0.05, we say that the difference between x̄ and μ is
significant at the 5% level; if it exceeds t0.01, the difference is said to be significant at the 1% level. If
|t| < t0.05, we conclude that the difference between x̄ and μ is not significant and hence the
sample might have been drawn from a population with mean = μ.
Fiducial limits of population Mean:-
Assuming that the sample is a random sample from a normal population of unknown mean, the
95% fiducial limits of the population mean (μ) are:
$$\bar{x} \pm \frac{S}{\sqrt{n}}\, t_{0.05}$$
and the 99% limits are
$$\bar{x} \pm \frac{S}{\sqrt{n}}\, t_{0.01}$$
Example:- The manufacturer of a certain make of electric bulbs claims that his bulbs have a
mean life of 25 months with a standard deviation of 5 months. A random sample of 6 such
bulbs gave the following values. Life in months: 24, 26, 30, 20, 20, 18.
Can you regard the producer's claim to be valid at the 1% level of significance? (Given that the
table values of the appropriate test statistic at the said level are 4.032, 3.707 and 3.499 for 5, 6
and 7 degrees of freedom respectively.)
Solution:- Let us take the hypothesis that there is no significant difference between the mean life of
bulbs in the sample and that of the population. Applying the t-test
$$t = \frac{\bar{x} - \mu}{S} \times \sqrt{n}$$
CALCULATION OF x̄ and S
x       (x − x̄)    (x − x̄)²
24      +1         1
26      +3         9
30      +7         49
20      −3         9
20      −3         9
18      −5         25
∑x = 138           ∑(x − x̄)² = 102
$$\bar{x} = \frac{\sum x}{n} = \frac{138}{6} = 23$$
$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{102}{5}} = \sqrt{20.4} = 4.517$$
$$t = \frac{|\bar{x} - \mu|}{S} \times \sqrt{n} = \frac{|23 - 25|}{4.517} \times \sqrt{6} = \frac{2 \times 2.449}{4.517} = 1.084$$
v = n − 1 = 6 − 1 = 5. For v = 5, t0.01 = 4.032.
The calculated value of t is less than the tabulated value, so the hypothesis is accepted. Hence
the producer's claim is valid at the 1% level of significance.
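A quick way to check the arithmetic of this example is to compute the statistic directly from the six observations. The sketch below uses the sample standard deviation with n − 1 in the denominator, as in the text; the critical value is the table value quoted in the problem, since the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return t = (sample mean - mu0) / S * sqrt(n), with S using n - 1."""
    n = len(sample)
    return (mean(sample) - mu0) / stdev(sample) * sqrt(n)


lives = [24, 26, 30, 20, 20, 18]   # bulb lives in months, from the example
T_CRIT_1_PERCENT_5_DF = 4.032      # table value quoted in the example

t = one_sample_t(lives, mu0=25)
reject = abs(t) > T_CRIT_1_PERCENT_5_DF
print(f"|t| = {abs(t):.3f}, reject H0: {reject}")
```

The result is about 1.08, matching the hand calculation up to rounding, so the hypothesis is not rejected.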
Example:- A random sample of size 16 has 53 as its mean. The sum of the squares of the deviations
taken from the mean is 135. Can this sample be regarded as taken from a population having
56 as its mean? Obtain the 95% and 99% confidence limits of the mean of the population. (For v = 15,
t0.05 = 2.13; for v = 15, t0.01 = 2.95.)
Solution:-
Let us take the hypothesis that there is no significant difference between the sample mean and the
hypothetical population mean. Applying the t-test
$$t = \frac{\bar{x} - \mu}{S} \times \sqrt{n}$$
x̄ = 53, μ = 56, n = 16, ∑(x − x̄)² = 135
$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{135}{15}} = 3$$
$$t = \frac{|53 - 56|}{3} \times \sqrt{16} = \frac{3 \times 4}{3} = 4$$
v = 16 − 1 = 15. For v = 15, t0.05 = 2.13.
The calculated value of t is more than the tabulated value, so the hypothesis is rejected. Hence,
the sample has not come from a population having 56 as its mean.
95% confidence limits of the population mean:
$$\bar{x} \pm \frac{S}{\sqrt{n}}\, t_{0.05} = 53 \pm \frac{3}{\sqrt{16}} \times 2.13 = 53 \pm \frac{3}{4} \times 2.13 = 53 \pm 1.6 = 51.4 \text{ to } 54.6$$
99% confidence limits of the population mean:
$$\bar{x} \pm \frac{S}{\sqrt{n}}\, t_{0.01} = 53 \pm \frac{3}{\sqrt{16}} \times 2.95 = 53 \pm \frac{3}{4} \times 2.95 = 53 \pm 2.212 = 50.788 \text{ to } 55.212$$
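When only summary figures are given, as here, the test statistic and the confidence limits can be sketched as follows. The helper name `confidence_limits` is hypothetical, and the t values are the table values supplied in the problem.

```python
from math import sqrt

# Summary figures from the example.
n, x_bar, mu0, sum_sq_dev = 16, 53, 56, 135

# Sample standard deviation and t statistic.
s = sqrt(sum_sq_dev / (n - 1))
t = abs(x_bar - mu0) / s * sqrt(n)


def confidence_limits(x_bar, s, n, t_table):
    """Return (lower, upper) limits x_bar -/+ (s / sqrt(n)) * t_table."""
    half_width = s / sqrt(n) * t_table
    return x_bar - half_width, x_bar + half_width


low95, high95 = confidence_limits(x_bar, s, n, t_table=2.13)
low99, high99 = confidence_limits(x_bar, s, n, t_table=2.95)

print(f"S = {s:.1f}, t = {t:.1f}")
print(f"95% limits: {low95:.2f} to {high95:.2f}")
print(f"99% limits: {low99:.2f} to {high99:.2f}")
```

Note that 56 lies outside both intervals, which is consistent with rejecting the hypothesis.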
Testing difference between means of two samples (Independent Samples):-
Given two independent random samples of sizes n1 and n2 with means x̄1 and x̄2 and
standard deviations S1 and S2, we may be interested in testing the hypothesis that the samples
come from the same normal population. To carry out the test, we calculate the statistic as follows:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S} \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$
where x̄1 = mean of the first sample
x̄2 = mean of the second sample
n1 = number of observations in the first sample
n2 = number of observations in the second sample
S = combined standard deviation.
The value of S is calculated by the following formula:
$$S = \sqrt{\frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}}$$
When the actual means are fractional, the deviations should be taken from assumed
means. In such a case the combined standard deviation is obtained by applying the following
formula:
$$S = \sqrt{\frac{\sum (x_1 - A_1)^2 + \sum (x_2 - A_2)^2 - n_1(\bar{x}_1 - A_1)^2 - n_2(\bar{x}_2 - A_2)^2}{n_1 + n_2 - 2}}$$
A1 = assumed mean of the first sample
A2 = assumed mean of the second sample
x̄1 = actual mean of the first sample
x̄2 = actual mean of the second sample
The degrees of freedom = n1 + n2 − 2.
When we are given the number of observations and the standard deviations of the two
samples, the pooled estimate of the standard deviation can be obtained as follows:
$$S = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$$
If the calculated value of t is greater than t0.05 (t0.01), the difference between the sample means is
said to be significant at the 5% (1%) level of significance; otherwise the data are said to be
consistent with the hypothesis.
Example:- Two types of drug were used on 5 and 7 patients for reducing their weight.
Drug A was imported and drug B was indigenous. The decreases in weight after using the
drugs for six months were as follows:
Drug A    10   12   13   11   14
Drug B    8    9    12   14   15   10   9
Solution:- Let us take the hypothesis that there is no significant difference in the
efficacy of the two drugs. Applying the t-test
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S} \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$
x1     (x1 − x̄1)    (x1 − x̄1)²    x2     (x2 − x̄2)    (x2 − x̄2)²
10     −2           4             8      −3           9
12     0            0             9      −2           4
13     +1           1             12     +1           1
11     −1           1             14     +3           9
14     +2           4             15     +4           16
                                  10     −1           1
                                  9      −2           4
∑x1 = 60            ∑(x1 − x̄1)² = 10      ∑x2 = 77     ∑(x2 − x̄2)² = 44
$$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{60}{5} = 12; \quad \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{77}{7} = 11$$
$$S = \sqrt{\frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{10 + 44}{5 + 7 - 2}} = \sqrt{\frac{54}{10}} = 2.324$$
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S} \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}} = \frac{12 - 11}{2.324} \times \sqrt{\frac{5 \times 7}{5 + 7}} = \frac{1.708}{2.324} = 0.735$$
v = n1 + n2 − 2 = 5 + 7 − 2 = 10
For v = 10, t0.05 = 2.228.
Since the calculated value of t is less than the table value, the hypothesis is accepted. Hence, there is
no significant difference in the efficacy of the two drugs. Since drug B is indigenous and there is no difference
in the efficacy of the imported and indigenous drugs, we should buy the indigenous drug B.
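The pooled two-sample calculation can be reproduced from the raw weight losses. This is a minimal sketch of the formula given earlier in this section; `pooled_t` is an illustrative name, and the critical value is the table value quoted in the example.

```python
from math import sqrt
from statistics import mean


def pooled_t(sample1, sample2):
    """Return (t, degrees of freedom) for two independent small samples."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = mean(sample1), mean(sample2)
    # Sums of squared deviations about each sample's own mean.
    ss1 = sum((x - m1) ** 2 for x in sample1)
    ss2 = sum((x - m2) ** 2 for x in sample2)
    # Combined (pooled) standard deviation.
    s = sqrt((ss1 + ss2) / (n1 + n2 - 2))
    t = (m1 - m2) / s * sqrt(n1 * n2 / (n1 + n2))
    return t, n1 + n2 - 2


drug_a = [10, 12, 13, 11, 14]
drug_b = [8, 9, 12, 14, 15, 10, 9]
T_CRIT_5_PERCENT_10_DF = 2.228  # table value quoted in the example

t, dof = pooled_t(drug_a, drug_b)
print(f"t = {t:.3f} with {dof} degrees of freedom")
print("significant" if abs(t) > T_CRIT_5_PERCENT_10_DF else "not significant")
```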
Testing Difference between Means of two sample (Dependent sample or Matched Paired
Sample):-
Two samples are said to be dependent when the elements in one sample are related to those in
the other in any significant or meaningful manner. In fact the two samples may consist of pairs
of observations made on the same objects, individuals or, more generally, on the same selected
population elements. The t-test based on the paired observations is defined by the following
formula:
$$t = \frac{\bar{d} - 0}{S} \times \sqrt{n} \quad \text{or} \quad t = \frac{\bar{d}\sqrt{n}}{S}$$
where d̄ = the mean of the differences
S = the standard deviation of the differences
The value of S is calculated as follows:
$$S = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} \quad \text{or} \quad S = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n - 1}}$$
It should be noted that t is based on n − 1 degrees of freedom.
Example:-
To verify whether a course in accounting improved performance, a similar test was given to 12
participants both before and after the course. The original marks, recorded in alphabetical order of
the participants, were 44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53 and 72. After the course, the marks were, in the same
order, 53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60 and 78. Was the course useful?
Solution:-
Let us take the hypothesis that there is no significant difference in the marks obtained before
and after the course, i.e. the course has not been useful.
Applying the t-test (difference formula):
$$t = \frac{\bar{d}\sqrt{n}}{S}$$
Participant    Before (1st test)    After (2nd test)    d = 2nd − 1st    d²
A              44                   53                  +9               81
B              40                   38                  −2               4
C              61                   69                  +8               64
D              52                   57                  +5               25
E              32                   46                  +14              196
F              44                   39                  −5               25
G              70                   73                  +3               9
H              41                   48                  +7               49
I              67                   73                  +6               36
J              72                   74                  +2               4
K              53                   60                  +7               49
L              72                   78                  +6               36
                                                        ∑d = 60          ∑d² = 578
$$\bar{d} = \frac{\sum d}{n} = \frac{60}{12} = 5$$
$$S = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n - 1}} = \sqrt{\frac{578 - 12(5)^2}{12 - 1}} = \sqrt{\frac{278}{11}} = 5.03$$
$$t = \frac{\bar{d}\sqrt{n}}{S} = \frac{5 \times \sqrt{12}}{5.03} = \frac{5 \times 3.464}{5.03} = 3.443$$
v = n − 1 = 12 − 1 = 11. For v = 11, t0.05 = 2.201.
The calculated value of t is greater than the tabulated value, so the hypothesis is rejected.
Hence the course has been useful.
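The paired calculation can be sketched directly from the two lists of marks. The function name `paired_t` is illustrative; the critical value is the table value quoted in the example.

```python
from math import sqrt
from statistics import mean, stdev

# Marks before and after the course, in the same participant order.
before = [44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53, 72]
after = [53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60, 78]


def paired_t(first, second):
    """Return t = mean(d) * sqrt(n) / S for the differences d = second - first."""
    d = [b - a for a, b in zip(first, second)]
    # stdev uses n - 1 in the denominator, matching the formula for S.
    return mean(d) * sqrt(len(d)) / stdev(d)


T_CRIT_5_PERCENT_11_DF = 2.201  # table value quoted in the example

t = paired_t(before, after)
print(f"t = {t:.3f}, reject H0: {t > T_CRIT_5_PERCENT_11_DF}")
```

Without intermediate rounding, t comes out near 3.445; the hand calculation gives 3.443 because S was rounded to 5.03 first. The conclusion is the same.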
The F-test or the variance ratio test:-
The F-test is named in honor of the great statistician R.A. Fisher. The object of the F-test is
to find out whether two independent estimates of population variance differ significantly, or
whether the two samples may be regarded as drawn from normal populations having the
same variance. To carry out the test of significance, we calculate the ratio F. F is defined
as
$$F = \frac{S_1^2}{S_2^2}, \quad \text{where } S_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1} \text{ and } S_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1}$$
It should be noted that S1² is always the larger estimate of variance, i.e. S1² > S2².
$$F = \frac{\text{Larger estimate of variance}}{\text{Smaller estimate of variance}}$$
v1 = n1 − 1 and v2 = n2 − 1
v1 = degrees of freedom of the sample having the larger variance
v2 = degrees of freedom of the sample having the smaller variance
The calculated value of F is compared with the tabulated value for v1 and v2 at the 5% or 1% level
of significance. If the calculated value of F is greater than the tabulated value, then the F ratio is
considered significant and the null hypothesis is rejected. On the other hand, if the calculated value
of F is less than the tabulated value, then the null hypothesis is accepted and it is inferred that
both samples have come from populations having the same variance.
Since the F-test is based on the ratio of two variances, it is also called the Variance Ratio Test.
Example—
Two random samples were drawn from two normal populations and their values are
A    66   67   75   76   82   84   88   90   92
B    64   66   74   78   82   85   87   92   93   95   97
Test whether the two populations have the same variance at the 5% level of significance
(F = 3.36 at the 5% level for v1 = 10 and v2 = 8).
Solution:-
Let us take the hypothesis that the two populations have the same variance. Applying the F-test
$$F = \frac{\text{Larger estimate of variance}}{\text{Smaller estimate of variance}}$$
A: X1    x1 = (X1 − X̄1)    x1²        B: X2    x2 = (X2 − X̄2)    x2²
66       −14               196        64       −19               361
67       −13               169        66       −17               289
75       −5                25         74       −9                81
76       −4                16         78       −5                25
82       +2                4          82       −1                1
84       +4                16         85       +2                4
88       +8                64         87       +4                16
90       +10               100        92       +9                81
92       +12               144        93       +10               100
                                      95       +12               144
                                      97       +14               196
∑X1 = 720    ∑x1 = 0       ∑x1² = 734     ∑X2 = 913    ∑x2 = 0    ∑x2² = 1298
$$\bar{X}_1 = \frac{\sum X_1}{n_1} = \frac{720}{9} = 80; \quad \bar{X}_2 = \frac{\sum X_2}{n_2} = \frac{913}{11} = 83$$
$$S_A^2 = \frac{\sum x_1^2}{n_1 - 1} = \frac{734}{9 - 1} = 91.75$$
$$S_B^2 = \frac{\sum x_2^2}{n_2 - 1} = \frac{1298}{11 - 1} = 129.8$$
Sample B gives the larger estimate of variance, so it goes in the numerator:
$$F = \frac{S_B^2}{S_A^2} = \frac{129.8}{91.75} = 1.415$$
For v1 = 10 and v2 = 8, F0.05 = 3.36.
The calculated value of F is less than the tabulated value, so the hypothesis is accepted. Hence
it may be concluded that the two populations have the same variance.
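As a cross-check, the variance ratio can be computed from the raw data. This sketch follows the definition given earlier in this section, placing the larger variance estimate in the numerator, which gives a ratio of about 1.415 with (10, 8) degrees of freedom; the function name `variance_ratio` is illustrative, and the critical value is the table value quoted in the problem.

```python
from statistics import variance


def variance_ratio(sample1, sample2):
    """Return (F, numerator dof, denominator dof) with the larger variance on top."""
    v1, v2 = variance(sample1), variance(sample2)  # both use n - 1
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1


a = [66, 67, 75, 76, 82, 84, 88, 90, 92]
b = [64, 66, 74, 78, 82, 85, 87, 92, 93, 95, 97]
F_CRIT_5_PERCENT = 3.36  # table value quoted for (10, 8) degrees of freedom

f, dof_num, dof_den = variance_ratio(a, b)
print(f"F = {f:.3f} with ({dof_num}, {dof_den}) degrees of freedom")
print("reject H0" if f > F_CRIT_5_PERCENT else "do not reject H0")
```

Either way the ratio is well below 3.36, so the hypothesis of equal variances is not rejected.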
Chi-Square Test:-
The χ² test (pronounced chi-square test) is one of the simplest and most widely used
non-parametric tests in statistical work. The symbol χ is the Greek letter chi. The χ² test was first
used by Karl Pearson in the year 1900. The quantity χ² describes the magnitude of the
discrepancy between theory and observation. It is defined as:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O refers to the observed frequencies and E refers to the expected frequencies.
Example:- In an antimalarial campaign in a certain area, quinine was administered to 812
persons out of a total population of 3248. The number of fever cases is shown below:
Treatment     Fever    No fever    Total
Quinine       20       792         812
No quinine    220      2216        2436
Total         240      3008        3248
Discuss the usefulness of quinine in checking malaria.
Solution:- Let us take the hypothesis that quinine is not effective in checking malaria.
Applying the χ² test:
$$E_{11} = \text{Expectation of } (AB) = \frac{(A) \times (B)}{N} = \frac{240 \times 812}{3248} = 60$$
That is, the expected frequency corresponding to the first row and first column is 60. Similarly,
$$E_{12} = \frac{3008 \times 812}{3248} = 752, \quad E_{21} = \frac{240 \times 2436}{3248} = 180, \quad E_{22} = \frac{3008 \times 2436}{3248} = 2256$$
The table of expected frequencies is:
60     752     812
180    2256    2436
240    3008    3248
O       E       (O − E)²    (O − E)²/E
20      60      1600        26.667
220     180     1600        8.889
792     752     1600        2.128
2216    2256    1600        0.709
                            ∑(O − E)²/E = 38.393
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 38.393$$
v = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1
For v = 1, χ²0.05 = 3.84.
The calculated value of χ² is greater than the tabulated value, so the hypothesis is rejected.
Hence quinine is useful in checking malaria.
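The expected frequencies and the χ² statistic for a contingency table can be sketched as below. The function name `chi_square` is illustrative; the critical value 3.84 is the table value quoted in the example.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency = row total * column total / grand total.
            e = row_totals[i] * col_totals[j] / total
            statistic += (o - e) ** 2 / e
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Rows: quinine, no quinine; columns: fever, no fever.
table = [[20, 792], [220, 2216]]
CHI2_CRIT_5_PERCENT_1_DF = 3.84

stat, dof = chi_square(table)
print(f"chi-square = {stat:.2f} with {dof} degree(s) of freedom")
print("reject H0" if stat > CHI2_CRIT_5_PERCENT_1_DF else "do not reject H0")
```

The statistic comes out at about 38.39, in line with the hand calculation.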
Yates Correction
The Yates correction is a correction made to account for the fact that both Pearson's chi-square
test and McNemar's chi-square test are biased upwards for a 2 x 2 contingency table. An
upward bias gives a larger result than it should be, so the Yates correction is usually
recommended, especially if the expected cell frequency is below 5.
Calculating the Yates Correction
In the Yates correction, 0.5 is subtracted from the absolute difference between the observed
frequencies and the expected frequencies. It is just the chi-square formula with 0.5 subtracted:
$$\chi^2_{\text{Yates}} = \sum \frac{(|O - E| - 0.5)^2}{E}$$
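To see the size of the correction, here is a sketch that applies the formula above to the quinine table from the chi-square example. That table has large expected frequencies, so it is used purely as an illustration and not as a case where the correction would normally be recommended; the function name is hypothetical.

```python
def yates_chi_square(observed):
    """Return the Yates-corrected chi-square statistic for a 2 x 2 table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total
            # Subtract 0.5 from the absolute difference before squaring.
            statistic += (abs(o - e) - 0.5) ** 2 / e
    return statistic


# Rows: quinine, no quinine; columns: fever, no fever.
table = [[20, 792], [220, 2216]]
corrected = yates_chi_square(table)
print(f"Yates-corrected chi-square = {corrected:.2f}")
```

For this table the corrected value is a little lower than the uncorrected 38.39 and remains far above 3.84, so the conclusion does not change.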
Arguments for why the Yates Correction should not be used
Although some people recommend using the correction only if the expected cell
frequency is below 5, others recommend not using it at all. A large body of research
has found that the correction is too strict. Several researchers, including Yates, have used
known statistical data to test whether the correction works. If you are using a statistical program
like SPSS to calculate the chi-square value for a contingency table, the program will
usually force you to incorporate the correction. However, knowing that the correction may be
too strict allows you to make a judgment call on your data.
Unit 4 Tests of Significance

  • 1. Unit-4 Tests of Significance Once sample data has been gathered through an observational study or experiment, statistical inference allows analysts to assess evidence in favor or some claimabout the population from which the sample has been drawn. The methods of inference used to support or reject claims based on sample data are known as tests of significance. Every test of significance begins with a null hypothesis H0. H0 represents a theory that has been put forward, either because it is believed to be true or because it is to be used as a basis for argument, but has not been proved. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is not better on average, than the current drug. We would write H0: there is no difference between the two drugs on average. The alternative hypothesis, Ha, is a statement of what a statistical hypothesis test is set up to establish. For example, in a clinical trial of a new drug, the alternative hypothesis might be that the new drug has a different effect, on average, compared to that of the current drug. We would write Ha: the two drugs have different effects, on average. The alternative hypothesis might also be that the new drug is better, on average, than the current drug. In this case we would write Ha: the new drug is better than the current drug, on average. The final conclusion once the test has been carried out is always given in terms of the null hypothesis. We either "reject H0 in favor of Ha" or "do not reject H0"; we never conclude "reject Ha", or even "accept Ha". If we conclude "do not reject H0", this does not necessarily mean that the null hypothesis is true, it only suggests that there is not sufficient evidence against H0 in favor of Ha; rejecting the null hypothesis then, suggests that the alternative hypothesis may be true. (Definitions taken from Valerie J. Easton and John H. 
McColl's Statistics Glossary v1.1) Hypotheses are always stated in terms of population parameter, such as the mean 𝜇. An alternative hypothesis may be one-sided or two-sided. A one-sided hypothesis claims that a parameter is either larger or smaller than the value given by the null hypothesis. A two-sided hypothesis claims that a parameter is simply not equal to the value given by the null hypothesis -- the direction does not matter. Hypotheses for a one-sided test for a population mean take the following form: H0: 𝜇 = k Ha: 𝜇 > k or H0: 𝜇 = k Ha: 𝜇 < k.
  • 2. Hypotheses for a two-sided test for a population mean take the following form: H0: 𝜇 = k Ha: 𝜇 k. A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. (Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1) Example Suppose a test has been given to all high school students in a certain state. The mean test score for the entire state is 70, with standard deviation equal to 10. Members of the school board suspect that female students have a higher mean score on the test than male students, because the mean score 𝑥̅ from a random sample of 64 female students is equal to 73. Does this provide strong evidence that the overall mean for female students is higher? The null hypothesis H0 claims that there is no difference between the mean score for female students and the mean for the entire population, so that 𝜇 = 70. The alternative hypothesis claims that the mean for female students is higher than the entire student populations mean, so that 𝜇 > 70. Types of errors:- There are two types of error in testing of hypothesis. When a statistical hypothesis is tested there are four types of possibilities arise 1. The hypothesis is true but our test rejects it. (Type- I error) 2. The hypothesis is false but our test accepts it. (Type-II error) 3. The hypothesis is true but our test accepts it. (Correct decision) 4. The hypothesis is false but our test rejects it. (Correct decision) The first two possibility leads to errors. In a statistical hypothesis testing experiment, a type-I error is committed by rejecting the null hypothesis when it is true. The probability of committing a type-I error is denoted by 𝛼 (pronounced alpha), where 𝛼 = Prob. (Type- I error) = Prob. (Rejecting 𝐻0/𝐻 𝑎 is true) On the other head, a Type-II error is committed by not rejecting (i.e. accepting) the null hypothesis when it is false. 
The probability of committing a type-II error is denoted by 𝛽 (pounced as beta), where
  • 3. 𝛽= Probability (Type-II error) = Probability (Not rejecting or accepting 𝐻0/𝐻 𝑎 false) The distinction between these two types of error can be made by an example. Assume that the difference between the two population mean is actually zero. If our test of significance when applied to the simple mean is significant, we make an Type- I error. On the other hand, suppose there is true difference between the two population means. Now our test of significance leads to the judgment “not significant”, we commit Type- II error, we thus find ourselves in the situation which is described by the following table: Hypothesis test As we know sometimes we cannot survey or test all persons or objects; therefore, we have to take a sample. From the results of analysis from the sample data, we can predict the results from the population. Some questions that one may want to answer are 1. Are unmarried workers more likely to be absent from work than married workers? 2. In Fall 1996, did students in Math 163-01 score the same on the exam as students in Math 163-02? 3. Is there any difference between the strengths of steel wire produced by the XY Company and Bob’s Wire Company? 4. A hospital spokesperson claims that the average daily room charge for a specific procedure is $622. Can we reject this claim? Hypothesis testing is a procedure, based on sample evidence and probability theory, used to determine whether the hypothesis is a reasonable statement and should not be rejected, or is unreasonable and should be rejected. Hypothesis test:- A statistical hypothesis test is a method of statistical inference used for testing a statistical hypothesis. A test result is called statistically significant if it has been predicted as unlikely to have occurred by chance alone, according to a threshold probability— the significance level. Steps in the hypothesis testing procedure 1. State the null hypothesis and the alternate hypothesis.
  • 4. Null Hypothesis – statement about the value of a population parameter. Alternate Hypothesis – statement that is accepted if evidence proves the null hypothesis to be false. 2. Select the appropriate test statistic and level of significance. When testing a hypothesis of a proportion, we use the z-statistic or z-test and the formula 𝑧 = 𝑝̂ − 𝑝 √ 𝑝𝑞 𝑛 When testing a hypothesis of a mean, we use the z-statistic or we use the t-statistic according to the following conditions. If the population standard deviation, σ, is known and either the data is normally distributed or the sample size n > 30, we use the normal distribution (z-statistic). When the population standard deviation, σ, is unknown and either the data is normally distributed or the sample size is greater than 30 (n > 30), we use the t-distribution (t-statistic). A traditional guideline for choosing the level of significance is as follows: (a) the 0.10 level for political polling, (b) the 0.05 level for consumer research projects, and (c) the 0.01 level for quality assurance work. 3. State the decision rules. The decision rules state the conditions under which the null hypothesis will be accepted or rejected. The critical value for the test-statistic is determined by the level of significance. The critical value is the value that divides the non-reject region from the reject region. 4. Compute the appropriate test statistic and make the decision. When we use the z-statistic, we use the formula 𝑧 = 𝑥̅ − 𝜇 𝜎/√ 𝑛 When we use the t-statistic, we use the formula 𝑡 = 𝑥̅ − 𝜇 𝑠/√ 𝑛 Compare the computed test statistic with critical value. If the computed value is within the rejection region(s), we reject the null hypothesis; otherwise, we do not reject the null hypothesis. 5. Interpret the decision. Based on the decision in Step 4, we state a conclusion in the context of the original problem.
  • 5.  The average test score for an entire school is 75 with a standard deviation of 10. What is the probability that a random sample of 5 studentd scored above 80 ? Conditions for using t-test: 1. 𝜎 is unknown 2. 𝑛 < 30 Here 𝜇 = 75, 𝜎 = 10, 𝑛 = 5, 𝑥̅ = 80 The firstconditionisnotsatisfiedSointhisproblrmwe will use 𝑍- test. 𝑧 = 𝑥̅ − 𝜇 𝜎/√ 𝑛 = 80 − 75 10/√5 = 5 10/2.236 = 5 4.472 = 1.118  The average test score for an entire school is 75. The standard deviation of a random sample 40. What is the probability that a random sample of 10 studentd scored above 80 ? Conditions for using t-test: 1. 𝜎 is unknown 2. 𝑛 < 30 Here 𝜇 = 75, 𝑆 = 40, 𝑛 = 10, 𝑥̅ = 80 The secondconditionisnotsatisfiedSointhisproblrmwe will use 𝑍- test. 𝑧 = 𝑥̅ − 𝜇 𝑆/√ 𝑛  The average test score for an entire school is 75. The standard deviation of a random sample of 9 students is 10. What is the probability the average test score for the sample is above 80 ? Conditions for using t-test: 1. 𝜎 is unknown 2. 𝑛 < 30 Here 𝜇 = 75, 𝑆 = 10, 𝑛 = 9, 𝑥̅ = 80 Here both the conditionfort-testissatisfied.Sowe will use the 𝑡 − 𝑡𝑒𝑠𝑡. 𝑡 = 𝑥̅ − 𝜇 𝑠/√ 𝑛
  • 6. Example:- The average score of all sixth graders in school District A on a math aptitude exam is 75 with a standard deviation of 8.1. A random sample of 100 students in one school was taken. The mean score of these 100 students was 71. Does this indicate that the students of this school are significantly less skilled in their mathematical abilities than the average student in the district? (Use a 5% level of significance.) Solution:- Here Mean = 𝜇 = 75 , Standard deviation= 𝜎 = 8.1 , 𝑛 = 100, 𝑥̅ = 71 Conditions for using t-test: 1. 𝜎 is unknown 2. 𝑛 < 30 Since σ is known and 𝑛 > 30, we use the z-test that is based on the normal curve or normal distribution. Step 1:- State the null hypothesis (contains =, ≥, or ≤) and alternate hypothesis (usually contains “not”). Think of the statement “Does this indicate that the students of this school are significantly less skilled in their mathematical abilities than the average student in the district?” From “...students of this school are significantly less skilled...,” we write the alternate hypothesis as 𝐻1: 𝜇 < 75 𝐻0: 𝜇 ≥ 75 𝐻1: 𝜇 < 75 Step 2:- Select a level of significance. Stated in the problem as 5% 𝑜𝑟 𝛼 = 0.05 Step 3:- Identify the statistical test to use. Use z-test because σ is known and the sample (n=100) is a large sample (n > 30). 𝑧 = 𝑥̅ − 𝜇 𝜎/√ 𝑛 Recall that in the normal curve, Z=0 corresponds to the mean. Z=1, 2, 3 represent 1, 2, and 3 standard deviations above the mean; the negatives are below the mean.
  • 7. Step 4:- Formulate a decision rule. Since the alternate hypothesis states μ< 75, this is a one-tailed test to the left. For α= 0.05, we find 𝑍 in the normal curve table that gives a probability of 0.05 to the left of Z. This means the negative of the z value (critical value) corresponding to a table value of 0.5 − 0.05 = 0.45 𝑜𝑟 𝑍 = −1.645. That is 𝑃(𝑍 < −1.645) = 0.05.. Because 0.4500 is exactly half way between 0.4495 and 0.4505, we get half way between 1.640 and 1.650 to get z = 1.645. Since 71 is to the left of 75, we have 𝑧 = −1.645. That is 𝑃(𝑧 < −1.645) = 0.05. Thus, we reject the null hypothesis if z < -1.645. And accept the alternate hypothesis that the students in the school sampled are less skilled in math aptitude than those in district A. Step 5:- Take a sample; arrive at a decision. The sample of 100 students have been tested and found that their mean score was 71. Using the statistical test (z-test) identified in Step 3 compute the test statistic by the formula from Step 3 𝑧 = 𝑥̅− 𝜇 𝜎/√ 𝑛 = 71−75 8.1/√100 = −4.938 Since the computed 𝑧 = −4.938 < −1.645 (𝑐𝑟𝑖𝑡𝑖𝑐𝑎𝑙 𝑧 𝑣𝑎𝑙𝑢𝑒), we reject the null hypothesis that the students in the school are not less skilled in mathematical ability. Thus, we conclude that the sixth graders in the school are less skilled in mathematical ability than the sixth graders in District A. The following problem is presented for students to work:
  • 8. A sample of 250 married workers showed 22 missed more than 5 days last year for any reason. A sample of 300 unmarried workers showed 35 missed more than 5 days. Use the 5% level of significance to test and answer the question: Are unmarried workers more likely to be absent from work than married workers? Test of significance for Large samples:- If the size of the sample exceeds 30 then we will test of significance for large samples. The assumption made while dealing with problems relating to large samples are: a) The random sampling distribution of a static is approximately normal, and b) Values given by the samples are sufficiently close to the population value and can be used in its place for calculating the standard error of the estimate. Standard error of Mean:- a) When standard deviation of the population is known 𝑆. 𝐸. 𝑋̅ = 𝜎𝑝 √ 𝑛 Where 𝑆. 𝐸. 𝑋̅ refers to the standard error of the mean. 𝜎𝑝 = Standard deviation of the population 𝑛 = Number of observations in the sample b) When standard deviation of the population is not known , We have to use the standard deviation of the sample in calculating standard error of mean. The formula for calculating standard error is 𝑆. 𝐸. 𝑋̅ = 𝜎(𝑠𝑎𝑚𝑝𝑙𝑒) √ 𝑛 Where 𝜎denote the standard deviation of the sample. Note— If standard deviation of both the sample and the population are available then standard deviation of the sample in calculating standard error of mean is preferred. Example:- Calculate the standard error of mean from the following data showing the amount paid by 100 firms in Calcutta on the occasion of Durga Puja:
  • 9. Mid value (Rs.) 39 49 59 69 79 89 99 No. of firms 2 3 11 20 32 25 7 Solution:- 𝑆. 𝐸. 𝑋̅ = 𝜎 √ 𝑛 CALCULATION OF STANDARD DEVIATION Mid value 𝑚 𝑓 (𝑚 − 69)/10 = 𝑑 𝑓𝑑 𝑓𝑑2 39 2 -3 -6 18 49 3 -2 -6 12 59 11 -1 -11 11 69 20 0 0 0 79 32 +1 +32 32 89 25 +2 +50 100 99 7 +3 +21 63 𝑁 = 100 ∑𝑓𝑑 = 80 ∑𝑓𝑑2 = 236 𝜎 = √ ∑𝑓𝑑2 𝑁 − ( ∑𝑓𝑑 𝑁 ) 2 × 𝑖 = √ 236 100 − ( 80 100 ) 2 × 10 = √2.36 − 0.64 × 10 = 1.311 × 10 = 13.11 𝑆. 𝐸. 𝑋̅ = 𝜎 √ 𝑛 = 13.11 √100 = 13.11 10 = 1.311 Two-tailed test for the Difference between the Means of two Samples:- i. If two independent random samples with 𝑛1and 𝑛2 numbers (Both sample sizes are greater than 30) respectively are drawn from the same population of standard deviation 𝜎1 the standard error of the difference between the sample means is given by the formula: S.E. of the difference between sample means = √𝜎2 ( 1 𝑛1 + 1 𝑛2 ) If 𝜎 is unknown, sample standard deviation for combined sample must be substituted.
  • 10. ii. If two random sample with𝑋̅1, 𝜎1, 𝑛1 and 𝑋̅2, 𝜎2, 𝑛2 respectively are drawn from the different populations, then the S.E. of the difference between the mean is given by the formula: = √ 𝜎1 2 𝑛1 + 𝜎2 2 𝑛2 And where 𝜎1 and 𝜎2 are unknown. S.E. of the difference between the means = √ 𝑆1 2 𝑛1 + 𝑆2 2 𝑛2 Where 𝑆1 and 𝑆2 are represented standard deviation of the two samples. The null hypothesis to be tested is that there is no significant difference in the means of the two samples. i.e. , 𝐻0: 𝜇1 = 𝜇2 ← Null hypothesis, there is no difference 𝐻 𝑎: 𝜇1 ≠ 𝜇2 ← Alternative hypothesis, a difference exists. Example-1:- Intelligence test on two groups of boys and girls gave the following results: Mean S.D N Girls 75 15 150 Boys 70 20 250 Is there a significant difference in the mean scores obtained by boys and girls ? Solution:- Let us take the hypothesis that there is no significant difference in the mean scored obtained by boys and girls. 𝑆. 𝐸. (𝑋̅1 − 𝑋̅2) = √ 𝜎1 2 𝑛1 + 𝜎2 2 𝑛2 𝜎1 = 15, 𝜎2 = 20, 𝑛1 = 150, 𝑛2 = 250 Substituting these values
  • 11. 𝑆. 𝐸. ( 𝑋̅1 − 𝑋̅2) = √ (15)2 150 + (20)2 250 = √1.5 + 1.6 = 1.761 𝐷𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑆. 𝐸. = 75 − 70 1.761 = 2.84 Since the difference is more than 2.58 S.E.(1% label of significance), the hypothesis is rejected. There seems to be a significant difference in the mean scores obtained by boys and girls. Example-2:- A man buys 50 electric bulbs of ‘Philips’ and 50 electric bulb of ‘HMT’. He finds that ‘Philips’ bulbs give an average life of 1500 hours with a standard deviation of 60 hours and ‘HMT’ bulbs give an average life of 1512 hours with a standard deviation of 80 hours. Is there a significant difference in the mean of the two makes of bulbs ? Solution:- Let us take the hypothesis that there is no significant difference in the mean life of the two makes of the bulbs. Calculating standard error of difference of means 𝑆. 𝐸. (𝑋̅1 − 𝑋̅2) = √ 𝜎1 2 𝑛1 + 𝜎2 2 𝑛2 𝜎1 = 60, 𝜎2 = 50, 𝑛1 = 80, 𝑛2 = 50 Substituting these values 𝑆. 𝐸. ( 𝑋̅1 − 𝑋̅2) = √ (60)2 50 + (80)2 50 = √ 3600 + 6400 50 = √200 = 14.14 Observed difference between the means=1512-1500=12 𝐷𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑆. 𝐸. = 12 14.14 = 0.849 Since the difference is less than 2.58 S.E.(1% label of significance), it could have arisen due to fluctuation of sampling. Hence the difference in the mean of the two makes is not significant. Test of significance for small samples:-
  • 12. When the sample size is small (less than 30), the tests for large samples do not work well. Special tests are therefore used for small samples, such as the t-test and the F-test.

Student t-distribution

Theoretical work on the t-distribution was done by W.S. Gosset (1876-1937) in the early 1900s. Gosset was employed by Guinness & Son, a Dublin brewery in Ireland, which did not permit employees to publish research findings under their own names. So Gosset adopted the pen name "Student" and published his findings under this name. Therefore, the t-distribution is commonly called Student's t-distribution.

The t-distribution is used when the sample size is 30 or less and the population standard deviation is unknown. The t-statistic is defined as:

$$t = \frac{\bar{x} - \mu}{S} \times \sqrt{n}, \qquad \text{where } S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Test the significance of the mean of a Random Sample:-

To determine whether the mean of a sample drawn from a normal population deviates significantly from a stated value (the hypothetical value of the population mean) when the variance of the population is unknown, we calculate the statistic:

$$t = \frac{\bar{x} - \mu}{S} \times \sqrt{n}$$

$\bar{x}$ = the mean of the sample
$\mu$ = the actual or hypothetical mean of the population
$n$ = the sample size
$S$ = the standard deviation of the sample

$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \quad \text{or} \quad S = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n - 1}} = \sqrt{\frac{1}{n - 1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]}$$

where $d$ = deviation from the assumed mean.

If the calculated value of $|t|$ exceeds $t_{0.05}$, we say that the difference between $\bar{x}$ and $\mu$ is significant at the 5% level; if it exceeds $t_{0.01}$, the difference is said to be significant at the 1% level. If
  • 13. $|t| < t_{0.05}$, we conclude that the difference between $\bar{x}$ and $\mu$ is not significant and hence the sample might have been drawn from a population with mean $= \mu$.

Fiducial limits of population Mean:-

Assuming that the sample is a random sample from a normal population of unknown mean, the 95% fiducial limits of the population mean ($\mu$) are:

$$\bar{x} \pm \frac{S}{\sqrt{n}}\, t_{0.05}$$

and the 99% limits are

$$\bar{x} \pm \frac{S}{\sqrt{n}}\, t_{0.01}$$

Example:- The manufacturer of a certain make of electric bulbs claims that his bulbs have a mean life of 25 months with a standard deviation of 5 months. A random sample of 6 such bulbs gave the following values. Life in months: 24, 26, 30, 20, 20, 18. Can you regard the producer's claim to be valid at the 1% level of significance? (Given that the table values of the appropriate test statistic at the said level are 4.032, 3.707 and 3.499 for 5, 6 and 7 degrees of freedom respectively.)

Solution:- Let us take the hypothesis that there is no significant difference between the mean life of bulbs in the sample and that of the population. Applying the t-test:

$$t = \frac{\bar{x} - \mu}{S} \times \sqrt{n}$$

CALCULATION OF $\bar{x}$ and $S$

| $x$ | $(x - \bar{x})$ | $(x - \bar{x})^2$ |
|-----|-----|-----|
| 24 | +1 | 1 |
| 26 | +3 | 9 |
| 30 | +7 | 49 |
| 20 | -3 | 9 |
| 20 | -3 | 9 |
| 18 | -5 | 25 |
| $\sum x = 138$ | | $\sum (x - \bar{x})^2 = 102$ |

$$\bar{x} = \frac{\sum x}{n} = \frac{138}{6} = 23$$
  • 14. $$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{102}{5}} = \sqrt{20.4} = 4.517$$

$$t = \frac{|\bar{x} - \mu|}{S} \times \sqrt{n} = \frac{|23 - 25|}{4.517} \times \sqrt{6} = \frac{2 \times 2.449}{4.517} = 1.084$$

$v = n - 1 = 6 - 1 = 5$. For $v = 5$, $t_{0.01} = 4.032$.

The calculated value of t is less than the tabulated value. So the hypothesis is accepted. Hence the sample gives no reason to doubt the producer's claim, which may be regarded as valid at the 1% level of significance.

Example:- A random sample of size 16 has 53 as mean. The sum of the squares of the deviations taken from the mean is 135. Can this sample be regarded as taken from a population having 56 as mean? Obtain the 95% and 99% confidence limits of the mean of the population. (For $v = 15$, $t_{0.05} = 2.13$; for $v = 15$, $t_{0.01} = 2.95$.)

Solution:- Let us take the hypothesis that there is no significant difference between the sample mean and the hypothetical population mean. Applying the t-test:

$$t = \frac{\bar{x} - \mu}{S} \times \sqrt{n}$$

$\bar{x} = 53$, $\mu = 56$, $n = 16$, $\sum (x - \bar{x})^2 = 135$

$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{135}{15}} = 3$$

$$t = \frac{|53 - 56|}{3} \times \sqrt{16} = \frac{3 \times 4}{3} = 4$$

$v = 16 - 1 = 15$. For $v = 15$, $t_{0.05} = 2.13$.

The calculated value of t is more than the tabulated value. So the hypothesis is rejected. Hence, the sample has not come from a population having 56 as mean.

95% confidence limits of the population mean:

$$\bar{x} \pm \frac{S}{\sqrt{n}}\, t_{0.05} = 53 \pm \frac{3}{\sqrt{16}} \times 2.13 = 53 \pm \frac{3}{4} \times 2.13 = 53 \pm 1.6 = 51.4 \text{ to } 54.6$$

99% confidence limits of the population mean:

$$\bar{x} \pm \frac{S}{\sqrt{n}}\, t_{0.01} = 53 \pm \frac{3}{\sqrt{16}} \times 2.95 = 53 \pm \frac{3}{4} \times 2.95 = 53 \pm 2.212 = 50.788 \text{ to } 55.212$$

Testing difference between means of two samples (Independent Samples):-

Given two independent random samples of size $n_1$ and $n_2$ with means $\bar{x}_1$ and $\bar{x}_2$ and standard deviations $S_1$ and $S_2$, we may be interested in testing the hypothesis that the samples
  • 15. come from the same normal population. To carry out the test, we calculate the statistic as follows:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S} \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

where
$\bar{x}_1$ = mean of the first sample
$\bar{x}_2$ = mean of the second sample
$n_1$ = number of observations in the first sample
$n_2$ = number of observations in the second sample
$S$ = combined standard deviation.

The value of $S$ is calculated by the following formula:

$$S = \sqrt{\frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}}$$

When the actual means are fractional, the deviations should be taken from assumed means. In such a case the combined standard deviation is obtained by applying the following formula:

$$S = \sqrt{\frac{\sum (x_1 - A_1)^2 + \sum (x_2 - A_2)^2 - n_1(\bar{x}_1 - A_1)^2 - n_2(\bar{x}_2 - A_2)^2}{n_1 + n_2 - 2}}$$

$A_1$ = assumed mean of the first sample
$A_2$ = assumed mean of the second sample
$\bar{x}_1$ = actual mean of the first sample
$\bar{x}_2$ = actual mean of the second sample

The degrees of freedom $= n_1 + n_2 - 2$.

When we are given the number of observations and the standard deviations of the two samples, the pooled estimate of the standard deviation can be obtained as follows:

$$S = \sqrt{\frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}}$$
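The formulas above for the combined standard deviation and the t statistic can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function names and the two tiny samples `a` and `b` are hypothetical and were chosen only so that the arithmetic is easy to follow by hand.

```python
import math

def pooled_sd(sample1, sample2):
    """Combined standard deviation S computed from the raw observations."""
    m1 = sum(sample1) / len(sample1)
    m2 = sum(sample2) / len(sample2)
    ss1 = sum((x - m1) ** 2 for x in sample1)  # sum of squared deviations
    ss2 = sum((x - m2) ** 2 for x in sample2)
    return math.sqrt((ss1 + ss2) / (len(sample1) + len(sample2) - 2))

def pooled_sd_from_summary(n1, s1, n2, s2):
    """Same S, but from sample sizes and sample standard deviations."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def two_sample_t(sample1, sample2):
    """Return (t, degrees of freedom) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    s = pooled_sd(sample1, sample2)
    return (m1 - m2) / s * math.sqrt(n1 * n2 / (n1 + n2)), n1 + n2 - 2

# Hypothetical data (not from the notes), used only for illustration.
a = [1, 2, 3]        # mean 2, sum of squared deviations 2
b = [2, 4, 6, 8]     # mean 5, sum of squared deviations 20

t, dof = two_sample_t(a, b)
print(f"S = {pooled_sd(a, b):.4f}, t = {t:.4f}, degrees of freedom = {dof}")
```

Both routes to $S$ give the same value here, since $(n_1 - 1)S_1^2$ is just the sum of squared deviations of the first sample, and likewise for the second.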
  • 16. If the calculated value of $t$ exceeds $t_{0.05}$ ($t_{0.01}$), the difference between the sample means is said to be significant at the 5% (1%) level of significance; otherwise the data are said to be consistent with the hypothesis.

Example:- Two types of drug were used on 5 and 7 patients for reducing their weight. Drug A was imported and drug B was indigenous. The decreases in weight after using the drugs for six months were as follows:

Drug A: 10 12 13 11 14
Drug B: 8 9 12 14 15 10 9

Solution:- Let us take the hypothesis that there is no significant difference in the efficacy of the two drugs. Applying the t-test:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S} \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

| $x_1$ | $(x_1 - \bar{x}_1)$ | $(x_1 - \bar{x}_1)^2$ | $x_2$ | $(x_2 - \bar{x}_2)$ | $(x_2 - \bar{x}_2)^2$ |
|-----|-----|-----|-----|-----|-----|
| 10 | -2 | 4 | 8 | -3 | 9 |
| 12 | 0 | 0 | 9 | -2 | 4 |
| 13 | +1 | 1 | 12 | +1 | 1 |
| 11 | -1 | 1 | 14 | +3 | 9 |
| 14 | +2 | 4 | 15 | +4 | 16 |
| | | | 10 | -1 | 1 |
| | | | 9 | -2 | 4 |
| $\sum x_1 = 60$ | | $\sum (x_1 - \bar{x}_1)^2 = 10$ | $\sum x_2 = 77$ | | $\sum (x_2 - \bar{x}_2)^2 = 44$ |

$$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{60}{5} = 12; \qquad \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{77}{7} = 11$$

$$S = \sqrt{\frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{10 + 44}{5 + 7 - 2}} = \sqrt{\frac{54}{10}} = 2.324$$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S} \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$
  • 17. $$= \frac{12 - 11}{2.324} \times \sqrt{\frac{5 \times 7}{5 + 7}} = \frac{1.708}{2.324} = 0.735$$

$v = n_1 + n_2 - 2 = 5 + 7 - 2 = 10$. For $v = 10$, $t_{0.05} = 2.228$.

Since the calculated value of t is less than the table value, the hypothesis is accepted. Hence, there is no significant difference in the efficacy of the two drugs. Since drug B is indigenous and there is no difference in the efficacy of the imported and indigenous drugs, we should buy the indigenous drug B.

Testing Difference between Means of two samples (Dependent Samples or Matched Paired Samples):-

Two samples are said to be dependent when the elements in one sample are related to those in the other in any significant or meaningful manner. In fact, the two samples may consist of pairs of observations made on the same objects, individuals or, more generally, on the same selected population elements.

The t-test based on paired observations is defined by the following formula:

$$t = \frac{\bar{d} - 0}{S} \times \sqrt{n} \quad \text{or} \quad t = \frac{\bar{d}\sqrt{n}}{S}$$

where
$\bar{d}$ = the mean of the differences
$S$ = the standard deviation of the differences

The value of $S$ is calculated as follows:

$$S = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} \quad \text{or} \quad S = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n - 1}}$$

It should be noted that $t$ is based on $n - 1$ degrees of freedom.

Example:- To verify whether a course in accounting improved performance, a similar test was given to 12 participants both before and after the course. The original marks, recorded in alphabetical order of the participants, were 44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53 and 72. After the course, the marks in the same order were 53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60 and 78. Was the course useful?

Solution:-
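  • 18. Let us take the hypothesis that there is no significant difference in the marks obtained before and after the course, i.e. the course has not been useful. Applying the t-test (difference formula):

$$t = \frac{\bar{d}\sqrt{n}}{S}$$

| Participant | Before (1st Test) | After (2nd Test) | 2nd - 1st Test ($d$) | $d^2$ |
|-----|-----|-----|-----|-----|
| A | 44 | 53 | +9 | 81 |
| B | 40 | 38 | -2 | 4 |
| C | 61 | 69 | +8 | 64 |
| D | 52 | 57 | +5 | 25 |
| E | 32 | 46 | +14 | 196 |
| F | 44 | 39 | -5 | 25 |
| G | 70 | 73 | +3 | 9 |
| H | 41 | 48 | +7 | 49 |
| I | 67 | 73 | +6 | 36 |
| J | 72 | 74 | +2 | 4 |
| K | 53 | 60 | +7 | 49 |
| L | 72 | 78 | +6 | 36 |
| | | | $\sum d = 60$ | $\sum d^2 = 578$ |

$$\bar{d} = \frac{\sum d}{n} = \frac{60}{12} = 5$$

$$S = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n - 1}} = \sqrt{\frac{578 - 12(5)^2}{12 - 1}} = \sqrt{\frac{278}{11}} = 5.03$$

$$t = \frac{\bar{d}\sqrt{n}}{S} = \frac{5 \times \sqrt{12}}{5.03} = \frac{5 \times 3.464}{5.03} = 3.443$$

$v = n - 1 = 12 - 1 = 11$. For $v = 11$, $t_{0.05} = 2.201$.

The calculated value of t is greater than the tabulated value. So the hypothesis is rejected. Hence the course has been useful.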
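The paired calculation above can be reproduced with a few lines of Python. This is only a sketch using the standard library; the variable names are assumptions introduced here, and `statistics.stdev` is used because it divides by $n - 1$, matching the definition of $S$ in the notes.

```python
import math
import statistics

# Marks of the 12 participants, in the same order before and after the course.
before = [44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53, 72]
after = [53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60, 78]

# Differences d = 2nd test - 1st test, one per participant.
d = [a - b for a, b in zip(after, before)]
n = len(d)

d_bar = statistics.mean(d)   # mean of the differences
s = statistics.stdev(d)      # sample standard deviation (n - 1 divisor)
t = d_bar * math.sqrt(n) / s

T_05_DF_11 = 2.201           # tabulated value quoted in the notes for v = 11

print(f"sum d = {sum(d)}, sum d^2 = {sum(x * x for x in d)}")
print(f"d_bar = {d_bar}, S = {s:.3f}, t = {t:.3f}, v = {n - 1}")
print("reject H0" if abs(t) > T_05_DF_11 else "do not reject H0")
```

Carrying full precision gives t of about 3.445; the hand calculation arrives at 3.443 because $S$ is rounded to 5.03 before dividing. The conclusion is unchanged.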
  • 19. The F-test or the variance ratio test:-

The F-test is named in honor of the great statistician R.A. Fisher. The object of the F-test is to find out whether two independent estimates of population variance differ significantly, or whether the two samples may be regarded as drawn from normal populations having the same variance. To carry out the test of significance, we calculate the ratio F, defined as

$$F = \frac{S_1^2}{S_2^2}, \qquad \text{where } S_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1} \text{ and } S_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1}$$

It should be noted that $S_1^2$ is always the larger estimate of variance, i.e. $S_1^2 > S_2^2$:

$$F = \frac{\text{Larger estimate of variance}}{\text{Smaller estimate of variance}}$$

$v_1 = n_1 - 1$ and $v_2 = n_2 - 1$
$v_1$ = degrees of freedom of the sample having the larger variance
$v_2$ = degrees of freedom of the sample having the smaller variance

The calculated value of F is compared with the tabulated value for $v_1$ and $v_2$ at the 5% or 1% level of significance. If the calculated value of F is greater than the tabulated value, the F ratio is considered significant and the null hypothesis is rejected. On the other hand, if the calculated value of F is less than the tabulated value, the null hypothesis is accepted and it is inferred that both samples have come from populations having the same variance.

Since the F-test is based on the ratio of two variances, it is also called the Variance Ratio Test.

Example:- Two random samples were drawn from two normal populations and their values are

A: 66 67 75 76 82 84 88 90 92
B: 64 66 74 78 82 85 87 92 93 95 97

Test whether the two populations have the same variance at the 5% level of significance. (Given: $F = 3.36$ at the 5% level for $v_1 = 10$ and $v_2 = 8$.)

Solution:-
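  • 20. Let us take the hypothesis that the two populations have the same variance. Applying the F-test:

$$F = \frac{\text{Larger estimate of variance}}{\text{Smaller estimate of variance}}$$

| A: $X_1$ | $x_1 = X_1 - \bar{X}_1$ | $x_1^2$ | B: $X_2$ | $x_2 = X_2 - \bar{X}_2$ | $x_2^2$ |
|-----|-----|-----|-----|-----|-----|
| 66 | -14 | 196 | 64 | -19 | 361 |
| 67 | -13 | 169 | 66 | -17 | 289 |
| 75 | -5 | 25 | 74 | -9 | 81 |
| 76 | -4 | 16 | 78 | -5 | 25 |
| 82 | +2 | 4 | 82 | -1 | 1 |
| 84 | +4 | 16 | 85 | +2 | 4 |
| 88 | +8 | 64 | 87 | +4 | 16 |
| 90 | +10 | 100 | 92 | +9 | 81 |
| 92 | +12 | 144 | 93 | +10 | 100 |
| | | | 95 | +12 | 144 |
| | | | 97 | +14 | 196 |
| $\sum X_1 = 720$ | $\sum x_1 = 0$ | $\sum x_1^2 = 734$ | $\sum X_2 = 913$ | $\sum x_2 = 0$ | $\sum x_2^2 = 1298$ |

$$\bar{X}_1 = \frac{\sum X_1}{n_1} = \frac{720}{9} = 80; \qquad \bar{X}_2 = \frac{\sum X_2}{n_2} = \frac{913}{11} = 83$$

Variance estimate for sample A:

$$\frac{\sum x_1^2}{n_1 - 1} = \frac{734}{9 - 1} = 91.75$$

Variance estimate for sample B:

$$\frac{\sum x_2^2}{n_2 - 1} = \frac{1298}{11 - 1} = 129.8$$

Sample B gives the larger estimate, so it goes in the numerator, with $v_1 = 11 - 1 = 10$ and $v_2 = 9 - 1 = 8$:

$$F = \frac{129.8}{91.75} = 1.415$$

For $v_1 = 10$ and $v_2 = 8$, $F_{0.05} = 3.36$.

The calculated value of F is less than the tabulated value. So the hypothesis is accepted. Hence it may be concluded that the two populations have the same variance.

Chi-Square Test:-

The $\chi^2$ test (pronounced chi-square test) is one of the simplest and most widely used non-parametric tests in statistical work. The symbol $\chi$ is the Greek letter chi. The $\chi^2$ test was first used by Karl Pearson in the year 1900. The quantity $\chi^2$ describes the magnitude of the discrepancy between theory and observation. It is defined as: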
  • 21. $$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where $O$ refers to the observed frequencies and $E$ refers to the expected frequencies.

Example:- In an antimalarial campaign in a certain area, quinine was administered to 812 persons out of a total population of 3248. The number of fever cases is shown below:

| Treatment | Fever | No fever | Total |
|-----|-----|-----|-----|
| Quinine | 20 | 792 | 812 |
| No quinine | 220 | 2216 | 2436 |
| Total | 240 | 3008 | 3248 |

Discuss the usefulness of quinine in checking malaria.

Solution:- Let us take the hypothesis that quinine is not effective in checking malaria. Applying the $\chi^2$ test:

$$E_{11} = \text{Expectation of } (AB) = \frac{(A) \times (B)}{N} = \frac{240 \times 812}{3248} = 60$$

The expected frequency corresponding to the first row and first column is 60. Similarly,

$$E_{12} = \frac{3008 \times 812}{3248} = 752, \qquad E_{21} = \frac{240 \times 2436}{3248} = 180, \qquad E_{22} = \frac{3008 \times 2436}{3248} = 2256$$

The table of expected frequencies is:

| Treatment | Fever | No fever | Total |
|-----|-----|-----|-----|
| Quinine | 60 | 752 | 812 |
| No quinine | 180 | 2256 | 2436 |
| Total | 240 | 3008 | 3248 |

| $O$ | $E$ | $(O - E)^2$ | $(O - E)^2 / E$ |
|-----|-----|-----|-----|
| 20 | 60 | 1600 | 26.667 |
| 220 | 180 | 1600 | 8.889 |
| 792 | 752 | 1600 | 2.128 |
| 2216 | 2256 | 1600 | 0.709 |
| | | | $\sum (O - E)^2 / E = 38.393$ |
  • 22. $$\chi^2 = \sum \frac{(O - E)^2}{E} = 38.393$$

$v = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$. For $v = 1$, $\chi^2_{0.05} = 3.84$.

The calculated value of $\chi^2$ is greater than the tabulated value. So the hypothesis is rejected. Hence quinine is useful in checking malaria.

Yates Correction

The Yates correction is made to account for the fact that both Pearson's chi-square test and McNemar's chi-square test are biased upwards for a 2 x 2 contingency table. An upward bias means the statistic comes out larger than it should be, so the Yates correction is usually recommended, especially if an expected cell frequency is below 5.

Calculating the Yates Correction

In the Yates correction, 0.5 is subtracted from the absolute difference between each observed frequency and its expected frequency. It is just the chi-square formula with the 0.5 subtraction:

$$\chi^2_{\text{Yates}} = \sum \frac{(|O - E| - 0.5)^2}{E}$$

Arguments for why the Yates Correction should not be used

Although some people recommend using the correction only if an expected cell frequency is below 5, others recommend not using it at all. A large body of research has found that the correction is too strict. Several researchers, including Yates, have used known statistical data to test whether the correction works. If you use a statistical program such as SPSS to calculate the chi-square value for a contingency table, the program will usually apply the correction automatically. However, knowing that the correction may be too strict allows you to make a judgment call on your data.
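To see how much the correction changes the statistic, the quinine table from the example can be run through both formulas. The sketch below uses plain Python; the function names `expected_frequencies` and `chi_square` are hypothetical helpers written for this illustration, not part of any library.

```python
# Observed 2 x 2 table from the quinine example:
# rows = quinine / no quinine, columns = fever / no fever.
observed = [[20, 792],
            [220, 2216]]

def expected_frequencies(table):
    """Expected count for each cell = row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square(table, yates=False):
    """Pearson chi-square; with yates=True, 0.5 is subtracted from |O - E|."""
    expected = expected_frequencies(table)
    correction = 0.5 if yates else 0.0
    return sum(
        (abs(o - e) - correction) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

CHI2_05_DF_1 = 3.84  # tabulated value quoted in the notes for v = 1

plain = chi_square(observed)
corrected = chi_square(observed, yates=True)
print(f"expected = {expected_frequencies(observed)}")
print(f"chi-square = {plain:.3f}, with Yates correction = {corrected:.3f}")
print("reject H0" if corrected > CHI2_05_DF_1 else "do not reject H0")
```

With cell counts this large the correction lowers the statistic only slightly (from about 38.39 to about 37.44), and both values are far above 3.84, so the conclusion about quinine does not depend on whether the correction is used.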