Statistical analysis

HYPOTHESIS TESTING & DATA PROCESSING
By
Suresh Sundar

Data Analysis
Critical examination of the assembled and grouped data for studying
the characteristics of the object under study and for determining the
patterns of relationship among the variables relating to it.

Purpose
• To summarize data into understandable and meaningful forms
• To make exact descriptions
• To identify the causal factors
• To identify the underlying complex phenomena
• To draw reliable inferences from the observed data
• To make estimations or generalizations from sample surveys

Types
Descriptive:
Describes the nature of an object under study
Inferential:
Drawing inferences and conclusions from the findings of a research
study

Descriptive analysis:
• It describes the population, or the characteristics of the population,
under study.
• It organizes and presents data in a meaningful way
• Common measures: mean, median, mode, standard deviation, variance
Example: suppose a pet shop sells cats, dogs and fish. If 100 pets
were sold, of which 40 were dogs, then one description of the data
on pets sold would be that 40% were dogs.
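The descriptive measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch: the pet-shop percentage uses the figures from the example (the text does not give the cat/fish split, so the other 60 pets are lumped together), while `daily_sales` is purely hypothetical data used only to illustrate the numeric measures.

```python
from collections import Counter
import statistics


def percentage(category, observations):
    """Share of observations equal to `category`, as a percentage."""
    return 100 * Counter(observations)[category] / len(observations)


# Pet-shop example: 100 pets sold, 40 of them dogs. The cat/fish split is
# not given, so the remaining 60 are simply labelled "cat or fish".
pets_sold = ["dog"] * 40 + ["cat or fish"] * 60
print(f"Dogs: {percentage('dog', pets_sold):.0f}%")

# Hypothetical daily sales figures, used only to show the numeric measures.
daily_sales = [3, 5, 5, 7, 10]
summary = {
    "mean": statistics.mean(daily_sales),
    "median": statistics.median(daily_sales),
    "mode": statistics.mode(daily_sales),
    "variance": statistics.variance(daily_sales),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(daily_sales),      # sample standard deviation
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

`statistics.variance` and `statistics.stdev` use the sample (n - 1) formulas; `pvariance` and `pstdev` are the population versions.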

Inferential analysis:
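• Drawing conclusions about the population based on the analysis and
observation of a sample
• It is used to compare groups, test hypotheses and make predictions
Example: if we want to know the average height of all men in a city
with a population of several million residents, measuring everyone is
impractical, so we measure a random sample and infer the city-wide
average from it.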
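A minimal sketch of the height example, assuming a simulated city (the heights, their mean of 172 cm and spread of 7 cm are all hypothetical, since the text gives no data). Only a random sample of 400 men is "measured", and the population mean is estimated with an approximate 95% confidence interval, mean ± 1.96 × s/√n, using the large-sample normal approximation.

```python
import random
import statistics


def estimate_mean(sample, z=1.96):
    """Point estimate and approximate 95% confidence interval for a population mean.

    Uses the large-sample normal approximation: mean +/- z * s / sqrt(n).
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / n ** 0.5
    return mean, (mean - z * standard_error, mean + z * standard_error)


# Hypothetical city: heights (cm) of 200,000 men, simulated because no real
# data is given. In practice this full list is never observed.
rng = random.Random(42)
population = [rng.gauss(172, 7) for _ in range(200_000)]

# Measure only a random sample of 400 men and infer the city-wide average.
sample = rng.sample(population, 400)
estimate, (low, high) = estimate_mean(sample)
print(f"Estimated mean height: {estimate:.1f} cm (95% CI {low:.1f} to {high:.1f})")
print(f"True (normally unknown) mean: {statistics.fmean(population):.1f} cm")
```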

Hypothesis
• It is an assumption or a statement that may or may not be true
• In research it is a formal question that has to be resolved
• It is tested on the basis of information obtained from a sample

Hypothesis Testing
• It is a statistical procedure used to determine whether there is enough
evidence in a sample of data to infer that a certain condition is true
for the entire population
• Hypothesis tests are widely used in business and industry for making decisions
Example:
How much rainfall affects plant growth
How an increase in labor affects productivity

Types
Two opposing hypotheses
•Null Hypothesis
The commonly accepted position that researchers try to nullify
•Alternate Hypothesis
The hypothesis that the researcher is trying to establish

Null Hypothesis (Ho)
• It is the statement being tested
• Usually it is a statement of “no effect” or “no difference”
• It proposes that no statistically significant relationship or difference
exists between the two variables in the hypothesis
• It is presumed to be true until statistical evidence nullifies it in
favour of the alternate hypothesis
Example: There is no significant difference/relationship between
advertising budget and sales volume

Alternate Hypothesis (H1)
• It is contrary to the null hypothesis
• It states that there is a significant difference (or relationship) between
the two variables under study
Example: There is a significant difference/relationship between
advertising budget and sales volume

One-tailed and two-tailed tests
One-tailed: the null hypothesis is rejected only when the value of the
test statistic falls in one specified tail of its sampling distribution
Two-tailed: the null hypothesis is rejected when the value of the test
statistic falls in either of the two tails of its sampling distribution
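The difference between the two kinds of test shows up in the p-value. The sketch below assumes a z (standard normal) test statistic; the function name `p_value` and the `tail` labels are illustrative choices, not standard terminology. A two-tailed p-value counts both tails, so it is twice the one-tailed p-value in the direction of the observed statistic.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_value(z, tail="two"):
    """p-value of a z statistic under Ho.

    tail="two":   Ho rejected for extreme values in either tail (H1: mu != mu0)
    tail="right": Ho rejected only for large values (H1: mu > mu0)
    tail="left":  Ho rejected only for small values (H1: mu < mu0)
    """
    if tail == "two":
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    if tail == "right":
        return 1 - STANDARD_NORMAL.cdf(z)
    if tail == "left":
        return STANDARD_NORMAL.cdf(z)
    raise ValueError("tail must be 'two', 'right' or 'left'")


for tail in ("two", "right", "left"):
    print(tail, round(p_value(1.8, tail), 4))
```

With z = 1.8 the right-tailed p-value is about 0.036 (below 0.05) while the two-tailed p-value is about 0.072 (above it), so the same statistic can be significant at the 5% level under a one-tailed test but not under a two-tailed one.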

Example
• Consider a soft drink bottling plant which dispenses soft drinks in
bottles of 300 ml capacity. The bottling is done through an automatic
plant. Overfilling the bottles means a huge loss to the company, given
the large volume of sales, and underfilling means customers get less
than 300 ml of drink while paying for 300 ml, which could bring the
company a bad reputation. Therefore the company would want to test
the hypothesis that the mean content of the bottles is different from
300 ml.

Two-tailed/two-sided hypothesis
Ho : µ = 300ml
H1 : µ ≠ 300ml
One-tailed/one-sided hypothesis
Ho : µ = 300ml
H1 : µ > 300ml (or)
H1 : µ < 300ml

Errors
• The acceptance or rejection of a hypothesis is based upon sample
results, and there is always a possibility that the sample is not
representative of the population.
• As a consequence, errors could result and the inferences drawn
could be wrong.
              Accept Ho          Reject Ho
Ho True       Correct decision   Type 1 error
Ho False      Type 2 error       Correct decision

Types
Type 1 Error: the null hypothesis Ho is rejected when it is actually true.
Its probability is denoted by α, which is termed the level of significance.
Type 2 Error: the null hypothesis Ho is accepted when it is actually
false. Its probability is denoted by β.
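Both error probabilities can be checked numerically. This sketch borrows the 300 ml bottling example but assumes a known population standard deviation σ = 5 ml and samples of n = 30 bottles (both hypothetical). The simulation draws many samples with Ho true and counts how often a two-tailed z-test still rejects Ho, which should come out close to α; `type2_error_prob` computes β for a given true mean from the normal distribution of the z statistic.

```python
import random
from statistics import NormalDist, fmean

Z = NormalDist()  # standard normal distribution


def two_tailed_z_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-tailed one-sample z-test with known sigma: True if Ho is rejected."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return abs(z) > Z.inv_cdf(1 - alpha / 2)


def type1_error_rate(mu0=300, sigma=5, n=30, alpha=0.05, trials=20_000, seed=1):
    """Fraction of samples drawn with Ho TRUE (mean = mu0) that still reject Ho."""
    rng = random.Random(seed)
    rejections = sum(
        two_tailed_z_rejects([rng.gauss(mu0, sigma) for _ in range(n)], mu0, sigma, alpha)
        for _ in range(trials)
    )
    return rejections / trials


def type2_error_prob(true_mu, mu0=300, sigma=5, n=30, alpha=0.05):
    """beta = P(accept Ho | true mean is true_mu) for the two-tailed z-test."""
    shift = (true_mu - mu0) / (sigma / n ** 0.5)  # mean of Z when the true mean is true_mu
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(z_crit - shift) - Z.cdf(-z_crit - shift)


simulated_alpha = type1_error_rate()
print(f"Simulated Type 1 error rate: {simulated_alpha:.3f} (alpha = 0.05)")
for true_mu in (301, 302, 303):
    print(f"beta if the true mean is {true_mu} ml: {type2_error_prob(true_mu):.3f}")
```

β shrinks as the true mean moves further from 300 ml, because a larger departure is easier to detect.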

Limitations
• It is not decision making itself, but it helps in decision making
• It does not explain why a difference exists; it only indicates whether
the difference is likely due to fluctuations in sampling or to other reasons.
• Tests are based on probabilities, so their conclusions cannot be
stated with full certainty.
• Inferences based on significance tests cannot be regarded as
conclusive evidence of the truth of a hypothesis.

Steps in testing of hypothesis
1. Setting up of a hypothesis
2. Setting up of a suitable significance level
3. Determination of a test statistic
4. Determination of critical region
5. Computing the value of test statistic
6. Making decisions
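The six steps can be sketched as a single function for a one-sample z-test. The helper name `one_sample_z_test` and its parameters are hypothetical, and the usage call plugs in hypothetical bottling figures (sample mean 301.2 ml, s = 4 ml, n = 36), not data from the text.

```python
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, s, n, alpha=0.05, tail="two"):
    """Hypothetical helper walking through the six steps for a one-sample z-test.

    Step 1 (hypotheses) is encoded by mu0 and tail:
      Ho: mu = mu0;  H1: mu != mu0 ("two"), mu > mu0 ("right"), mu < mu0 ("left").
    Step 2 (significance level) is the alpha argument, e.g. 0.05 or 0.01.
    Step 3 (test statistic) is Z = (sample_mean - mu0) / (s / sqrt(n)).
    """
    std_normal = NormalDist()

    # Step 4: boundary of the critical region for the chosen tail(s).
    if tail == "two":
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
    elif tail == "right":
        z_crit = std_normal.inv_cdf(1 - alpha)
    elif tail == "left":
        z_crit = std_normal.inv_cdf(alpha)  # negative boundary
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")

    # Step 5: compute the test statistic from the sample.
    z = (sample_mean - mu0) / (s / n ** 0.5)

    # Step 6: reject Ho if z falls in the critical region.
    if tail == "two":
        reject = abs(z) > z_crit
    elif tail == "right":
        reject = z > z_crit
    else:
        reject = z < z_crit
    return z, z_crit, "reject Ho" if reject else "accept Ho"


print(one_sample_z_test(301.2, 300, 4, 36, tail="two"))
print(one_sample_z_test(301.2, 300, 4, 36, tail="right"))
```

Here Z = 1.8: the two-tailed test (critical value about 1.96) accepts Ho, while the right-tailed test (critical value about 1.645) rejects it.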

1. Setting up of a hypothesis
• The first step is to establish the hypotheses to be tested (assumptions
about the value of the population parameter):
Null Hypothesis (Ho)
Alternate Hypothesis (H1)
• The two hypotheses are formulated in such a way that if one is true
the other is false, and vice versa

Criteria for hypothesis formulation
• It should be empirically testable, so that it can be shown to be right or wrong
• It should be specific and precise
• It should specify the variables between which the relationship is to be
established
• It should describe one issue only
• It must be consistent with known facts

2. Setting a suitable significance level (α)
• α denotes the probability of rejecting the null hypothesis when it is
true
• It varies from problem to problem, but is usually taken as either 5% or
1%
• A 5% level of significance means that there are 5 chances out of 100
that a true null hypothesis will be rejected.
• Equivalently, when the null hypothesis is true, the test will correctly
fail to reject it 95 times out of 100 (the 95% confidence level).
• Therefore the confidence with which a researcher rejects or accepts a
null hypothesis depends upon α.

3. Determination of test statistic
• It is a standardized value that is calculated from sample data during
hypothesis testing.
• It measures the degree of agreement between the sample data and
what is expected under the null hypothesis.
• The larger the test statistic in absolute value (i.e. the further it lies in
the tail or tails being tested), the smaller the p-value and the more
likely the null hypothesis is to be rejected.

Types of Test statistic
Hypothesis test      Test statistic
Z-test               Z-score
t-test               t-score
ANOVA                F-statistic
Chi-square test      Chi-square statistic
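The statistics in the table follow standard textbook formulas, sketched below with only the standard library. The z-score and t-score share the form (estimate − hypothesized value) / standard error; the chi-square statistic sums (O − E)²/E over categories; the one-way ANOVA F is the ratio of between-group to within-group mean squares. All input data in the usage lines are hypothetical (the 50/50 expected split for the pet shop is an assumption, not from the text).

```python
import statistics


def z_score(sample_mean, mu0, sigma, n):
    """Z statistic: (sample mean - mu0) / (sigma / sqrt(n)), sigma known or n large."""
    return (sample_mean - mu0) / (sigma / n ** 0.5)


def t_score(sample, mu0):
    """One-sample t statistic: (sample mean - mu0) / (s / sqrt(n)), n - 1 d.f."""
    n = len(sample)
    return (statistics.fmean(sample) - mu0) / (statistics.stdev(sample) / n ** 0.5)


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def anova_f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k, n_total = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical inputs, for illustration only.
print(z_score(1540, 1500, 42, 200))
print(t_score([298, 300, 302, 304], 300))
print(chi_square_statistic([40, 60], [50, 50]))  # dogs vs other pets, assumed 50/50 split
print(anova_f_statistic([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```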

4. Determination of critical region
• The area under the sampling distribution curve is divided into two
mutually exclusive regions called the acceptance region and the
rejection region.
• The set of values of the test statistic that leads to rejection of the null
hypothesis is called the critical (rejection) region; the remaining values
form the acceptance region.
• For a significance level of α, the usual critical region for a two-tailed
test consists of an area of α/2 in each of the right and left tails of the
distribution (for example, 2.5% in each tail when α = 5%).
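The boundaries of the critical region for a z-test follow directly from α via the inverse of the standard normal CDF. A minimal sketch, where `critical_values` is a hypothetical helper name: for a two-tailed test each boundary leaves α/2 in its tail, for a one-tailed test the single boundary leaves all of α in one tail.

```python
from statistics import NormalDist


def critical_values(alpha, tail="two"):
    """Boundary (or boundaries) of the critical region for a z-test."""
    z = NormalDist()  # standard normal
    if tail == "two":
        c = z.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
        return (-c, c)
    if tail == "right":
        return (z.inv_cdf(1 - alpha),)
    if tail == "left":
        return (z.inv_cdf(alpha),)
    raise ValueError("tail must be 'two', 'right' or 'left'")


for alpha in (0.05, 0.01):
    two = [round(c, 3) for c in critical_values(alpha)]
    right = [round(c, 3) for c in critical_values(alpha, "right")]
    print(f"alpha = {alpha}: two-tailed {two}, right-tailed {right}")
```

This reproduces the familiar table values: about ±1.96 and ±2.576 for two-tailed tests at 5% and 1%, and about 1.645 and 2.326 for right-tailed tests.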

5. Computing the value of the test statistic
• The next step is to compute the value of the test statistic from a
random sample of size ‘n’.
• We then examine whether it falls in the critical (rejection) region or in
the acceptance region.

6. Decision making
• If the value of the test statistic falls within the acceptance region, the
null hypothesis is accepted; if it falls within the critical region, it is
rejected.
• If the hypothesis is being tested at the 5% level of significance, it is
rejected when the probability of obtaining a result at least as extreme
as the one observed, assuming Ho is true (the p-value), is less than 5%.
• In that case the difference between the sample statistic and the
hypothesized population parameter is considered significant;
otherwise it is attributed to sampling fluctuation.
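The two decision rules above (critical region versus p-value) always agree. The sketch below checks this for a two-tailed z-test at α = 0.05 over a few arbitrary z values; `decide` is a hypothetical helper name.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def decide(z, alpha=0.05):
    """Two-tailed z-test decision made two equivalent ways.

    Returns (reject by critical region, reject by p-value, p-value).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    by_region = abs(z) > z_crit
    p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    by_p_value = p < alpha
    return by_region, by_p_value, p


for z in (0.0, 1.0, 1.9, 2.0, 3.0):
    by_region, by_p_value, p = decide(z)
    print(f"z = {z:.1f}: p = {p:.4f}, reject by region: {by_region}, by p-value: {by_p_value}")
```

Rejecting when |Z| exceeds the critical value is the same event as the p-value falling below α, because the critical value is exactly the point whose two-tailed p-value equals α.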

Example
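A sample of 200 bulbs made by a company gives a mean lifetime of
1540 hours with a standard deviation of 42 hours. Is it likely that the
sample has been drawn from a population with a mean lifetime of 1500
hours? You may use a 5% level of significance.
Solution:
Sample size n = 200
Sample mean X̄ = 1540 hrs
Standard deviation s = 42 hrs

Ho : µ = 1500 (the bulbs have a mean life of 1500 hrs)
H1 : µ ≠ 1500 (the bulbs do not have a mean life of 1500 hrs)
$$Z = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{1540 - 1500}{42/\sqrt{200}} \approx 13.47$$
The two-tailed critical value from the standard normal table at the 5% level is 1.96.
Since |Z| = 13.47 > 1.96, the null hypothesis is rejected: the sample is
unlikely to have been drawn from a population with a mean lifetime of
1500 hours.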
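The arithmetic of the bulb example can be checked in a few lines. This sketch uses the figures given above and takes the critical value from Python's `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, s, mu0 = 200, 1540, 42, 1500  # figures from the bulb example

z = (sample_mean - mu0) / (s / sqrt(n))  # standard error = 42 / sqrt(200)
z_crit = NormalDist().inv_cdf(0.975)     # two-tailed critical value at alpha = 0.05

print(f"Z = {z:.2f}, critical value = {z_crit:.2f}")
print("Reject Ho" if abs(z) > z_crit else "Accept Ho")
```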
