Outline
Statistical inference
• Sampling distribution and its properties
• Estimation
• Hypothesis testing
• Paired and independent sample t-tests
• Chi-square test
24 January
2022
Objective
At the end of this session you should be able to:
• Estimate parameters
• Conduct hypothesis tests
• Test associations between variables
Inferential Statistics
• It is the process of generalizing, or drawing conclusions about, the target population based on the information obtained from a sample.
Inferential Statistics
Notations
• Population values (parameters) are denoted using Greek letters.
• Sample values (statistics) are denoted by Roman letters.
Inferential Statistics process
• How is information from the sample linked to the population?
Sampling Distributions
• The probability distribution of a sample statistic that is formed when repeated samples are taken from the whole population.
• If we take many, many samples and compute the statistic for each of those samples, the frequency distribution of all those statistics forms the sampling distribution of that sample statistic.
Sampling Distributions
• In practice, repeated samples are not taken from the population.
• We do not encounter sampling distributions empirically, but it is necessary to know their properties in order to draw statistical inferences.
• Three things characterize a sampling distribution:
– Its mean
– Its variance
– Its shape
Properties of Sampling Distributions
• The mean of the sample means is the same as the population mean.
• The standard deviation of the sample means equals the population standard deviation divided by the square root of the sample size.
• The standard deviation of the sample means is therefore smaller than the standard deviation of the population (for n > 1).
• The standard deviation of the sampling distribution of a sample statistic is called the standard error.
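The properties above can be checked with a small simulation. The sketch below is only an illustration: the population (10,000 normal values with mean 50 and SD 10), the sample size and the number of repetitions are all assumptions chosen for the demo, not values from the slides.

```python
import random
import statistics


def simulate_sample_means(population, n, repetitions, seed=0):
    """Draw `repetitions` random samples of size n and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n))
            for _ in range(repetitions)]


# Hypothetical population (an assumption for this demo only).
_rng = random.Random(42)
population = [_rng.gauss(50, 10) for _ in range(10_000)]

mu = statistics.fmean(population)        # population mean
sigma = statistics.pstdev(population)    # population standard deviation
n = 25

means = simulate_sample_means(population, n, repetitions=4_000)
mean_of_means = statistics.fmean(means)  # should be close to mu
observed_se = statistics.stdev(means)    # SD of the sample means
theoretical_se = sigma / n ** 0.5        # sigma / sqrt(n)

print(f"population mean        = {mu:.2f}")
print(f"mean of sample means   = {mean_of_means:.2f}")
print(f"population SD          = {sigma:.2f}")
print(f"SD of sample means     = {observed_se:.2f}")
print(f"sigma / sqrt(n)        = {theoretical_se:.2f}")
```

The SD of the simulated sample means should land near sigma / sqrt(n), which is the standard error.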
Standard deviation vs standard error
Standard deviation
• Is a measure of variability between individual observations
• A descriptive index relevant to the mean
Standard error
• The variability of a summary statistic, e.g. the variability of the sample mean or a sample proportion
• It is a measure of uncertainty in a sample statistic, i.e. the precision of the estimate
Sampling Distributions
• Based on the nature of the summary statistic:
– Sampling distribution of the mean
– Sampling distribution of the proportion
Properties of the sampling distribution of the mean
• The mean of the sampling distribution of the mean is the same as the population mean ($\mu$).
• The SD of the sampling distribution of the mean is $\sigma/\sqrt{n}$.
• The shape of the sampling distribution of the mean is approximately a normal curve, regardless of the population distribution, when n is large enough (central limit theorem).
Properties of the sampling distribution of the proportion
• The sample proportion $p$ is an estimate of the population proportion $\pi$.
• The SD of the sampling distribution of the proportion is $\sqrt{\pi(1-\pi)/n}$.
• The shape of the sampling distribution of the proportion is approximately a normal curve when n is large enough (central limit theorem).
Central Limit Theorem
• States that, regardless of the shape of the parent population distribution, the sampling distribution of the sample mean (or sample proportion) will be normal or nearly normal if the sample size is large enough.
• But the question is: how large is "large enough"?
• As a rough rule of thumb:
– a sample size of 30 is large enough for continuous data, and
– np ≥ 5 and nq ≥ 5 for categorical data summarized by a proportion.
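A rough way to see the central limit theorem at work is to sample from a clearly skewed population and watch the skewness of the sample means shrink as n grows. This is a sketch under assumptions: the Exponential(1) population, the sample sizes 2 and 40, and the helper names are all chosen for illustration.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: mean of cubed standardized deviations."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def sample_means_from_exponential(n, repetitions=5_000, seed=1):
    """Means of repeated samples of size n from a skewed Exponential(1) population."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(repetitions)]


def large_enough_for_proportion(n, p):
    """Rule of thumb from the slides: np >= 5 and nq >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5


skew_small_n = skewness(sample_means_from_exponential(n=2))
skew_large_n = skewness(sample_means_from_exponential(n=40))

print(f"skewness of sample means, n = 2 : {skew_small_n:.2f}")
print(f"skewness of sample means, n = 40: {skew_large_n:.2f}")
print(large_enough_for_proportion(100, 0.25))  # 25 and 75 are both >= 5
print(large_enough_for_proportion(20, 0.10))   # np = 2 is below 5
```

A skewness near zero indicates a roughly symmetric, bell-like sampling distribution.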
Assumptions of statistical inference
• To make valid inferences or conclusions, the following assumptions must be satisfied:
– Samples must be randomly selected.
– The sample size must be large enough.
– The population must be normally or approximately normally distributed if the sample size is less than 30.
– The population variance should be known.
• What if n is not large enough and the population variance is unknown?
Student's t-distribution
• We use Student's t-distribution in statistical inference; it depends on the degrees of freedom.
• The t-distribution is a theoretical probability distribution that is symmetrical, bell-shaped, and similar to the normal distribution but more spread out.
• The conditions for using Student's t-distribution:
– The sample is from a normally distributed population,
– The population variance is unknown, and
– The sample size is small, i.e. less than 30 (or np < 5 or nq < 5)
Student's t-distributions
t-table
Student's t-distributions
• The t-distribution and the standard normal distribution are similar in that:
– Both are bell shaped.
– Both are symmetrical about the mean.
– The mean, median, and mode are equal to 0 and located at the center.
– The curve never touches the x-axis.
• The t-distribution differs from the standard normal distribution:
– Its variance is greater than one.
– It is based on the degrees of freedom (df), which are related to sample size.
– As the sample size increases, the t-distribution approaches the standard normal distribution (Z).
Parameter Estimation
• We generally assume the underlying distribution of the variable of interest is adequately described by one or more unknown parameters.
• Because it is usually not possible to make measurements on every individual in a population, parameters cannot usually be determined exactly.
• Instead, we estimate parameters by calculating the corresponding characteristics from a random sample; these values are called estimates.
Estimation
• It is a procedure in which the information obtained from a sample is used to approximate the true population parameter.
• It is the process of estimating population parameters by using sample statistics.
• An estimator is any statistic that is used to estimate an unknown population parameter.
• The value or values that the estimator takes are called estimates.
Characteristics of a good estimator
An estimator should be:
• Unbiased: the expected value of the estimator must equal the parameter being estimated.
• Consistent: as the sample size increases, the value of the estimator should approach the value of the parameter being estimated.
• Efficient: the variance of the estimator should be the smallest among competing estimators.
• Sufficient: the estimator should capture the maximum possible information about the population that the sample contains.
Estimation
There are two types of estimation:
– Point estimation
– Interval estimation
Point Estimation
• A single numerical value is used to estimate the corresponding population parameter.
• The corresponding point estimators for the parameters:
– $\bar{x}$ estimates $\mu$
– $p$ estimates $\pi$
– $\bar{x}_1 - \bar{x}_2$ estimates $\mu_1 - \mu_2$
– $p_1 - p_2$ estimates $\pi_1 - \pi_2$
Point Estimation
However, point estimation has pitfalls.
• Different samples give different estimates of a single unknown population parameter.
• A point estimate does not take sample-to-sample variability into account.
• A point estimate does not convey the precision of the estimate, so we need another method of estimation that handles these problems.
Interval Estimation
• It is an interval computed from sample data that contains the true population parameter with a certain level of confidence.
• CI = point estimate ± margin of error, where margin of error = reliability coefficient × standard error
• A CI consists of three parts:
– The statistic,
– A confidence level, and
– The standard error
• Interval estimators are commonly called confidence intervals.
Interval Estimation
Level of confidence
• Is the probability that the interval-building procedure captures the population parameter, i.e. that the estimate falls within the margin of error of the parameter.
• The level of confidence is denoted as (1 − α)100%.
• The confidence level can never be 100%.
• Most commonly, 95% confidence intervals are calculated.
• However, 90% and 99% confidence intervals are sometimes used.
Interval Estimation
A CI in general:
• Considers variation in the sample statistic from sample to sample
• Is based on observations from one sample
• Gives information about closeness to the unknown population parameter
• Is stated in terms of a level of confidence
• Interpretation of a confidence interval (e.g. a 95% CI): if we take 100 repeated samples of size n and construct a confidence interval from each, we expect about 95 of them to contain the true population parameter.
Interval Estimation
The general formula for all CIs is:
point estimate ± (reliability coefficient) × (standard error)
– Point estimate: the value of the statistic in the sample (e.g., mean, proportion, mean difference, proportion difference, etc.).
– Reliability coefficient: a measure of how confident we want to be, taken from a Z table or a t table, depending on the sampling distribution of the statistic.
– Standard error: the standard error of the statistic.
Margin of Error
• It is the amount added to and subtracted from the point estimate in confidence interval estimation.
• It is a measure of precision.
• The margin of error is the product of:
– the reliability coefficient corresponding to the confidence level, and
– the standard error of the estimator.
Interval Estimation
The width of the confidence interval depends on:
• Sample size
– The larger the sample size, the narrower the confidence interval and the more precise our estimate, because the standard error decreases as the sample size increases.
– That is, the sample statistic approaches the population parameter.
• Standard deviation
– The more variation among the individual values, the wider the confidence interval and the less precise the estimate.
Interval Estimation
• Confidence level
– The higher the confidence level, the wider the confidence interval.
– A 90% CI is narrower than a 95% CI, since we are only 90% certain that the interval includes the population parameter.
– A 99% CI is wider than a 95% CI; the extra width means that we can be more certain that the interval contains the population parameter.
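The three effects on interval width (sample size, standard deviation, confidence level) can be illustrated numerically. The sketch below assumes a known population SD and a z-based margin of error; the function name `margin_of_error` and the input values are illustrative choices, not part of the slides.

```python
from statistics import NormalDist


def margin_of_error(sigma, n, confidence):
    """z-based margin of error: z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # reliability coefficient
    return z * sigma / n ** 0.5


# Effect of the confidence level (sigma and n held fixed).
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI, n = 20: +/- {margin_of_error(1.5, 20, level):.3f}")

# Effect of the sample size (sigma and confidence level held fixed).
for n in (20, 32, 100):
    print(f"95% CI, n = {n}: +/- {margin_of_error(1.5, n, 0.95):.3f}")

# Effect of the standard deviation (n and confidence level held fixed).
for sigma in (1.0, 1.5, 3.0):
    print(f"95% CI, sigma = {sigma}: +/- {margin_of_error(sigma, 20, 0.95):.3f}")
```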
Interval Estimation
Confidence intervals can be estimated for:
• A single population
– One population mean
– One population proportion
• Two populations
– Difference between two population means
– Difference between two population proportions
Estimation for a Single Population
CI for a Single Population Mean
• When the following assumptions are fulfilled:
– The population standard deviation ($\sigma$) is known
– The population is normally distributed
– If the population is not normal, use a large sample
• A $100(1-\alpha)\%$ CI for $\mu$ is calculated by:
$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
• $\alpha$ is chosen by the researcher; the most common values of $\alpha$ are 0.05, 0.01 and 0.1.
Confidence interval
• The point estimate of $\mu$ is the sample mean $\bar{x}$.
• The standard error of $\bar{x}$ is $\sigma/\sqrt{n}$.
• Commonly used confidence levels are 90%, 95%, and 99%.
Example:
1. Waiting times (in hours) at a particular hospital are believed to be approximately normally distributed with a variance of 2.25 hours².
a. A sample of 20 outpatients revealed a mean waiting time of 1.52 hours. Calculate the point estimate and construct the 95% CI.
b. Suppose that the mean of 1.52 hours had resulted from a sample of 32 patients. Calculate the point estimate and find the 95% CI.
c. What effect does a larger sample size have on the CI?
Solutions
A. The point estimate is $\bar{x} = 1.52$ hours, and
$$1.52 \pm 1.96\sqrt{\frac{2.25}{20}} = 1.52 \pm 1.96(0.33) = 1.52 \pm 0.65 = (0.87, 2.17)$$
• We are 95% confident that the true mean waiting time is between 0.87 and 2.17 hours.
• Although the true mean may or may not be in this interval, 95% of the intervals formed in this manner will contain the true mean.
• An incorrect interpretation is that there is a 95% probability that this interval contains the true population mean.
Solutions
B. The point estimate is again $\bar{x} = 1.52$ hours, and
$$1.52 \pm 1.96\sqrt{\frac{2.25}{32}} = 1.52 \pm 1.96(0.27) = 1.52 \pm 0.53 = (0.99, 2.05)$$
• The larger sample size makes the CI narrower (more precise).
• When constructing CIs, it has been assumed that the standard deviation of the underlying population, $\sigma$, is known.
• What if $\sigma$ is not known?
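The waiting-time example can be reproduced in a few lines. This is a minimal sketch: the function name `z_interval` is illustrative, and it uses the unrounded z value and standard error, so the limits differ from the hand calculation in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, variance, n, confidence=0.95):
    """CI for a mean with known population variance: mean +/- z * sqrt(var / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(variance / n)
    return mean - margin, mean + margin


low_20, high_20 = z_interval(mean=1.52, variance=2.25, n=20)
low_32, high_32 = z_interval(mean=1.52, variance=2.25, n=32)

print(f"n = 20: ({low_20:.2f}, {high_20:.2f})")
print(f"n = 32: ({low_32:.2f}, {high_32:.2f})")
print("width shrinks with n:", (high_32 - low_32) < (high_20 - low_20))
```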
Unknown variance (small sample size, n < 30)
• If $\sigma$ for the underlying population is unknown and the sample size is small, we use Student's t-distribution as an alternative, with $s$ in place of $\sigma$:
$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$
Degrees of Freedom (df)
• df = the number of observations that are free to vary after the estimator has been calculated; df = n − 1.
Example
• Compute a 95% CI for the mean birth weight based on n = 10, sample mean = 116.9 oz and s = 21.70 oz.
From the t table, $t_{9,\,0.975} = 2.262$
$$116.9 \pm 2.262\,\frac{21.70}{\sqrt{10}} = 116.9 \pm 15.5$$
Answer: (101.4, 132.4)
Interpretation?
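The birth-weight interval can be checked with a short calculation. In this sketch the critical value 2.262 is passed in by hand from the t table (the Python standard library has no t quantile function), and the function name `t_interval` is illustrative.

```python
from math import sqrt


def t_interval(mean, sd, n, t_crit):
    """CI for a mean with unknown variance: mean +/- t * s / sqrt(n).

    t_crit must be looked up for df = n - 1 and the chosen confidence level.
    """
    margin = t_crit * sd / sqrt(n)
    return mean - margin, mean + margin


low, high = t_interval(mean=116.9, sd=21.70, n=10, t_crit=2.262)
print(f"95% CI for mean birth weight: ({low:.1f}, {high:.1f}) oz")
```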
CIs for a single population proportion
• An interval estimate for the population proportion ($\pi$) can be calculated by adding an allowance for uncertainty to the sample proportion ($p$).
• It is based on the three elements of a CI:
– Point estimate
– SE of the point estimate
– Reliability coefficient
• A $100(1-\alpha)\%$ CI for $\pi$:
$$p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
Example
• A random sample of 100 people shows that 25 are left-handed. Calculate the point estimate and form a 95% CI for the true proportion of left-handers.
Example
• It was found that 28.1% of 153 cervical-cancer cases had never had a Pap smear prior to the time of the case's diagnosis. Calculate a 95% CI for the percentage of cervical-cancer cases who never had a Pap smear.
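Both proportion exercises can be worked with the usual large-sample interval, p ± z·sqrt(p(1 − p)/n). The sketch below assumes that formula applies (np and nq are well above 5 in both cases); the function name `proportion_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p, n, confidence=0.95):
    """Large-sample CI for a proportion: p +/- z * sqrt(p * (1 - p) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Left-handers: 25 out of 100, so the point estimate is 0.25.
lh_low, lh_high = proportion_interval(p=25 / 100, n=100)
print(f"left-handed: point estimate 0.25, 95% CI ({lh_low:.3f}, {lh_high:.3f})")

# Cervical-cancer cases who never had a Pap smear: 28.1% of 153.
pap_low, pap_high = proportion_interval(p=0.281, n=153)
print(f"never had a Pap smear: 95% CI ({pap_low:.1%}, {pap_high:.1%})")
```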
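Estimation for Two Populations
CI for the difference between population means
• Known variances and large sample sizes
• When $\sigma_1$ and $\sigma_2$ are known and both populations are normal, or both sample sizes are at least 30
• The reliability coefficient is a z-value
• The point estimate of $(\mu_1 - \mu_2)$ is $(\bar{x}_1 - \bar{x}_2)$
• The standard error is $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
• Finally, the $100(1-\alpha)\%$ CI is
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
• If the population variances are unknown, they can be approximated by the sample variances $s_1^2$ and $s_2^2$ when the samples are large (n ≥ 30).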
Example 1
• Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down syndrome. The data consist of serum uric acid readings on 12 individuals with Down syndrome and 15 normal individuals. The means are $\bar{x}_1 = 4.5$ mg/100 ml and $\bar{x}_2 = 3.4$ mg/100 ml. The data constitute two independent simple random samples, each drawn from a normally distributed population with a variance equal to 1 (mg/100 ml)².
• Compute the point estimate and construct a 95% CI for the difference in mean serum uric acid levels between the two populations.
Example 2
• Researchers are interested in the difference between serum uric acid levels in patients with and without Down's syndrome.
• Patients without Down's syndrome: n = 12, sample mean = 4.5 mg/100 ml, $\sigma^2$ = 1.0
• Patients with Down's syndrome: n = 15, sample mean = 3.4 mg/100 ml, $\sigma^2$ = 1.5
• Calculate the 95% CI.
SE = $\sqrt{1.0/12 + 1.5/15}$ = 0.43, 95% CI = 1.1 ± 1.96(0.43) = (0.26, 1.94)
• We are 95% confident that the true difference between the two population means is between 0.26 and 1.94 mg/100 ml.
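Both serum uric acid examples follow the same recipe, so a single helper covers them. This is a sketch with an illustrative function name; it keeps full precision, so the Example 2 limits agree with the hand calculation after rounding to two decimals.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_z_interval(mean1, var1, n1, mean2, var2, n2, confidence=0.95):
    """CI for mu1 - mu2 with known variances: difference +/- z * SE."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(var1 / n1 + var2 / n2)
    diff = mean1 - mean2
    return diff - z * se, diff + z * se


# Example 1: both population variances equal to 1.
ex1_low, ex1_high = diff_means_z_interval(4.5, 1.0, 12, 3.4, 1.0, 15)
print(f"Example 1: 1.1 ({ex1_low:.2f}, {ex1_high:.2f})")

# Example 2: variances 1.0 and 1.5.
ex2_low, ex2_high = diff_means_z_interval(4.5, 1.0, 12, 3.4, 1.5, 15)
print(f"Example 2: 1.1 ({ex2_low:.2f}, {ex2_high:.2f})")
```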
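CI for the difference between population means
Unknown variances ($\sigma_1^2$ and $\sigma_2^2$) and small sample sizes (n < 30)
• If the following assumptions are satisfied:
– The two random samples are independent.
– Both samples are drawn from normally distributed populations.
– The population variances are unknown but are assumed to be equal.
• The reliability coefficient is a t-value with degrees of freedom = $n_1 + n_2 - 2$
• The point estimate of $(\mu_1 - \mu_2)$ is $(\bar{x}_1 - \bar{x}_2)$
• The standard error is $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
CI for the difference between population means
• The pooled sample variance:
$$S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
• Finally, the $(1-\alpha)100\%$ confidence interval for $(\mu_1 - \mu_2)$:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$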
Example
• A research team collected serum amylase data from a sample of healthy subjects and from a sample of hospitalized subjects. They wish to know if they would be justified in concluding that the population means are different. The data consist of serum amylase determinations on $n_2 = 15$ healthy subjects and $n_1 = 22$ hospitalized subjects. The sample means and standard deviations are as follows:
$\bar{x}_1 = 120$ units/ml, $s_1 = 40$ units/ml
$\bar{x}_2 = 96$ units/ml, $s_2 = 35$ units/ml
• Construct a 95% CI for the difference between the two population mean serum amylase levels.
Example
• Calculate the pooled variance:
$$S_p^2 = \frac{(22-1)(40)^2 + (15-1)(35)^2}{22 + 15 - 2} = \frac{50750}{35} = 1450$$
• Calculate the 95% confidence interval, with $t_{0.975,\,35} = 2.0301$:
$$(120 - 96) \pm 2.0301\sqrt{1450\left(\frac{1}{22} + \frac{1}{15}\right)} = 24 \pm 2.0301(12.75)$$
• 95% CI: (−1.9, 49.9)
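The serum amylase interval can be computed step by step. In this sketch the critical value t(0.975, 35) = 2.0301 is taken from a standard t table and supplied by hand, since the Python standard library does not provide t quantiles; the function names are illustrative.

```python
from math import sqrt


def pooled_variance(n1, s1, n2, s2):
    """Weighted average of the two sample variances (equal-variance assumption)."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)


def pooled_t_interval(mean1, s1, n1, mean2, s2, n2, t_crit):
    """CI for mu1 - mu2: difference +/- t * sqrt(Sp^2 * (1/n1 + 1/n2))."""
    sp2 = pooled_variance(n1, s1, n2, s2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se


sp2 = pooled_variance(22, 40, 15, 35)
low, high = pooled_t_interval(120, 40, 22, 96, 35, 15, t_crit=2.0301)  # df = 35

print(f"pooled variance = {sp2:.0f}")
print(f"95% CI for mu1 - mu2: ({low:.1f}, {high:.1f}) units/ml")
print("interval contains 0:", low < 0 < high)
```

Because the interval includes zero, these data alone do not rule out equal population means at the 95% level.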
CI for the difference between population proportions
• Suppose that $n_1$ and $n_2$ are large enough so that:
– $n_1p_1 \ge 5$, $n_1(1-p_1) \ge 5$, $n_2p_2 \ge 5$, and $n_2(1-p_2) \ge 5$
• The point estimate for the difference of two population proportions, $\pi_1 - \pi_2$, is $p_1 - p_2$.
• The standard error of $p_1 - p_2$ is
$$\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
• A $(1-\alpha)100\%$ confidence interval estimate for the difference of population proportions, $\pi_1 - \pi_2$, is
$$(p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
Example
• Each of two groups consists of 100 patients who have leukemia. A new drug is given to the first group but not to the second (the control group). It is found that in the first group 75 people have remission for 2 years, but only 60 in the second group. Find 95% confidence limits for the difference in the proportion of all patients with leukemia who have remission for 2 years.
Example
• $p_1 = 0.75$, $q_1 = 0.25$, $n_1 = 100$; $p_2 = 0.60$, $q_2 = 0.40$, $n_2 = 100$
• $n_1p_1 = 75 > 5$ and $n_1q_1 = 25 > 5$
• $n_2p_2 = 60 > 5$ and $n_2q_2 = 40 > 5$
• $\sigma_1^2 = \frac{p_1q_1}{n_1} = 0.001875$ and $\sigma_2^2 = \frac{p_2q_2}{n_2} = 0.0024$
• Hence, $\sigma^2$ for $p_1 - p_2$ = 0.001875 + 0.0024 = 0.004275
• $\sigma$ for $p_1 - p_2$ = $\sqrt{0.004275}$ = 0.0654
• At a 95% confidence level, z = ±1.96; $p_1 - p_2$ = 0.75 − 0.60 = 0.15
• Therefore, 95% CI = 0.15 ± 1.96(0.065) = 0.15 ± 0.13 = (0.02, 0.28).
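The leukemia calculation can be verified with a short script. This is a sketch with an illustrative function name; it also checks the np ≥ 5 and nq ≥ 5 conditions before computing the interval.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_interval(x1, n1, x2, n2, confidence=0.95):
    """Large-sample CI for pi1 - pi2 from success counts x1, x2."""
    p1, p2 = x1 / n1, x2 / n2
    # Rule-of-thumb check: at least 5 successes and 5 failures per group.
    if min(x1, n1 - x1, x2, n2 - x2) < 5:
        raise ValueError("samples too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


low, high = diff_proportions_interval(x1=75, n1=100, x2=60, n2=100)
print(f"95% CI for the difference in remission rates: ({low:.2f}, {high:.2f})")
```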
Summary
• When to use $t_{\alpha/2}$ or $z_{\alpha/2}$ for finding a confidence interval:
– Is $\sigma$ known? If yes, use $z_{\alpha/2}$ values no matter what the sample size is.
– If no, is n ≥ 30 (or np and nq ≥ 5)? If yes, use $z_{\alpha/2}$ values and s in place of $\sigma$ in the formula.
– If no, use $t_{\alpha/2}$ values and s in the formula.
Hypothesis testing
Hypothesis Testing
• Researchers conduct studies to answer research questions or test hypotheses.
• The best way to determine whether a hypothesis is true would be to examine the entire population.
• Because that is often impractical, researchers typically examine a random sample from the population.
• The purpose of any study is to collect data that allow the researcher to test the hypothesis or answer the question.
• Statistical tests can indicate (with a certain degree of confidence) whether a hypothesis is supported by the data or not.
Hypothesis Testing
• In hypothesis testing, the researcher must:
– define the population under study,
– state the particular hypothesis that will be investigated,
– determine the significance level,
– select a sample from the population and collect the data, and
– perform the appropriate statistical test and reach a conclusion.
Hypothesis Testing
• A hypothesis is a testable statement that describes the nature of a proposed relationship between two or more variables of interest.
• Hypotheses are formulated, experiments are performed, and results are evaluated for their consistency with the hypothesis.
• Hypothesis testing (HT) provides an objective framework for making decisions using probabilistic methods.
• The purpose of HT is to aid the clinician, researcher or administrator in reaching a decision (conclusion).
Types of Hypothesis
• The null hypothesis, H0
– Is a statement claiming that there is no difference between the hypothesized value and the population value (parameter = hypothesized value)
– It is a statement of agreement (no difference between groups, or the intervention is not effective)
– States the assumption (hypothesis) to be tested
– It is always about a population parameter (mean, proportion, OR, RR, etc.), not about a sample statistic
– Always contains an "=", "≤" or "≥" sign
– May or may not be rejected
Types of Hypothesis
The alternative hypothesis, HA
• It is the statement we will accept as true if we reject H0.
• It is generally the hypothesis that is believed (or needs to be supported) by the researcher.
• Is a statement that disagrees with H0 (there is a difference between groups, or the intervention is effective)
• Never contains an "=", "≤" or "≥" sign; it contains "≠", ">" or "<"
• May or may not be accepted
Rules for Stating Statistical Hypotheses
• An indication of equality (either =, ≤ or ≥) must appear in H0.
– H0: μ = μ0, HA: μ ≠ μ0, when the hypothesis is expressed in terms of a population mean
– H0: P = P0, HA: P ≠ P0, when the hypothesis is expressed in terms of a population proportion
• Can we conclude that a certain population mean is
– not 50? H0: μ = 50 and HA: μ ≠ 50
– greater than 50? H0: μ ≤ 50 and HA: μ > 50
• Can we conclude that the proportion of patients with leukemia who survive more than six years is not 60%?
– H0: P = 0.6 and HA: P ≠ 0.6
• Can we conclude that smoking is significantly associated with lung cancer?
– H0: there is no association between smoking and lung cancer.
– HA: there is an association between smoking and lung cancer.
Hypothesis testing process
• Now think about how the hypothesis test should be carried out.
• We draw a random sample of size n from the underlying population and calculate its sample mean ($\bar{x}$).
• We compare $\bar{x}$ to the postulated mean $\mu_0$.
• Is the difference between $\bar{x}$ and $\mu_0$ too large to be attributed to chance alone?
Hypothesis Testing Process
Steps in Hypothesis Testing
1. Formulate the appropriate statistical hypotheses clearly
Specify H0 and HA:
H0: μ = μ0, HA: μ ≠ μ0 (two-tailed)
H0: μ ≤ μ0, HA: μ > μ0 (one-tailed)
H0: μ ≥ μ0, HA: μ < μ0 (one-tailed)
2. Decide on the appropriate test statistic for the hypothesis, e.g., for one population mean, $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ or $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
3. Specify the desired level of significance (α = 0.05, 0.01, etc.)
4. Determine the critical value.
5. Compute the test statistic or the p-value.
6. Reach a decision and draw the conclusion.
– If H0 is rejected, we conclude that HA is true (or accepted).
– If H0 is not rejected, we conclude that H0 may be true.
One-tailed and two-tailed tests
• Depending on the way H0 is written, hypothesis testing can be:
• Two-tailed test
– The rejection region is split between the two tails.
– The alternative hypothesis takes the form "different from".
• One-tailed test
– The rejection region is at one end of the distribution or the other.
– The alternative hypothesis takes the form "less than" or "greater than".
Level of Significance, α
• Is the probability of rejecting a true H0
• Defines the rejection region of the sampling distribution
• The decision is made on the basis of the level of significance, designated by α.
• Frequently used values of α are 0.01, 0.05 and 0.10.
• α is selected by the researcher at the beginning of the study.
Test statistic
• Any observed differences or associations may have occurred by chance.
• Because there is random variation, even an unbiased sample may not accurately represent the population as a whole.
• A test statistic is a value we can compare with the known distribution of what we expect when the null hypothesis is true.
• The general formula of any test statistic is:
$$\text{test statistic} = \frac{\text{observed value} - \text{hypothesized value}}{\text{standard error}}$$
• Examples of test statistics are the z-test, t-test and χ²-test.
Critical value
• The value that separates the rejection region from the non-rejection (acceptance) region for a given level of significance
• The values of the test statistic lie along the horizontal axis of the sampling distribution, which the critical value separates into two regions:
– Rejection region, and
– Non-rejection region.
• Values of the test statistic in the rejection region are less likely to occur if H0 is true.
• Values in the non-rejection (acceptance) region are more likely to occur if H0 is true.
Rejection and Non-Rejection Regions
Figure: standard normal curve for a two-tailed test with α = 0.05. The rejection regions lie below −1.96 and above 1.96 (α/2 = 0.025 in each tail); the non-rejection region between them has area 0.95.
P-value
• In most applications, the outcome of performing a hypothesis test is a p-value.
• The p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the one actually observed, if H0 is true.
• Informally, it measures how likely a difference at least as large as the observed one would be if chance alone were operating.
• The larger the test statistic (in absolute value), the smaller the p-value: the probability of the observed value occurring just by chance is low.
• The smaller the p-value, the stronger the evidence for rejecting H0.
– Reject H0 if p-value < α
– Do not reject H0 if p-value > α
– What if p-value = α?
How to calculate the p-value
• Use statistical software such as SPSS, SAS, Stata, or R.
• Manual calculation:
– Obtain the test statistic (calculated Z).
– Find the probability corresponding to the test statistic in the standard normal table (the area between 0 and |Z|).
– Subtract that probability from 0.5.
– If the test is two-tailed, multiply the result by 2.
Statistical Decision
• Based on computations from the sample data
• The decision to reject or not to reject H0 is based on:
– The magnitude of the test statistic
– The CI
– The p-value
• Reject H0 if the value of the test statistic falls in the rejection region.
• Do not reject H0 if the computed value of the test statistic falls in the non-rejection region.
Errors in hypothesis testing
• Whenever we reject or fail to reject H0, we risk committing an error.
• Two types of errors can be committed:
– Type I error
– Type II error
Type I Error
• The error committed when a true H0 is rejected
• Considered a serious type of error
• The probability of a type I error is the probability of rejecting H0 when it is true
• The probability of a type I error is α, called the level of significance of the test
• Set by the researcher in advance
Type II Error
• The error committed when a false H0 is not rejected
• The probability of a type II error is β
• Usually unknown, but larger than α
Power
• The probability of rejecting H0 when it is false
• Power = 1 − β = 1 − probability of type II error
• We would like to maintain a low probability of a type I error (α) and a low probability of a type II error (β) [high power = 1 − β].
Summary
Decision (conclusion) versus reality:
Decision | Reality: H0 true | Reality: H0 false
Do not reject H0 | Correct action (Prob. = 1 − α) | Type II error (Prob. = β = 1 − Power)
Reject H0 | Type I error (Prob. = α = significance level) | Correct action (Prob. = Power = 1 − β)
Summary
Example
H0: there is no pregnancy; HA: there is pregnancy
Type I & II Error Relationship
Factors Affecting Type II Error
Factors Affecting the Power of the Test
• The power depends on:
1. As n ↑, power ↑
2. As |µ1 − µ0| ↑, power ↑
3. As σ ↑, power ↓
4. As α ↓, power ↓
Hypothesis Test for One Sample
• Test for a single mean
• Test for a single proportion
Hypothesis Testing of a Single Mean (Normally Distributed)
Hypothesis Testing for Known Variance
• Two-tailed test
$$H_0: \mu = \mu_0 \qquad H_A: \mu \ne \mu_0$$
$$z_{cal} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}, \qquad z_{tab} = z_{\alpha/2} \text{ for a two-tailed test}$$
Decision:
– if $|z_{cal}| \ge z_{tab}$, reject H0
– if $|z_{cal}| < z_{tab}$, do not reject H0
Example
• A simple random sample of 10 people from a certain population has a mean age of 27. Can we conclude that the mean age of the population is not 30? The variance is known to be 20. Let α = 0.05.
Data
n = 10, sample mean = 27, $\sigma^2$ = 20, α = 0.05
Assumptions
– Simple random sample
– Normally distributed population
Example
A. Hypotheses
H0: µ = 30
HA: µ ≠ 30
B. Test statistic
As the population variance is known, we use Z as the test statistic.
C. Determine the level of significance: α = 0.05
Example
D. Determine the critical value
• Reject H0 if the Z value falls in the rejection region.
• Do not reject H0 if the Z value falls in the non-rejection region.
• Because of the structure of H0 it is a two-tailed test. Therefore, reject H0 if Z ≤ −1.96 or Z ≥ 1.96.
Example
E. Calculation of the test statistic
$$z_{cal} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 30}{\sqrt{20}/\sqrt{10}} = \frac{-3}{1.4142} = -2.12$$
F. Statistical decision
We reject H0 at α = 5% because Z = −2.12 is in the rejection region.
Conclusion
We conclude that µ is not 30. P-value = 0.0340.
A Z value of −2.12 corresponds to a tail area of 0.0170. Since there are two parts to the rejection region in a two-tailed test, the p-value is twice this, which is 0.0340.
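The mean-age test can be reproduced with the standard normal distribution from the Python standard library. This is a minimal sketch; the function name `one_sample_z_test` and its `tail` argument are illustrative, and the p-values come from the unrounded z, so they may differ from table-based values in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, variance, n, tail="two"):
    """z-test for a single mean with known variance.

    tail: "two" for HA: mu != mu0, "lower" for HA: mu < mu0,
          "upper" for HA: mu > mu0.
    Returns (z, p_value).
    """
    z = (sample_mean - mu0) / sqrt(variance / n)
    cdf = NormalDist().cdf(z)
    if tail == "two":
        p = 2 * min(cdf, 1 - cdf)
    elif tail == "lower":
        p = cdf
    else:
        p = 1 - cdf
    return z, p


z, p_two = one_sample_z_test(27, 30, variance=20, n=10, tail="two")
_, p_lower = one_sample_z_test(27, 30, variance=20, n=10, tail="lower")

print(f"z = {z:.2f}")
print(f"two-tailed p-value   = {p_two:.4f}")
print(f"lower-tailed p-value = {p_lower:.4f}")
print("reject H0 at alpha = 0.05 (two-tailed):", abs(z) >= 1.96)
```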
Hypothesis testing using a confidence interval
• A problem like the above example can also be solved using a confidence interval.
• The confidence interval will show that the hypothesized value (µ0 = 30) does not fall within the boundaries of the interval. However, it will not give a probability.
• Confidence interval:
$$CI = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 27 \pm 1.96(1.4142) = (24.228, 29.772)$$
Hypothesis Testing for Known Variance
• One-tailed test
$H_0: \mu \ge \mu_0$
$H_A: \mu < \mu_0$
Example
• A simple random sample of 10 people from a certain population has a mean age of 27. Can we conclude that the mean age of the population is less than 30? The variance is known to be 20. Let α = 0.05.
• Data
n = 10, sample mean = 27, $\sigma^2$ = 20, α = 0.05
• Hypotheses
H0: µ ≥ 30, HA: µ < 30
Example
• Test statistic:
$$z = \frac{27 - 30}{\sqrt{20/10}} = -2.12$$
• Rejection region (lower-tail test): with α = 0.05 and the inequality, we have the entire rejection region at the left. The critical value will be Z = −1.645. Reject H0 if Z < −1.645.
Example
• Statistical decision
– We reject H0 because −2.12 < −1.645.
• Conclusion
– We conclude that µ < 30.
– p = 0.0170 this time, because it is a one-tailed test and not a two-tailed test.
Hypothesis testing for unknown variance (n small)
• In most practical applications the standard deviation of the underlying population is not known.
• In this case, $\sigma$ can be estimated by the sample standard deviation $s$.
• If the underlying population is normally distributed, then the test statistic is:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad df = n - 1$$
Example
• A simple random sample of 14 people from a certain population gives a sample mean body mass index (BMI) of 30.5 and an SD of 10.64. Can we conclude that the mean BMI is not 35 at α = 5%?
• H0: µ = 35, HA: µ ≠ 35
• Test statistic:
$$t = \frac{30.5 - 35}{10.64/\sqrt{14}} = -1.58$$
• If the assumptions are correct and H0 is true, the test statistic follows Student's t-distribution with 13 degrees of freedom.
Example
• Decision rule
• We have a two-tailed test. With α = 0.05, each tail is 0.025. The critical t values with 13 df are −2.1604 and 2.1604.
• We reject H0 if t ≤ −2.1604 or t ≥ 2.1604.
• Do not reject H0 because −1.58 is not in the rejection region. Based on the data of the sample, it is possible that µ = 35. P-value = 0.137.
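The BMI test statistic and decision can be checked as follows. This sketch takes the critical value 2.1604 (13 df, two-tailed, α = 0.05) from the example itself, because the Python standard library has no t-distribution functions; the function name `one_sample_t_statistic` is illustrative.

```python
from math import sqrt


def one_sample_t_statistic(sample_mean, mu0, sd, n):
    """t = (x_bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - mu0) / (sd / sqrt(n))


t_stat = one_sample_t_statistic(sample_mean=30.5, mu0=35, sd=10.64, n=14)
t_crit = 2.1604  # from the t table, df = 13, alpha/2 = 0.025

reject_h0 = abs(t_stat) >= t_crit
print(f"t = {t_stat:.2f}, df = {14 - 1}")
print("reject H0:", reject_h0)
```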
Hypothesis testing for proportions
• Involves categorical values
• Two possible outcomes:
– "Success" (possesses a certain characteristic)
– "Failure" (does not possess that characteristic)
• The fraction or proportion of the population in the "success" category is denoted by p
Hypothesis testing for proportions
t-test at n − 1 df
Example
• We are interested in the probability of developing asthma over a given one-year period for children 0 to 4 years of age whose mothers smoke in the home. In the general population of 0 to 4-year-olds, the annual incidence of asthma is 1.4%. If 10 cases of asthma are observed over a single year in a sample of 500 children whose mothers smoke, can we conclude that this is different from the underlying probability of p0 = 0.014? α = 5%
H0: p = 0.014
HA: p ≠ 0.014
Example
• The test statistic is given by:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.02 - 0.014}{\sqrt{0.014(0.986)/500}} = 1.14$$
Example
• The critical value of $z_{\alpha/2}$ at α = 5% is ±1.96.
• Do not reject H0 since Z (= 1.14) is in the non-rejection region between ±1.96.
• P-value = 0.2548
• We do not have sufficient evidence to conclude that the probability of developing asthma for children whose mothers smoke in the home is different from the probability in the general population.
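The asthma example can be verified with a one-sample z-test for a proportion. This is a sketch with an illustrative function name; the p-value is computed from the unrounded z, so it may differ slightly in the third decimal from a value read off a normal table.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z_test(successes, n, p0):
    """Two-tailed z-test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = one_sample_proportion_z_test(successes=10, n=500, p0=0.014)
print(f"z = {z:.2f}, two-tailed p-value = {p_value:.3f}")
print("reject H0 at alpha = 0.05:", abs(z) >= 1.96)
```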
Hypothesis testing for two samples
• Comparing two population means
– Independent samples: variances known
– Independent samples: variances unknown
• Paired difference experiments
– Paired/matched/repeated sampling
• Comparing two population proportions
– Large, independent samples case
Hypothesis testing for two population means
• Independent samples with known variances, or both groups have large sample sizes
The steps to test the hypothesis for a difference of means are the same as for a single mean.
Step 1: State the hypotheses
H0: µ1 − µ2 = 0 vs HA: µ1 − µ2 ≠ 0, HA: µ1 − µ2 < 0, or HA: µ1 − µ2 > 0
Step 2: Significance level (α)
Step 3: Test statistic
$$z_{cal} = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Hypothesis testing for two population means
Decision rules, with $z_{tab} = z_{\alpha/2}$ for a two-tailed test and $z_{tab} = z_{\alpha}$ for a one-tailed test:
– For HA: µ1 − µ2 ≠ 0: if $|z_{cal}| \ge z_{tab}$, reject H0; if $|z_{cal}| < z_{tab}$, do not reject H0
– For HA: µ1 − µ2 > 0: if $z_{cal} \ge z_{tab}$, reject H0; otherwise do not reject H0
– For HA: µ1 − µ2 < 0: if $z_{cal} \le -z_{tab}$, reject H0; otherwise do not reject H0
Example
• Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down's syndrome. The data consist of serum uric acid readings on 12 individuals with Down's syndrome and 15 normal individuals. The means are 4.5 mg/100 ml and 3.4 mg/100 ml, with population standard deviations of 2.9 and 3.5 mg/100 ml respectively.
$H_0: \mu_1 - \mu_2 = 0$
$H_A: \mu_1 - \mu_2 \ne 0$
SOLUTION
$$z_{cal} = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(4.5 - 3.4) - 0}{\sqrt{\frac{2.9^2}{12} + \frac{3.5^2}{15}}} = \frac{1.1}{\sqrt{1.5175}} = \frac{1.1}{1.23} = 0.89$$
• The tabulated value is $z_{0.025} = 1.96$. Since $|z_{cal}| < 1.96$, do not reject H0.
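As a cross-check, the two-sample z statistic can be computed directly from the figures stated in the example (means 4.5 and 3.4, population SDs 2.9 and 3.5, sample sizes 12 and 15). This is a sketch with an illustrative function name, assuming a two-tailed alternative.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Two-tailed z-test for mu1 - mu2 with known population SDs."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = ((mean1 - mean2) - hypothesized_diff) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = two_sample_z_test(4.5, 2.9, 12, 3.4, 3.5, 15)
z_tab = NormalDist().inv_cdf(0.975)  # about 1.96 for alpha = 0.05

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("reject H0:", abs(z) >= z_tab)
```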
Hypothesis testing for two population means
Independent samples, variances unknown
• Generally, in most real-life situations, the true values of the population variances $\sigma_1^2$ and $\sigma_2^2$ are not known.
• They have to be estimated from the sample variances $S_1^2$ and $S_2^2$, respectively.
• We also need to estimate the standard deviation of the sampling distribution of the difference in means ($\bar{X}_1 - \bar{X}_2$).
• Two approaches:
1. The variances of the two populations are assumed to be equal.
2. The variances of the two populations are assumed to be unequal.
Hypothesis testing for two population means
 Assume that the unknown variances are equal: $\sigma_1^2 = \sigma_2^2 = \sigma^2$
 The pooled estimate of $\sigma^2$ is the weighted average of the two sample variances, $S_1^2$ and $S_2^2$
 The pooled estimate is denoted by $S_p^2$
 The standard deviation of the sampling distribution is
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Hypothesis testing for two population means
 The t-statistic will be used:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
 The df = $n_1 + n_2 - 2$
Hypothesis testing for two population means
 Assume that the unknown variances are not equal: $\sigma_1^2 \ne \sigma_2^2$
 $\sigma_1^2$ and $\sigma_2^2$ will be estimated by $S_1^2$ and $S_2^2$, respectively
 The standard deviation of the sampling distribution is
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
 How is the df computed when the variances are unknown and assumed not to be equal? (reading assignment)
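One widely used answer to the reading-assignment question is the Welch-Satterthwaite approximation; whether this is the formula the author has in mind is an assumption. The sketch below computes it for made-up sample variances and sizes (the numbers are purely illustrative and do not come from the slides).

```python
def welch_satterthwaite_df(s1_sq, s2_sq, n1, n2):
    """Approximate df for the unequal-variance t statistic (Welch-Satterthwaite)."""
    v1 = s1_sq / n1  # estimated variance of the first sample mean
    v2 = s2_sq / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Hypothetical inputs, chosen only to illustrate the calculation.
n1, n2 = 10, 15
df_welch = welch_satterthwaite_df(4.0, 9.0, n1, n2)
df_pooled = n1 + n2 - 2

print(f"Welch df = {df_welch:.2f}, pooled df = {df_pooled}")
```

The approximate df always falls between the smaller of n1 - 1 and n2 - 1 and the pooled value n1 + n2 - 2, so the unequal-variance test never has more degrees of freedom than the pooled test.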
Example
 We have 20 subjects, all males between the ages of 25 and 35, who volunteer for our experiment.
One half of the group will be given coffee containing caffeine; the other half will be given
decaffeinated coffee as the placebo control. We measure the pulse rate after the subjects drink
their coffee. The results are:
A) Test the hypothesis that caffeine has no effect on the pulse rates of young men, assuming both
groups have equal variance (α = 0.05).
B) Find the 95% C.I. for the population mean difference.
SOLUTION
 Hypotheses: Ho: μt = μc
HA: μt ≠ μc
 where μt = population mean of the treatment group and μc = population mean of the control (placebo) group.
Compute the pooled (combined) variance of both groups:
Sp² = {(10 - 1) × 28.67 + (10 - 1) × 31.11} / 18
= (258.03 + 279.99) / 18 = 538.02 / 18 = 29.89
Therefore, t calc = (75 - 68) / √[29.89 (1/10 + 1/10)] = 7 / √5.978
= 2.86 (this corresponds to a P-value of less than 0.02)
t tab (α = 0.05, df = 18) = 2.10; t calc > t tab ⇒ reject Ho
• Hence, caffeinated coffee has an effect on the pulse rates of young men.
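The pooled-variance calculation, and the 95% confidence interval asked for in part B, can be sketched as follows. The group means (75 and 68) and variances (28.67 and 31.11) are the summary values used in the solution; treating 75 as the caffeine group is an assumption based on the order of subtraction, and the critical value 2.10 is taken from the solution's table lookup rather than computed.

```python
from math import sqrt


def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Weighted average of two sample variances, weights (n - 1)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


n_t = n_c = 10
mean_t, mean_c = 75.0, 68.0   # assumed: treatment (caffeine) and control means
var_t, var_c = 28.67, 31.11   # sample variances quoted in the solution
t_tab = 2.10                  # tabulated t for alpha = 0.05 (two-tailed), df = 18

sp_sq = pooled_variance(var_t, var_c, n_t, n_c)
standard_error = sqrt(sp_sq * (1 / n_t + 1 / n_c))
diff = mean_t - mean_c
t_calc = diff / standard_error

# Part B: 95% confidence interval for mu_t - mu_c.
ci_low = diff - t_tab * standard_error
ci_high = diff + t_tab * standard_error

print(f"Sp^2 = {sp_sq:.2f}, t = {t_calc:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

The interval excludes zero, which agrees with rejecting Ho at α = 0.05.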
Hypothesis testing for two population means
Dependent/paired/matched/repeated sampling
 Arises from two different processes on the same study units (e.g. "before" and "after" treatments)
 Using the same or matched individuals eliminates differences between the individuals themselves (confounding factors).
 Inference concerning the difference between two population means is similar to inference for one population mean, except that here we work with the differences.
Hypothesis testing for two population means
 If the population of differences is normally distributed with mean $\mu_d$
 The test statistic is
$$\frac{\bar{d} - d_0}{S_d / \sqrt{n}}$$
 The test statistic is a Z-test if the sample size is large (n1 and n2 > 30) or the variance is known.
 The test statistic is a t-test if the sample size is not large enough and the variance is unknown.
 A (1-α)100% confidence interval for µd = µ1 - µ2 is:
$$\bar{d} \pm z_{\alpha/2}\,\frac{S_d}{\sqrt{n}} \quad \text{or} \quad \bar{d} \pm t_{(\alpha/2,\ df)}\,\frac{S_d}{\sqrt{n}}$$
Example
 Serum Cholesterol Levels for 12 Subjects Before and After Diet-Exercise Program

Subject   Before (x1)   After (x2)   Difference (after - before)
1         201           200          -1
2         231           236          +5
3         221           216          -5
4         260           233          -27
5         228           224          -4
6         237           216          -21
7         326           296          -30
8         235           195          -40
9         240           207          -33
10        267           247          -20
11        284           210          -74
12        201           209          +8
Solution
$$\bar{d} = \frac{\sum d_i}{n} = \frac{(-1) + 5 + \dots + 8}{12} = \frac{-242}{12} = -20.17$$
$$s_d^2 = \frac{\sum (d_i - \bar{d})^2}{n - 1} = \frac{n\sum d_i^2 - \left(\sum d_i\right)^2}{n(n - 1)} = \frac{12(10766) - (-242)^2}{12(11)} = 535.06$$
1. State the hypothesis
Ho: The mean difference between before and after the diet-exercise program is ≥ zero
HA: The mean difference between before and after the diet-exercise program is < zero
2. Select the appropriate test statistic
3. Select the level of significance = 0.05
4. Determine the critical ratio or critical value of t test = -1.7959
5. Perform the calculation for the test statistic
$$t = \frac{-20.17 - 0}{\sqrt{535.06 / 12}} = \frac{-20.17}{6.68} = -3.02$$
6. Draw and state the conclusion
• Reject Ho since -3.02 < -1.7959
• Conclude that the diet-exercise program is effective.
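The paired calculation can be reproduced directly from the twelve before/after readings in the table. This is a minimal sketch using only the Python standard library; the critical value 1.7959 is the tabulated one-tailed t for df = 11 quoted in step 4, not something the script computes.

```python
from math import sqrt
from statistics import mean, stdev

before = [201, 231, 221, 260, 228, 237, 326, 235, 240, 267, 284, 201]
after = [200, 236, 216, 233, 224, 216, 296, 195, 207, 247, 210, 209]

# Differences are taken as after - before, matching the table.
d = [a - b for a, b in zip(after, before)]
n = len(d)

d_bar = mean(d)
s_d = stdev(d)                  # sample standard deviation (divisor n - 1)
standard_error = s_d / sqrt(n)
t_calc = (d_bar - 0) / standard_error

t_tab = 1.7959                  # one-tailed critical value, alpha = 0.05, df = 11
reject_h0 = t_calc < -t_tab     # left-tailed test: HA says the mean difference < 0

print(f"d_bar = {d_bar:.2f}, s_d^2 = {s_d ** 2:.2f}, t = {t_calc:.2f}, reject Ho: {reject_h0}")
```

Because the test is left-tailed, the statistic is compared with the negative critical value rather than with its absolute value.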
Hypothesis testing for two population proportions
 Suppose that n1 and n2 are large enough so that
– $n_1 p_1 \ge 5$, $n_1(1 - p_1) \ge 5$, $n_2 p_2 \ge 5$, and $n_2(1 - p_2) \ge 5$
 The standard deviation of $P_1 - P_2$ is
$$\sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}$$
 The test statistic could be
$$Z_{cal} = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{\dfrac{P_1(1 - P_1)}{n_1} + \dfrac{P_2(1 - P_2)}{n_2}}}$$
 What if the sample size is small?
 we use the t-statistic with df of $n_1 + n_2 - 2$
Example
 A researcher is trying to study the malaria situation of Ethiopia. From the records of seasonal
blood survey (SBS) results, he came to understand that the proportion of people having malaria
in Ethiopia was 3.8% in 2019 (Eth. Cal). The size of the sample considered was 15,000. He also
realized that during the year that followed (2020), blood samples were taken from 10,000
randomly selected persons. The result of the 2020 seasonal blood survey showed that 200
persons were positive for malaria.
 Can the researcher conclude that the malaria situation of 2020 did not show any significant
difference from that of 2019 (take the level of significance, α = 0.01)?
Solution
Ho: P2019 = P2020 (or P2019 - P2020 = 0); HA: P2019 ≠ P2020 (or P2019 - P2020 ≠ 0)
 P2019 = 0.038, n2019 = 15,000
 P2020 = 0.02, n2020 = 10,000
 Z tab (α = 0.01) = 2.58 (two tail)
$$z_{cal} = \frac{(0.038 - 0.02) - 0}{\sqrt{\dfrac{0.038(1 - 0.038)}{15{,}000} + \dfrac{0.02(1 - 0.02)}{10{,}000}}}$$
 Zcalc ≈ 8.6 (about 8.2 if the standard error is rounded to 0.0022), which corresponds to a P-value of less than 0.003.
 Decision: reject Ho (because Zcal > Z tab); in other words, the p-value is less
than the level of significance (i.e., α = 0.01)
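The malaria comparison can be checked with the unpooled standard error used in the formula above. This is a sketch with the survey figures from the example; the variable names are illustrative. Carried at full precision the statistic comes out near 8.6, and either way it far exceeds the critical value of 2.58.

```python
from math import sqrt
from statistics import NormalDist

p_2019, n_2019 = 0.038, 15_000
p_2020, n_2020 = 200 / 10_000, 10_000   # 200 positives out of 10,000

# Unpooled standard error of the difference in sample proportions.
standard_error = sqrt(p_2019 * (1 - p_2019) / n_2019 + p_2020 * (1 - p_2020) / n_2020)
z_calc = ((p_2019 - p_2020) - 0) / standard_error

alpha = 0.01
z_tab = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
reject_h0 = abs(z_calc) > z_tab

print(f"SE = {standard_error:.5f}, z = {z_calc:.2f}, z_tab = {z_tab:.2f}, reject Ho: {reject_h0}")
```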
Example
 A study was conducted to look at the effects of oral contraceptives (OC) on heart
disease in women 40–44 years of age. It is found that among n1 = 500 current
OC users, 13 develop a myocardial infarction (MI) over a three-year period,
while among n2 = 1000 non-OC users, seven develop an MI over a three-year
period. Can you conclude that the rate of MI is significantly greater among OC
users? (Report the P-value for your test.)
Measures of Association
 While a test of hypothesis can be used to determine whether an
association exists between two random variables, it cannot provide a
measure of the strength of the association
• Several methods are available for estimating the magnitude of the effect
given the categorical data in a 2 × 2 contingency table
1. Chi-Square Test
2. Relative Risk (RR)
3. Odds Ratio (OR)
Chi-Square Test
 Chi-Square (χ2) is a probability distribution used to make statistical
inferences about categorical data (proportions) in which the number
of categories is two or more.
 Widely used in the analysis of contingency tables.
 The Chi-Square test allows us to test for association between two
categorical variables.
 Ho: No association between the variables; HA: There is an association
 Consequently, a significant p-value implies association.
X2 Distribution
 Indexed by the degrees of freedom (n)
 Unlike the z and t distributions, which are always symmetric about 0, the
X2 distribution only takes on positive values and is always skewed to the
right.
 The skewness diminishes as n increases
[Figure: χ2 distribution with 10 df. The acceptance region (area 0.95) lies to the left of the critical value 18.307 and the rejection region (area 0.05) lies to its right.]
X2 Distribution
 As with t distributions, there is a different X2 distribution for each possible value of df.
 X2 distributions with a small number of df are highly skewed; however, this
skewness is attenuated as the number of df increases.
 The X2 distribution is concentrated over nonnegative values.
 It has mean equal to its degrees of freedom (df), and its standard deviation equals
√(2df).
 As df increases, the distribution concentrates around larger values and is more
spread out.
 The distribution is skewed to the right, but it becomes more bell-shaped
(normal) as df increases
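The right-tail area that defines the rejection region can be computed without statistical tables in two special cases: df = 1 and even df, where the χ2 tail has a simple closed form. The helper below is a sketch under that restriction (the function name `chi2_upper_tail` is made up here); it checks that 18.307 leaves an area of about 0.05 to its right for 10 df.

```python
from math import erfc, exp, factorial, sqrt


def chi2_upper_tail(x, df):
    """P(X2 > x) for a chi-square variable; closed forms for df == 1 or even df only."""
    if x <= 0:
        return 1.0
    if df == 1:
        # A chi-square with 1 df is the square of a standard normal variable.
        return erfc(sqrt(x / 2))
    if df % 2 == 0:
        half = x / 2
        return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))
    raise ValueError("this sketch only handles df == 1 or even df")


tail_10 = chi2_upper_tail(18.307, 10)
mean_10, sd_10 = 10, sqrt(2 * 10)   # mean = df, SD = sqrt(2 df)

print(f"P(X2 > 18.307 | df = 10) = {tail_10:.4f}; mean = {mean_10}, SD = {sd_10:.2f}")
```

For odd df greater than 1 a general routine (for example one based on the incomplete gamma function) would be needed; that case is deliberately left out of this sketch.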
Chi-Square test
 It is a statistic which measures the discrepancy between k observed frequencies
O1, O2, … Ok and the corresponding expected frequencies E1, E2, … Ek.
 If the value of χ2 is zero, there is no discrepancy between the observed and the expected
frequencies.
 The greater the discrepancy, the larger will be the value of χ2.
 The calculated value of χ2 is compared with the tabulated value for the given df.
• The Chi-Square test is based on the table of χ2 for the given df. For an R × C table the df is given
by: (rows - 1)(columns - 1) or (R-1)(C-1)
Chi-Square Test
 Counts in the Chi-Square Test of a 2 × 2 table are represented as "a", "b",
"c" and "d".
 The general formula for a 2 × 2 table:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
 We can also use
$$\chi^2 = \frac{n(ad - bc)^2}{(a + c)(b + d)(a + b)(c + d)}$$
Chi-Square Test
Expected Value
 Is the product of the row total and the column total, divided
by the grand total
 The expected numbers must be computed for each cell.
Chi-Square Test
 Assumptions
 Data must be categorical
 The data should be frequency data
 The numbers in each cell are 'not too small'. No expected frequency should be zero
 No more than 20% of the expected frequencies should be less than 5.
 If this does not hold
 combine (re-categorize) row or column categories to make the expected
frequencies larger, or
 use Yates' continuity correction
Example
 A study was conducted to investigate the possible cause of a
gastroenteritis outbreak following a lunch served in a high school
cafeteria. Among the 225 students who ate the sandwiches, 109
became ill, while among the 38 students who did not eat the
sandwiches, 4 became ill.
 Present the data in a 2 × 2 contingency table
Example
 With this method, data are arranged in the form of a contingency
table
 This is a 2 × 2 table for two dichotomous random variables
Solution
 We again wish to know whether the proportions of students
who became ill in each of the groups are identical
 To carry out the test, we first calculate the expected counts for the
table assuming that:
H0: p1 = p2
HA: p1 ≠ p2
Example
 The chi-square test compares the observed frequencies in
each category with the expected frequencies given that H0 is true
 Are the deviations between Observed and Expected too large to
be attributed to chance?
 To determine this, deviations from all 4 cells must be combined
 Calculate the sum:
$$\chi^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}$$
Example
 Ho is rejected at level α if X2 is too large, in particular, if $X^2 > \chi^2_{1,\alpha}$
 If α = 0.05, we would reject H0 for X2 greater than $\chi^2_{1,0.05} = 3.84$
 Therefore, we reject Ho
 The p-value is given by the area under the X2 distribution to the right
of X2
 P-value < 0.001
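The gastroenteritis test can be worked through numerically. In the sketch below the "not ill" counts (116 and 34) are derived from the totals in the example (225 - 109 and 38 - 4) rather than quoted from a table. The script computes the expected counts, the sum of (O - E)²/E, the 2 × 2 shortcut formula, and the p-value for 1 df.

```python
from math import erfc, sqrt

# Rows: ate sandwich / did not eat; columns: ill / not ill.
observed = [[109, 225 - 109],
            [4, 38 - 4]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count = row total * column total / grand total, for every cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# Shortcut formula for a 2 x 2 table with cells a, b, c, d.
(a, b), (c, d) = observed
chi2_shortcut = n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

p_value = erfc(sqrt(chi2 / 2))   # right-tail area for df = 1
reject_h0 = chi2 > 3.84          # critical value for alpha = 0.05, df = 1

print(f"X2 = {chi2:.2f} (shortcut {chi2_shortcut:.2f}), p = {p_value:.2e}, reject Ho: {reject_h0}")
```

All four expected counts are well above 5 here, so the assumptions listed earlier for the test are met.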
Example
 A study was conducted to look at the effects of oral contraceptives (OC) on heart
disease in women 40 to 44 years of age. It is found that among 5000 current OC
users at baseline, 13 women develop a myocardial infarction (MI) over a 3-year
period, whereas among 10,000 non-OC users, 7 develop an MI over a 3-year
period. Compare the Chi-Square Test with the z-test.
– P1 = 0.0026, P2 = 0.0007
– Z-test = 2.77, P-value = 0.006
– There is a highly significant association between MI and OC use
Solution
 Display the above data in the form of a 2 × 2 contingency table

                 MI status over 3 years
OC-use group     Yes      No        Total
OC users         13       4987      5000
Non-OC users     7        9993      10,000
Total            20       14,980    15,000

Is the proportion of MI the same in OC users and non-OC users?
What can be said about the relationship between MI status and OC use?
Solutions
 Compute the expected frequency for the OC-MI data
X2 ≈ 8, 0.001 < p-value < 0.005
 Relationship between X2 and Z test is X2 = Z2
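The identity X2 = Z2 can be verified on the OC-MI table. The sketch below uses the pooled two-proportion z statistic and the 2 × 2 shortcut formula, each with and without a continuity correction. Reading the quoted Z = 2.77 and X2 ≈ 8 as the continuity-corrected versions is an inference from the arithmetic, not something the slides state.

```python
from math import sqrt

a, b = 13, 4987     # OC users: MI yes / no
c, d = 7, 9993      # non-OC users: MI yes / no
n1, n2 = a + b, c + d
n = n1 + n2

p1, p2 = a / n1, c / n2
p_pooled = (a + c) / n
standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# z statistic without and with the continuity correction.
z = (p1 - p2) / standard_error
z_corrected = (abs(p1 - p2) - (1 / (2 * n1) + 1 / (2 * n2))) / standard_error

# Chi-square for the 2 x 2 table without and with Yates' correction.
denominator = (a + b) * (c + d) * (a + c) * (b + d)
chi2 = n * (a * d - b * c) ** 2 / denominator
chi2_yates = n * (abs(a * d - b * c) - n / 2) ** 2 / denominator

print(f"z = {z:.2f}, z^2 = {z ** 2:.2f}, X2 = {chi2:.2f}")
print(f"z (corrected) = {z_corrected:.2f}, z^2 = {z_corrected ** 2:.2f}, X2 (Yates) = {chi2_yates:.2f}")
```

In both versions the squared z statistic equals the chi-square statistic, which is why the two tests give the same two-sided p-value for a 2 × 2 table.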
Summary
1. Every χ2 distribution extends indefinitely to the right from 0.
2. Every χ2 distribution has only one (right) tail.
3. As df increases, the χ2 curves get more bell shaped and approach the normal
curve in appearance (but remember that a chi square curve starts at 0, not at -∞)
4. If the value of χ2 is zero, then there is perfect agreement between the observed
and the expected frequencies. The greater the discrepancy between the
observed and expected frequencies, the larger will be the value of χ2.
Thank you

Biostatics 8.pptx

  • 1. Outline Statistical inference  Samplingdistribution andits properties  Estimation  Hypothesis testing  Paired andindependent sample t-tests  Chi-square test 24 January 2022 1
  • 2. 24 January 2022 2 Objective At the end you should be able  Estimate parameters  Conduct hypothesis testing  Testthe associationsbetween variables
  • 3. Inferential Statistics  It isthe processofgeneralizingor makingconclusionsto the target population basedon the information obtained from the sample 24 January 2022 3
  • 4. Inferential Statistics Notations  Populationvalues(parameters) are denoted usingGreek letters  The sample values (statistic) are denoted byroman letters 4
  • 5. Inferential Statistics process  Howinformation from the sampleislinked to the population? 5
  • 6. Sampling Distributions  The probabilitydistribution ofasamplesstatistic that isformed whenrepeated sampleswere taken from the whole population  Ifwetakemany,manysamplesandget the statistic for eachofthose samples,the distribution ofallthose statistic.  The frequency distribution ofallthese samplesforms the samplingdistribution of these sample statistic. 6
  • 7. 7 Sampling Distributions  Practically repeated samplesdo not taken from the population  Wedo not encounter samplingdistribution empirically, but it isnecessaryto knowtheir properties in order to drawstatistical inferences.  Three thingsthat determine sampling distribution Its mean Its variance Its shape
  • 8. 8 Properties of Sampling Distributions  The mean of the sample means will be the sameasthe population mean.  The standard deviation of the sample means will be equalto the population standard deviation divided bythe squareroot ofthe sample size.  The standard deviation ofthe samplemeanswill be smaller than the standard deviation ofthe population  The Standarddeviationofthesamplingdistributionofthesamplestatisticsiscalled the standarderror
  • 9. 9 Standard deviation vs Standarderror Standard deviation  Isameasureofvariability between individual observations  Descriptiveindex relevant to mean Standard error  Thevariability ofsummary statistics e.g. the variabilityofthe samplemeanor asample proportion  It isameasureofuncertainty in a sample statistic. i.e. precision ofthe estimate of the estimator
  • 10. 10 Sampling Distributions  Basedon the nature ofsummary statistics Sampling distribution of the mean Sampling distribution of the proportion
  • 11. Properties of sampling distribution of the means  The mean of the sampling distribution of the means is the same as the population mean( )  The SDofthe samplingdistribution ofthe meansis / n.  The shape of the sampling distribution of means is approximately a normal curve, regardless of the population distribution when n is large enough (Central limit theorem). 11
  • 12. Properties of sampling distribution of the proportions  The sampleproportion p will be anestimate ofthe population proportions  The SDofthe samplingdistribution ofthe proportion is  The shape of the sampling distribution of proportion is approximately a normal curve, regardless of the population distribution when n is large enough(Central limit theorem). 12
  • 13. 13 Central LimitTheorem  Statesthat regardlessofthe shape ofthe parent population distribution;  Thesamplingdistribution ofanystatisticwill be normal or nearlynormal, ifthe samplesize islarge enough.  Butthe question is" how large enough"? Asarough rule of thumb, Asamplesizeof30 islargeenoughfor continuous data and np≥ 5and nq≥ 5for categorical data whichare measuredby proportion
  • 14. 14 Assumptions of statistical inference  T o make valid inference or conclusions the following assumptions must be satisfied  Samplesmust be randomly selected  Samplesizemust be large enough  The population must be normally or approximately normally distributed ifthe samplesizeislessthan 30. That meansthe population varianceshouldbe known  What if n is not large enough and population variance is unknown?
  • 15. Student's t-distribution. We use Student's t distribution in statistical inference; it depends on the degrees of freedom. The t-distribution is a theoretical probability distribution which is symmetrical, bell-shaped, and similar to the normal but more spread out. The conditions for using Student's t distribution: the sample is from a normally distributed population, the population variance is unknown, and the sample size is small, i.e. less than 30 (or np < 5 or nq < 5).
  • 18. Student's t-distributions. The t distribution and the standard normal distribution are similar in that: both are bell shaped; both are symmetrical about the mean; the mean, median, and mode are equal to 0 and located at the center; the curve never touches the x axis. The t distribution differs from the normal distribution in that: its variance is greater than one; it is based on the degrees of freedom (df), which are related to sample size; as the sample size increases, the t distribution approaches the standard normal distribution (Z).
  • 19. Parameter Estimation. We generally assume the underlying distribution of the variable of interest is adequately described by one or more unknown parameters. Because it is usually not possible to make measurements on every individual in a population, parameters cannot usually be determined exactly. Instead we estimate parameters by calculating the corresponding characteristics from a random sample (estimates).
  • 20. Estimation. It is a procedure in which the information obtained from a sample is used to estimate the true population parameter, i.e. the process of estimating population parameters by using sample statistics. An estimator is any statistic that is used to estimate an unknown population parameter. The value or values that the estimator assumes are called estimates.
  • 21. Characteristics of a good estimator. An estimator should be: Unbiased: the expected value of the estimator must be equal to the parameter to be estimated. Consistent: as the sample size increases, the value of the estimator should approach the value of the estimated parameter. Efficient: the variance of the estimator should be the smallest. Sufficient: the sample from which the estimator is calculated must contain the maximum possible information about the population.
  • 22. Estimation. There are two types of estimation: point estimation and interval estimation.
  • 23. Point Estimation. A single numerical value is used to estimate the corresponding population parameter. The corresponding point estimator for the parameters:
  • 24. Point Estimation. However, there are pitfalls of point estimation. Different samples give different estimates of a single unknown population parameter. A point estimate does not take sample-to-sample variability into account. A point estimate does not give the precision of the estimate, and hence we need another method of estimation which handles these problems.
  • 25. Interval Estimation. It is an interval computed from sample data that contains the true population parameter with a certain level of confidence. CI = point estimate ± margin of error (reliability coefficient × standard error). A CI consists of three parts: the statistic, a confidence level, and the standard error. Interval estimators are commonly called confidence intervals.
  • 26. Interval Estimation: level of confidence. It is the probability of obtaining the population parameter within the error margin. The level of confidence is denoted as (1-α)100%. The confidence level can never be 100%! Most commonly, 95% confidence intervals are calculated. However, 90% and 99% confidence intervals are sometimes used.
  • 27. Interval Estimation. A CI in general: considers variation in sample statistics from sample to sample; is based on observations from one sample; gives information about closeness to unknown population parameters; is stated in terms of a level of confidence. Interpretation of a confidence interval (e.g. a 95% CI): if we take 100 repeated samples of size n and construct a confidence interval from each, we expect about 95 of them to contain the true population parameter.
  • 28. Interval Estimation. The general formula for all CIs is: point estimate ± (reliability coefficient) × (standard error). Point estimate: the value of the statistic in the sample (e.g., mean, proportion, mean difference, proportion difference, etc.). Reliability coefficient: a measure of how confident we want to be, taken from a Z table or a t table depending on the sampling distribution of the statistic. Standard error: the standard error of the statistic.
  • 29. Margin of Error. It is the amount added to and subtracted from the point estimate in confidence interval estimation. It is a measure of precision. The margin of error is the product of the reliability coefficient corresponding to the confidence level and the standard error of the estimator.
  • 30. Interval Estimation. The width of the confidence interval depends on: Sample size: the larger the sample size, the narrower the confidence interval and the more precise our estimate, because the standard error decreases as the sample size increases; that is, the sample statistic approaches the population parameter. Standard deviation: the more variation among the individual values, the wider the confidence interval and the less precise the estimate.
  • 31. Interval Estimation. Confidence level: the larger the confidence level, the wider the confidence interval. A 90% CI is narrower than a 95% CI since we are only 90% certain that the interval includes the population parameter. A 99% CI is wider than a 95% CI; the extra width means that we can be more certain that the interval will contain the population parameter.
  • 32. Interval Estimation. A confidence interval can be estimated for: a single population (one population mean; one population proportion) or two populations (difference in two population means; difference in two population proportions).
  • 34. CI for a Single Population Mean. When the following assumptions are fulfilled: the population standard deviation ($\sigma$) is known; the population is normally distributed; if the population is not normal, use a large sample. A $100(1-\alpha)\%$ CI for $\mu$ is calculated by $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. $\alpha$ is to be chosen by the researcher; the most common values of $\alpha$ are 0.05, 0.01 and 0.1.
  • 35. Confidence interval. The point estimate of $\mu$ is the sample mean $\bar{x}$. The standard error of $\bar{x}$ is $\sigma/\sqrt{n}$. Commonly used confidence levels are 90%, 95%, and 99%.
  • 36. Example 1. Waiting times (in hours) at a particular hospital are believed to be approximately normally distributed with a variance of 2.25. a. A sample of 20 outpatients revealed a mean waiting time of 1.52 hours. Calculate the point estimate and construct the 95% CI. b. Suppose that the mean of 1.52 hours had resulted from a sample of 32 patients. Calculate the point estimate and find the 95% CI. c. What effect does a larger sample size have on the CI?
  • 37. Solutions A. $1.52 \pm 1.96\sqrt{2.25/20} = 1.52 \pm 1.96(0.33) = 1.52 \pm 0.65 = (0.87, 2.17)$. We are 95% confident that the true mean waiting time is between 0.87 and 2.17 hrs. Although the true mean may or may not be in this interval, 95% of the intervals formed in this manner will contain the true mean. An incorrect interpretation is that there is a 95% probability that this interval contains the true population mean.
  • 38. Solutions B. $1.52 \pm 1.96\sqrt{2.25/32} = 1.52 \pm 1.96(0.27) = 1.52 \pm 0.53 = (0.99, 2.05)$. The larger sample size makes the CI narrower (more precision). When constructing CIs, it has been assumed that the standard deviation of the underlying population, $\sigma$, is known. What if $\sigma$ is not known?
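The waiting-time intervals can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `z_interval` is an assumption made for illustration. Because it carries full precision instead of rounding the standard error to two decimals, the limits may differ from the hand calculation in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)                        # reliability coefficient x SE
    return mean - margin, mean + margin

sigma = sqrt(2.25)                    # variance 2.25 gives sigma = 1.5
ci_20 = z_interval(1.52, sigma, 20)   # part a: n = 20
ci_32 = z_interval(1.52, sigma, 32)   # part b: n = 32

print([round(x, 2) for x in ci_20], [round(x, 2) for x in ci_32])
```

The interval for n = 32 is narrower than the one for n = 20, which is the answer to part c.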
  • 39. Unknown variance (small sample size, n ≤ 30). If $\sigma$ for the underlying population is unknown and the sample size is small, we use Student's t distribution as an alternative.
  • 40. Degrees of Freedom (df). df = the number of observations that are allowed to vary freely after the estimator has been calculated. df = n - 1.
  • 41. Example. Compute a 95% CI for the mean birth weight based on n = 10, sample mean = 116.9 oz and s = 21.70. From the t table, t(9, 0.975) = 2.262. Answer: (101.4, 132.4). Interpretation?
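The birth-weight answer follows from $\bar{x} \pm t_{n-1,\,0.975}\, s/\sqrt{n}$. A minimal sketch is given below; the function name `t_interval` is hypothetical, and the critical value 2.262 is passed in from the t table because the Python standard library has no t-distribution quantile function.

```python
from math import sqrt

def t_interval(mean, s, n, t_crit):
    """CI for a mean when sigma is unknown; t_crit is read from a t table (df = n - 1)."""
    margin = t_crit * s / sqrt(n)
    return mean - margin, mean + margin

low, high = t_interval(mean=116.9, s=21.70, n=10, t_crit=2.262)
print(round(low, 1), round(high, 1))
```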
  • 42. CIs for single population proportion, p. An interval estimate for the population proportion (π) can be calculated by adding an allowance for uncertainty to the sample proportion (p). It is based on the three elements of a CI: the point estimate, the SE of the point estimate, and the reliability coefficient.
  • 43. CIs for single population proportion, p
  • 44. CIs for single population proportion, p
  • 45. Example  A random sample of 100 people shows that 25 are left-handed. Calculate the point estimate and form a 95% CI for the true proportion of left-handers.
  • 46. Example. It was found that 28.1% of 153 cervical-cancer cases had never had a Pap smear prior to the time of the case's diagnosis. Calculate a 95% CI for the percentage of cervical-cancer cases who never had a Pap smear.
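Both proportion examples can be worked with the large-sample interval $p \pm z_{\alpha/2}\sqrt{p(1-p)/n}$. The sketch below assumes that normal-approximation formula (np and nq are well above 5 in both examples); the helper name `proportion_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p, n, confidence=0.95):
    """Large-sample (normal approximation) CI for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

left_handed = proportion_interval(25 / 100, 100)   # 25 of 100 people
no_pap_smear = proportion_interval(0.281, 153)     # 28.1% of 153 cases

print([round(x, 3) for x in left_handed], [round(x, 3) for x in no_pap_smear])
```

Under these assumptions the intervals come out at roughly 16.5% to 33.5% for left-handedness and roughly 21.0% to 35.2% for never having had a Pap smear.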
  • 47. Estimation for two Populations
  • 48. CI for the difference between population means: known variances and large sample size. When $\sigma_1$ and $\sigma_2$ are known and both populations are normal, or both sample sizes are at least 30, the test statistic is a z-value. The point estimate of $(\mu_1 - \mu_2)$ is $(\bar{x}_1 - \bar{x}_2)$. The standard error of $(\bar{x}_1 - \bar{x}_2)$ is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. Finally, the CI is $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. If the population variances are unknown, they can be approximated by the sample variances $s_1^2$ and $s_2^2$ when the samples are large (n ≥ 30).
  • 49. Example 1. Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down's syndrome. The data consist of serum uric acid readings on 12 individuals with Down's syndrome and 15 normal individuals. The means are $\bar{x}_1$ = 4.5 mg/100 ml and $\bar{x}_2$ = 3.4 mg/100 ml. The data constitute two independent simple random samples, each drawn from a normally distributed population with a variance equal to 1. Compute the point estimate and construct a 95% CI for the difference in mean serum uric acid levels between the two populations.
  • 51. Example 2. Researchers are interested in the difference between serum uric acid levels in patients with and without Down's syndrome. Patients without Down's syndrome: n = 12, sample mean = 4.5 mg/100 ml, $\sigma^2$ = 1.0. Patients with Down's syndrome: n = 15, sample mean = 3.4 mg/100 ml, $\sigma^2$ = 1.5. Calculate the 95% CI. SE = 0.43, 95% CI = 1.1 ± 1.96(0.43) = (0.26, 1.94). We are 95% confident that the true difference between the two population means is between 0.26 and 1.94.
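The Example 2 interval can be checked directly from $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. This is a sketch with a hypothetical helper name, `two_mean_z_interval`; the inputs are the values stated in the example.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_z_interval(mean1, var1, n1, mean2, var2, n2, confidence=0.95):
    """CI for mu1 - mu2 with known population variances."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(var1 / n1 + var2 / n2)
    diff = mean1 - mean2
    return diff - z * se, diff + z * se, se

low, high, se = two_mean_z_interval(4.5, 1.0, 12, 3.4, 1.5, 15)
print(round(se, 2), round(low, 2), round(high, 2))
```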
  • 52. CI for the difference between population means: unknown variances ($\sigma_1^2$ and $\sigma_2^2$) and small sample size (n < 30). If the following assumptions are satisfied: the two random samples are independent; both samples are drawn from populations with a normal distribution; the population variances are unknown but are assumed to be equal; then the test statistic is a t-value with degrees of freedom = $n_1 + n_2 - 2$. The point estimate of $(\mu_1 - \mu_2)$ is $(\bar{x}_1 - \bar{x}_2)$. The standard error of $(\bar{x}_1 - \bar{x}_2)$ is $\sqrt{S_p^2\,(1/n_1 + 1/n_2)}$.
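  • 53. CI for the difference between population means. The pooled sample variance is $S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$. Finally, the $(1-\alpha)100\%$ confidence interval for $(\mu_1 - \mu_2)$ is $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\; n_1+n_2-2}\sqrt{S_p^2\,(1/n_1 + 1/n_2)}$.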
  • 54. Example. A research team collected serum amylase data from a sample of healthy subjects and from a sample of hospitalized subjects. They wish to know if they would be justified in concluding that the population means are different. The data consist of serum amylase determinations on $n_2$ = 15 healthy subjects and $n_1$ = 22 hospitalized subjects. The sample means and standard deviations are as follows: $\bar{x}_1$ = 120 units/ml, $s_1$ = 40 units/ml; $\bar{x}_2$ = 96 units/ml, $s_2$ = 35 units/ml. Construct a 95% CI for the difference between the two population mean serum amylase levels.
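  • 55. Example. Calculate the pooled variance: $S_p^2 = \frac{21(40^2) + 14(35^2)}{35} = 1450$. Calculate the 95% confidence interval with $t_{0.975,\,35} = 2.0301$: $(120 - 96) \pm 2.0301\sqrt{1450\,(1/22 + 1/15)} = 24 \pm 2.0301(12.75)$. 95% CI: approximately (-1.9, 49.9).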
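The serum amylase calculation can be sketched in code from the summary statistics given in the example. The helper names below are hypothetical, and the critical value t(35, 0.975) = 2.0301 is taken from a t table and passed in as an assumption, since the standard library does not provide t quantiles.

```python
from math import sqrt

def pooled_variance(s1, n1, s2, n2):
    """Weighted average of the two sample variances (equal-variance assumption)."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)

def pooled_t_interval(mean1, s1, n1, mean2, s2, n2, t_crit):
    """CI for mu1 - mu2 using the pooled variance; t_crit has n1 + n2 - 2 df."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se

sp2 = pooled_variance(40, 22, 35, 15)
low, high = pooled_t_interval(120, 40, 22, 96, 35, 15, t_crit=2.0301)
print(sp2, round(low, 2), round(high, 2))
```

Under these assumptions the interval includes zero, so at the 95% level the data would not rule out equal population means.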
  • 56. CI for the difference between population proportions. Suppose that $n_1$ and $n_2$ are large enough that $n_1 p_1 \ge 5$, $n_1(1 - p_1) \ge 5$, $n_2 p_2 \ge 5$, and $n_2(1 - p_2) \ge 5$. The point estimate for the difference of two population proportions, $\pi_1 - \pi_2$, is $P_1 - P_2$. The standard deviation of $P_1 - P_2$ is $\sqrt{\frac{P_1(1-P_1)}{n_1} + \frac{P_2(1-P_2)}{n_2}}$. A $(1-\alpha)100\%$ confidence interval estimate for the difference of population proportions, $\pi_1 - \pi_2$, is $(P_1 - P_2) \pm z_{\alpha/2}\sqrt{\frac{P_1(1-P_1)}{n_1} + \frac{P_2(1-P_2)}{n_2}}$.
  • 57. Example. Each of two groups consists of 100 patients who have leukemia. A new drug is given to the first group but not to the second (the control group). It is found that in the first group 75 people have remission for 2 years, but only 60 in the second group. Find 95% confidence limits for the difference in the proportion of all patients with leukemia who have remission for 2 years.
  • 58. Example. $p_1 = 0.75$, $q_1 = 0.25$, $n_1 = 100$; $p_2 = 0.60$, $q_2 = 0.40$, $n_2 = 100$. $n_1 p_1 = 75 > 5$ and $n_1 q_1 = 25 > 5$; $n_2 p_2 = 60 > 5$ and $n_2 q_2 = 40 > 5$. $\sigma_1^2 = p_1 q_1 / n_1 = 0.001875$ and $\sigma_2^2 = p_2 q_2 / n_2 = 0.0024$. Hence $\sigma^2$ for $p_1 - p_2$ = 0.001875 + 0.0024 = 0.004275, and $\sigma$ for $p_1 - p_2 = \sqrt{0.004275} = 0.0653$. At a 95% confidence level, Z = ±1.96; $p_1 - p_2$ = 0.75 - 0.60 = 0.15. Therefore, the 95% CI = 0.15 ± 1.96(0.065) = 0.15 ± 0.13 = (0.02, 0.28).
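The leukemia example can be verified step by step. This is a minimal sketch; the function name `two_proportion_interval` is an assumption made for illustration, and the inputs are the counts given in the example.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Large-sample CI for pi1 - pi2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se, se

low, high, se = two_proportion_interval(75, 100, 60, 100)
print(round(se, 4), round(low, 2), round(high, 2))
```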
  • 59. Summary: when to use $t_{\alpha/2}$ or $z_{\alpha/2}$ for finding a confidence interval. Is σ known? Yes: use $z_{\alpha/2}$ values no matter what the sample size is. No: is n ≥ 30 (or np and nq ≥ 5)? Yes: use $z_{\alpha/2}$ values and s in place of σ in the formula. No: use $t_{\alpha/2}$ values and s in the formula.
  • 61. Hypothesis Testing. Researchers conduct studies to answer research questions or hypotheses. The best way to determine whether a hypothesis is true would be to examine the entire population. But this is often impractical, so researchers typically examine a random sample from the population. The purpose of any study is to collect data which will allow the researcher to test the hypothesis or answer the question. Statistical tests can indicate (with a certain degree of confidence) whether a hypothesis is supported by the data or not.
  • 62. Hypothesis Testing. In hypothesis testing the researcher must: define the population under study; state the particular hypothesis that will be investigated; determine the significance level; select a sample from the population and collect the data; and perform the appropriate statistical test and reach a conclusion.
  • 63.
  • 64. Hypothesis Testing. A hypothesis is a testable statement that describes the nature of the proposed relationship between two or more variables of interest. Hypotheses are formulated, experiments are performed, and results are evaluated for their consistency with a hypothesis. Hypothesis testing (HT) provides an objective framework for making decisions using probabilistic methods. The purpose of HT is to aid the clinician, researcher or administrator in reaching a decision (conclusion).
  • 65. Types of Hypothesis. The Null Hypothesis, H0: is a statement claiming that there is no difference between the hypothesized value and the population value (parameter = hypothesized value); is a statement of agreement (no difference between groups, or the intervention is not effective); states the assumption (hypothesis) to be tested; is always about a population parameter (mean, proportion, OR, RR, etc.), not about a sample statistic; always contains an "=", "≤" or "≥" sign; may or may not be rejected.
  • 66. Types of Hypothesis. The Alternative Hypothesis, HA: is a statement we will believe to be true if we reject H0; is generally the hypothesis that is believed (or needs to be supported) by the researcher; is a statement that disagrees with (opposes) H0 (there is a difference between groups, or the intervention is effective); never contains an "=", "≤" or "≥" sign, but contains "≠", ">" or "<"; may or may not be accepted.
  • 67. Rules for Stating Statistical Hypotheses. An indication of equality (either =, ≤ or ≥) must appear in H0. H0: μ = μ0, HA: μ ≠ μ0 when our hypothesis is expressed in terms of a population mean. H0: P = P0, HA: P ≠ P0 when our hypothesis is expressed in terms of a population proportion. Can we conclude that a certain population mean is not 50? H0: μ = 50 and HA: μ ≠ 50. Greater than 50? H0: μ ≤ 50 and HA: μ > 50. Can we conclude that the proportion of patients with leukemia who survive more than six years is not 60%? H0: P = 0.6 and HA: P ≠ 0.6. Can we conclude that smoking is significantly associated with lung cancer? H0: there is no association between smoking and lung cancer. HA: there is an association between smoking and lung cancer.
  • 68. Hypothesis testing process. Now think about how the hypothesis test should be carried out. We draw a random sample of size n from the underlying population and calculate its sample mean ($\bar{x}$). We compare $\bar{x}$ to the postulated mean $\mu_0$. Is the difference between $\bar{x}$ and $\mu_0$ too large to be attributed to chance alone?
  • 70. Steps in Hypothesis Testing. 1. Formulate the appropriate statistical hypotheses clearly; specify H0 and HA. Two-tailed: H0: $\mu = \mu_0$, HA: $\mu \ne \mu_0$. One-tailed: H0: $\mu \le \mu_0$, HA: $\mu > \mu_0$. One-tailed: H0: $\mu \ge \mu_0$, HA: $\mu < \mu_0$. 2. Decide on the appropriate test statistic for the hypothesis. E.g., one population or
  • 71. Steps in Hypothesis Testing. 3. Specify the desired level of significance (α = 0.05, 0.01, etc.). 4. Determine the critical value. 5. Compute the test statistic or the p-value. 6. Reach a decision and draw the conclusion. If H0 is rejected, we conclude that HA is true (or accepted). If H0 is not rejected, we conclude that H0 may be true.
  • 72. One tail and two tail tests. Depending on the way H0 is written, hypothesis testing can be: a two tail test, where the rejection region is split into the two tails and the alternative hypothesis takes the form "different from"; or a one tail test, where the rejection region is at one end of the distribution or the other and the alternative hypothesis takes the form "less than" or "greater than".
  • 73. Level of Significance, α. It is the probability of rejecting a true H0. It defines the rejection region of the sampling distribution. The decision is made on the basis of the level of significance, designated by α. The most frequently used values of α are 0.01, 0.05 and 0.10. α is selected by the researcher at the beginning.
  • 74. Test statistic. Any observed differences or associations may have occurred by chance. Because there is random variation, even an unbiased sample may not accurately represent the population as a whole. A test statistic is a value we can compare with the known distribution of what we expect when the null hypothesis is true. The general formula of any test statistic is: (observed value - hypothesized value) / standard error. Examples of test statistics are the z-test, the t-test, and the χ²-test.
  • 75. Critical value. The value that separates the rejection region from the acceptance (non-rejection) region for a given level of significance. The values of the test statistic lie on the horizontal axis of the normal distribution, which is split into two regions: the rejection region and the non-rejection region. The values of the test statistic forming the rejection region are less likely to occur if H0 is true. The values making up the non-rejection region are more likely to occur if H0 is true.
  • 76. Rejection and Non-Rejection Regions. [Figure: standard normal curve with a central non-rejection region of area 0.95 between -1.96 and 1.96, and a rejection region of area α/2 = 0.025 in each tail.]
  • 77. P-value. In most applications, the outcome of performing a hypothesis test is a p-value. The P-value is the probability of obtaining a test statistic as extreme as or more extreme than the one actually obtained, if H0 is true. It is not the probability that H0 is true; it tells us how likely a difference at least as large as the observed one would be by chance alone under H0. The larger the test statistic (in absolute value), the smaller the P-value, meaning that the observed value is unlikely to occur just by chance. The smaller the P-value, the stronger the evidence for rejecting H0. Reject H0 if P-value < α. Do not reject H0 if P-value > α. What if P-value = α?
  • 78. How to calculate the P-value. Use statistical software like SPSS, SAS, STATA, or R. Manual calculation: obtain the test statistic (Z calculated); find the probability for the test statistic from the standard normal table; subtract the probability from 0.5; if the test is two tailed, multiply the result by 2.
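The manual table lookup can be mirrored in software. The sketch below uses Python's standard-library `statistics.NormalDist` in place of a printed Z table; the function name `p_value_from_z` and the value z = -2.12 are illustrative assumptions. Note that the one-tailed result here is the area beyond |z| in a single tail, which is the relevant P-value only when the statistic falls in the direction stated by HA.

```python
from statistics import NormalDist

def p_value_from_z(z, two_tailed=True):
    """P-value for a z statistic: tail area beyond |z|, doubled for a two tailed test."""
    tail = 1 - NormalDist().cdf(abs(z))   # same as 0.5 minus the table area from 0 to |z|
    return 2 * tail if two_tailed else tail

p_two = p_value_from_z(-2.12)                    # two tailed
p_one = p_value_from_z(-2.12, two_tailed=False)  # one tailed
print(round(p_two, 4), round(p_one, 4))
```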
  • 79. Statistical Decision. It is based on the computation from the sample data. The decision to reject or not to reject H0 is based on: the magnitude of the test statistic, the CI, or the P-value. Reject H0 if the value of the test statistic is in the rejection region. Do not reject H0 if the computed value of the test statistic is one of the values in the non-rejection region.
  • 80. Errors in hypothesis testing. Whenever we reject or fail to reject H0, we may commit an error. Two types of errors can be committed: Type I error and Type II error.
  • 81. Type I Error. The error committed when a true H0 is rejected. Considered a serious type of error. The probability of a type I error is the probability of rejecting H0 when it is true. The probability of a type I error is α, called the level of significance of the test. It is set by the researcher in advance.
  • 82. Type II Error. The error committed when a false H0 is not rejected. The probability of a type II error is β. It is usually unknown but larger than α.
  • 83. Power. The probability of rejecting H0 when it is false. Power = 1 - β = 1 - probability of a type II error. We would like to maintain a low probability of a type I error (α) and a low probability of a type II error (β) [high power = 1 - β].
  • 84. Summary: decision (conclusion) versus reality.
      Decision         | H0 true                                      | H0 false
      Do not reject H0 | Correct action (prob. = 1-α)                 | Type II error (prob. = β = 1 - power)
      Reject H0        | Type I error (prob. = α = significance level) | Correct action (prob. = power = 1-β)
  • 85. Summary Example HO =there isno pregnancy;HA= there ispregnancy
  • 86. TypeI &II Error Relationship
  • 88. Factors Affecting the Power of the Test. The power depends on: 1. As n↑, power↑. 2. As |µ1-µ0|↑, power↑. 3. As σ↑, power↓. 4. As α↓, power↓.
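These four relationships can be seen numerically for a one-sided z-test, whose power is $\Phi\!\left(\frac{|\mu_1 - \mu_0|\sqrt{n}}{\sigma} - z_{1-\alpha}\right)$. The sketch below assumes that standard formula; the function name `z_test_power` and the input values are purely illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a one-sided z-test to detect a true shift delta = |mu1 - mu0|."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)            # critical value
    return NormalDist().cdf(delta * sqrt(n) / sigma - z_alpha)

base = z_test_power(delta=1, sigma=2, n=20)
bigger_n = z_test_power(delta=1, sigma=2, n=40)               # n up: power up
bigger_delta = z_test_power(delta=2, sigma=2, n=20)           # |mu1 - mu0| up: power up
bigger_sigma = z_test_power(delta=1, sigma=3, n=20)           # sigma up: power down
smaller_alpha = z_test_power(delta=1, sigma=2, n=20, alpha=0.01)  # alpha down: power down

print(round(base, 3), round(bigger_n, 3), round(bigger_delta, 3),
      round(bigger_sigma, 3), round(smaller_alpha, 3))
```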
  • 89. Hypothesis Test for One Sample. Test for a single mean. Test for a single proportion.
  • 90. Hypothesis Testing of a Single Mean (Normally Distributed)
  • 91. Hypothesis Testing for Known Variance. Two tailed test: H0: $\mu = \mu_0$, HA: $\mu \ne \mu_0$. Test statistic: $z_{cal} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$. Tabulated value: $z_{tab} = z_{\alpha/2}$ for a two tailed test. Decision: if $|z_{cal}| \ge z_{tab}$, reject H0; if $|z_{cal}| < z_{tab}$, do not reject H0.
  • 92. Example. A simple random sample of 10 people from a certain population has a mean age of 27. Can we conclude that the mean age of the population is not 30? The variance is known to be 20. Let α = 0.05. Data: n = 10, sample mean = 27, $\sigma^2$ = 20, α = 0.05. Assumptions: simple random sample; normally distributed population.
  • 93. Example. A. Hypotheses: H0: µ = 30, HA: µ ≠ 30. B. Test statistic: as the population variance is known, we use Z as the test statistic. C. Determine the level of significance.
  • 94. Example. D. Determine the critical value. Reject H0 if the Z value falls in the rejection region. Do not reject H0 if the Z value falls in the non-rejection region. Because of the structure of H0 it is a two tail test. Therefore, reject H0 if Z ≤ -1.96 or Z ≥ 1.96.
  • 95. Example. E. Calculation of the test statistic (or compute the CI): $z_{cal} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 30}{\sqrt{20/10}} = \frac{-3}{1.4142} = -2.12$. F. Statistical decision: we reject H0 because Z = -2.12 is in the rejection region at α of 5%. Conclusion: we conclude that µ is not 30. P-value = 0.0340. A Z value of -2.12 corresponds to a tail area of 0.0170. Since there are two parts to the rejection region in a two tail test, the P-value is twice this, which is 0.0340.
  • 96. Hypothesis testing using a confidence interval. A problem like the above example can also be solved using a confidence interval. A confidence interval will show that the hypothesized value (here 30) does not fall within the boundaries of the interval. However, it will not give a probability. Confidence interval: $CI = \bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n} = 27 \pm 1.96(1.4142) = (24.228, 29.772)$.
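The age example can be run end to end: test statistic, two tailed P-value, and the matching confidence interval. This is a sketch under the example's stated assumptions (known variance, normal population); the function name `one_sample_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, variance, n, alpha=0.05):
    """Two tailed z-test for a single mean with known variance, plus the (1 - alpha) CI."""
    se = sqrt(variance / n)
    z = (xbar - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    return z, p_value, ci

z, p_value, ci = one_sample_z_test(xbar=27, mu0=30, variance=20, n=10)
print(round(z, 2), round(p_value, 3), [round(x, 3) for x in ci])
```

The hypothesized mean of 30 lies outside the interval, which agrees with rejecting H0 at α = 0.05.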
  • 97. Hypothesis Testing for Known Variance. One tailed test: $H_0: \mu \ge \mu_0$, $H_A: \mu < \mu_0$.
  • 98. Example. A simple random sample of 10 people from a certain population has a mean age of 27. Can we conclude that the mean age of the population is less than 30? The variance is known to be 20. Let α = 0.05. Data: n = 10, sample mean = 27, $\sigma^2$ = 20, α = 0.05. Hypotheses: H0: µ ≥ 30, HA: µ < 30.
  • 99. Example. Test statistic: the same Z statistic as in the two tailed case. Rejection region: with α = 0.05 and the inequality, we have the entire rejection region at the left (lower tail test). The critical value will be Z = -1.645. Reject H0 if Z < -1.645.
  • 100. Example • Statistical decision – We reject the Ho because -2.12 < -1.645. • Conclusion – We conclude that µ < 30. – p = .0170 this time because it is only a one tail test and not a two tail test.
  • 101. Hypothesis testing for unknown variance (n small). In most practical applications the standard deviation of the underlying population is not known. In this case, $\sigma$ can be estimated by the sample standard deviation s. If the underlying population is normally distributed, then the test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with n - 1 degrees of freedom.
  • 102. Example. A simple random sample of 14 people from a certain population gives a sample mean body mass index (BMI) of 30.5 and an SD of 10.64. Can we conclude that the mean BMI is not 35 at α = 5%? H0: µ = 35, HA: µ ≠ 35. Test statistic: $t = \frac{30.5 - 35}{10.64/\sqrt{14}} = -1.58$. If the assumptions are correct and H0 is true, the test statistic follows Student's t distribution with 13 degrees of freedom.
  • 103. Example. Decision rule: we have a two tailed test. With α = 0.05, each tail is 0.025. The critical t values with 13 df are -2.1604 and 2.1604. We reject H0 if t ≤ -2.1604 or t ≥ 2.1604. Do not reject H0 because -1.58 is not in the rejection region. Based on the data of the sample, it is possible that µ = 35. P-value = 0.137.
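The BMI test statistic and decision can be reproduced as follows. This is a minimal sketch: the function name `one_sample_t_test` is hypothetical, and the critical value 2.1604 (13 df, two tailed, α = 0.05) is passed in from the t table since the standard library has no t distribution.

```python
from math import sqrt

def one_sample_t_test(xbar, mu0, s, n, t_crit):
    """Two tailed one-sample t-test; returns the statistic and the reject decision."""
    t = (xbar - mu0) / (s / sqrt(n))
    return t, abs(t) >= t_crit

t_stat, reject = one_sample_t_test(xbar=30.5, mu0=35, s=10.64, n=14, t_crit=2.1604)
print(round(t_stat, 2), reject)
```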
  • 104. Hypothesis testing for proportions. Involves categorical values with two possible outcomes: "success" (possesses a certain characteristic) and "failure" (does not possess that characteristic). The fraction or proportion of the population in the "success" category is denoted by p.
  • 105. Hypothesis testing for proportions: t-test at n - 1 df.
  • 106. Example. We are interested in the probability of developing asthma over a given one-year period for children 0 to 4 years of age whose mothers smoke in the home. In the general population of 0 to 4-year-olds, the annual incidence of asthma is 1.4%. If 10 cases of asthma are observed over a single year in a sample of 500 children whose mothers smoke, can we conclude that this is different from the underlying probability of $p_0$ = 0.014? α = 5%. H0: p = 0.014, HA: p ≠ 0.014.
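  • 107. Example. The test statistic is given by: $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.02 - 0.014}{\sqrt{0.014(0.986)/500}} = 1.14$.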
  • 108. Example. The critical value of $Z_{\alpha/2}$ at α = 5% is ±1.96. Do not reject H0 since Z (= 1.14) is in the non-rejection region between ±1.96. P-value ≈ 0.25. We do not have sufficient evidence to conclude that the probability of developing asthma for children whose mothers smoke in the home is different from the probability in the general population.
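The asthma example uses the one-sample z-test for a proportion, $z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$. The sketch below assumes that formula; the helper name `one_proportion_z_test` is hypothetical. The P-value is computed from the unrounded z, so it may differ in the third decimal from a value read off a Z table.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two tailed z-test of H0: p = p0 using the null standard error."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_proportion_z_test(successes=10, n=500, p0=0.014)
print(round(z, 2), round(p_value, 3))
```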
  • 109. Hypothesis testing for two samples. Comparing two population means: independent samples, variances known; independent samples, variances unknown. Paired difference experiments: paired/matched/repeated sampling. Comparing two population proportions: large, independent samples case.
  • 110. Hypothesis testing for two population means. Independent samples with known variances, or both groups have a large sample size. The steps to test the hypothesis for a difference of means are the same as for a single mean. Step 1: state the hypotheses: H0: µ1 - µ2 = 0 vs HA: µ1 - µ2 ≠ 0, HA: µ1 - µ2 < 0, or HA: µ1 - µ2 > 0. Step 2: significance level (α). Step 3: test statistic: $z_{cal} = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$.
  • 111. Hypothesis testing for two population means. Decision rules, with $z_{tab} = z_{\alpha/2}$ for a two tailed test and $z_{tab} = z_{\alpha}$ for a one tailed test. For HA: $\mu_1 - \mu_2 \ne 0$: if $|z_{cal}| \ge z_{tab}$, reject H0; if $|z_{cal}| < z_{tab}$, do not reject H0. For HA: $\mu_1 - \mu_2 < 0$: if $z_{cal} \le -z_{tab}$, reject H0; if $z_{cal} > -z_{tab}$, do not reject H0. For HA: $\mu_1 - \mu_2 > 0$: if $z_{cal} \ge z_{tab}$, reject H0; if $z_{cal} < z_{tab}$, do not reject H0.
  • 112. Example. Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down's syndrome. The data consist of serum uric acid readings on 12 individuals with Down's syndrome and 15 normal individuals. The means are 4.5 mg/100 ml and 3.4 mg/100 ml, with population standard deviations of 2.9 and 3.5 mg/100 ml respectively. $H_0: \mu_1 - \mu_2 = 0$, $H_A: \mu_1 - \mu_2 \ne 0$.
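  • 113. Solution. $z_{cal} = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} = \frac{(4.5 - 3.4) - 0}{\sqrt{2.9^2/12 + 3.5^2/15}} = \frac{1.1}{\sqrt{1.5178}} = \frac{1.1}{1.23} = 0.89$. The tabulated value is $z_{0.025} = 1.96$. Since $|z_{cal}| < 1.96$, do not reject H0.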
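The two-sample z statistic can be recomputed from the figures stated in the example (means 4.5 and 3.4, population SDs 2.9 and 3.5, sample sizes 12 and 15). This is a sketch with a hypothetical function name, `two_sample_z_test`; the critical value comes from the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2, alpha=0.05):
    """Two tailed z-test of H0: mu1 - mu2 = 0 with known population SDs."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, se, abs(z) >= z_crit

z, se, reject = two_sample_z_test(4.5, 2.9, 12, 3.4, 3.5, 15)
print(round(se, 2), round(z, 2), reject)
```

With these inputs the statistic is well inside ±1.96, so H0 is not rejected at the 5% level.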
  • 114. Hypothesis testing for two population means: independent samples, variances unknown. Generally, in most real life situations, the true values of the population variances $\sigma_1^2$ and $\sigma_2^2$ are not known. They have to be estimated from the sample variances $S_1^2$ and $S_2^2$, respectively. We also need to estimate the standard deviation of the sampling distribution of the difference in means $(\bar{X}_1 - \bar{X}_2)$. Two approaches: 1. The variances of the two populations are assumed to be equal. 2. The variances of the two groups are assumed to be not equal.
  • 115. Hypothesis testing for two population means. Assume that the unknown variances are equal: $\sigma_1^2 = \sigma_2^2 = \sigma^2$. The pooled estimate of $\sigma^2$ is the weighted average of the two sample variances, $S_1^2$ and $S_2^2$. The pooled estimate is denoted by $S_p^2$. The standard deviation of the sampling distribution is $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$.
  • 116. Hypothesis testing for two population means. The t-statistic will be used: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$. The df = $n_1 + n_2 - 2$.
  • 117. Hypothesis testing for two population means. Assume that the unknown variances are not equal: $\sigma_1^2 \ne \sigma_2^2$. Then $\sigma_1^2$ and $\sigma_2^2$ will be estimated by $S_1^2$ and $S_2^2$, respectively. The standard deviation of the sampling distribution is $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$. How do we compute the df when the variances are unknown and assumed not to be equal? (Reading assignment.)
  • 118. Example. We have 20 subjects, all males between the ages of 25 and 35, who volunteer for our experiment. One half of the group will be given coffee containing caffeine; the other half will be given decaffeinated coffee as the placebo control. We measure the pulse rate after the subjects drink their coffee. The results are: A) Test the hypothesis that caffeine has no effect on the pulse rates of young men, assuming both groups have equal variance (α = 0.05). B) Find the 95% CI for the population mean difference.
  • 119. Solution. Hypotheses: H0: μt = μc, HA: μt ≠ μc, where μt = population mean of the treatment group and μc = population mean of the control (placebo) group. Compute the pooled (combined) variance of both groups: $S_p^2$ = {(10-1) × 28.67 + (10-1) × 31.11} / 18 = (258.03 + 279.99) / 18 = 538.02 / 18 = 29.89. Therefore, t calc = (75 - 68) / √(29.89(1/10 + 1/10)) = 7 / √5.978 = 2.86 (this corresponds to a P-value of less than 0.02). t tab (α = 0.05, df = 18) = 2.10; t calc > t tab ⇒ reject H0. Hence, caffeinated coffee has an effect on the pulse rates of young men.
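The caffeine solution, including the confidence interval asked for in part B, can be sketched as below. The summary statistics (means 75 and 68, variances 28.67 and 31.11, 10 subjects per group) are the ones used in the solution; the function name `pooled_t_test` is hypothetical, and the tabulated value 2.10 is passed in as given.

```python
from math import sqrt

def pooled_t_test(mean1, var1, n1, mean2, var2, n2, t_crit):
    """Equal-variance two-sample t-test with the matching CI for mu1 - mu2."""
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)  # pooled variance
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    t = diff / se
    ci = (diff - t_crit * se, diff + t_crit * se)
    return sp2, t, ci

sp2, t_stat, ci = pooled_t_test(75, 28.67, 10, 68, 31.11, 10, t_crit=2.10)
print(round(sp2, 2), round(t_stat, 2), [round(x, 2) for x in ci])
```

Under these assumptions the 95% CI for the mean difference is roughly (1.9, 12.1) beats per minute; it excludes zero, consistent with rejecting H0.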
  • 120. Hypothesis testing for two population means: dependent/paired/matched/repeated sampling. Arises from two different processes applied to the same study units (e.g. "before" and "after" treatments). Using the same or matched individuals eliminates differences between the individuals themselves (confounding factors). Inference concerning the difference between two population means is similar to that for one population mean, except that here we work with the differences.
  • 121. Hypothesis testing for two population means
  • 122. Hypothesis testing for two population means. If the population of differences is normally distributed with mean $\mu_d$, the test statistic is $\frac{\bar{d} - d_0}{S_d/\sqrt{n}}$. The test statistic is a Z-test if the sample size is large (n1 and n2 > 30) or the variance is known. The test statistic is a t-test if the sample size is not large enough and the variance is unknown. A $(1-\alpha)100\%$ confidence interval for $\mu_d = \mu_1 - \mu_2$ is $\bar{d} \pm z_{\alpha/2}\,\frac{S_d}{\sqrt{n}}$, or $\bar{d} \pm t_{\alpha/2,\,df}\,\frac{S_d}{\sqrt{n}}$.
  • 123. Example. Serum cholesterol levels for 12 subjects before and after a diet-exercise program.
      Subject | Before (x1) | After (x2) | Difference (after - before)
      1       | 201         | 200        | -1
      2       | 231         | 236        | +5
      3       | 221         | 216        | -5
      4       | 260         | 233        | -27
      5       | 228         | 224        | -4
      6       | 237         | 216        | -21
      7       | 326         | 296        | -30
      8       | 235         | 195        | -40
      9       | 240         | 207        | -33
      10      | 267         | 247        | -20
      11      | 284         | 210        | -74
      12      | 201         | 209        | +8
  • 124. Solution. $\bar{d} = \frac{\sum d_i}{n} = \frac{(-1) + 5 + \dots + 8}{12} = \frac{-242}{12} = -20.17$. $s_d^2 = \frac{\sum (d_i - \bar{d})^2}{n - 1} = \frac{n\sum d_i^2 - (\sum d_i)^2}{n(n-1)} = \frac{12(10766) - (-242)^2}{12(11)} = 535.06$. 1. State the hypotheses. H0: the mean difference between before and after the diet-exercise program is ≥ zero. HA: the mean difference between before and after the diet-exercise program is < zero.
  • 125. Solution
      2. Select the appropriate test statistic.
      3. Select the level of significance: α = 0.05.
      4. Determine the critical value of t: −1.7959.
      5. Perform the calculation for the test statistic: $t = \frac{-20.17 - 0}{\sqrt{535.06 / 12}} = \frac{-20.17}{6.68} = -3.02$
      6. Draw and state the conclusion: reject Ho since −3.02 < −1.7959. Conclude that the diet-exercise program is effective.
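The paired calculation can be reproduced from the raw before/after values in the example table. The following is a minimal sketch using only the standard library; the critical value is copied from the slide rather than computed, and the variable names are illustrative.

```python
import math

# Serum cholesterol values from the example table (12 subjects).
before = [201, 231, 221, 260, 228, 237, 326, 235, 240, 267, 284, 201]
after = [200, 236, 216, 233, 224, 216, 296, 195, 207, 247, 210, 209]

# Work with the per-subject differences (after - before).
d = [x2 - x1 for x1, x2 in zip(before, after)]
n = len(d)

d_bar = sum(d) / n                                  # mean difference
s2_d = sum((x - d_bar) ** 2 for x in d) / (n - 1)   # sample variance of the differences
se = math.sqrt(s2_d / n)                            # standard error of the mean difference
t_calc = (d_bar - 0) / se                           # hypothesized difference d0 = 0

T_CRIT = -1.7959  # one-tailed critical value for alpha = 0.05, df = 11 (from the slide)

print("mean difference:", round(d_bar, 2))
print("variance of differences:", round(s2_d, 2))
print("t calc:", round(t_calc, 2))
print("reject Ho" if t_calc < T_CRIT else "fail to reject Ho")
```

Because the alternative hypothesis is one-sided (mean difference below zero), the decision rule compares the signed statistic with the negative critical value rather than using an absolute value.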
  • 126. Hypothesis testing for two population proportions
      - Suppose that n1 and n2 are large enough that $n_1 p_1 \ge 5$, $n_1(1 - p_1) \ge 5$, $n_2 p_2 \ge 5$, and $n_2(1 - p_2) \ge 5$.
      - The standard deviation of $P_1 - P_2$ is $\sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}$
      - The test statistic could be $Z_{cal} = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}}$
      - What if the sample size is small? We use the t-statistic with df of $n_1 + n_2 - 2$.
  • 127. Example
      - A researcher is studying the malaria situation in Ethiopia. From the records of the seasonal blood survey (SBS), he found that the proportion of people with malaria in Ethiopia was 3.8% in 2019 (Eth. Cal.). The sample size was 15,000. In the following year (2020), blood samples were taken from 10,000 randomly selected persons, and the 2020 seasonal blood survey showed that 200 persons were positive for malaria.
      - Can the researcher conclude that the malaria situation of 2020 did not differ significantly from that of 2019? (Take the level of significance α = 0.01.)
  • 128. Solution
      - Ho: P2019 = P2020 (or P2019 − P2020 = 0); HA: P2019 ≠ P2020 (or P2019 − P2020 ≠ 0)
      - P2019 = 0.038, n2019 = 15,000
      - P2020 = 0.02, n2020 = 10,000
      - Z tab (α = 0.01) = 2.58 (two tail)
      - $Z_{cal} = \frac{(0.038 - 0.02) - 0}{\sqrt{\frac{0.038(1 - 0.038)}{15{,}000} + \frac{0.02(1 - 0.02)}{10{,}000}}} \approx 8.58$, which corresponds to a P-value of less than 0.003.
      - Decision: reject Ho (because Z cal > Z tab); in other words, the p-value is less than the level of significance (i.e., α = 0.01).
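As a cross-check, the two-proportion statistic can be evaluated directly from the survey figures. This is a sketch under stated assumptions: it uses the unpooled standard error from the formula on the preceding slide, the helper name `two_proportion_z` is hypothetical, and the two-tailed P-value comes from the standard normal tail via `math.erfc`.

```python
import math

def two_proportion_z(p1, n1, p2, n2, null_diff=0.0):
    """Z statistic for the difference of two proportions (unpooled standard error)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return ((p1 - p2) - null_diff) / se

p_2019, n_2019 = 0.038, 15_000          # 3.8% positive in 2019
p_2020, n_2020 = 200 / 10_000, 10_000   # 200 positives out of 10,000 in 2020

z_calc = two_proportion_z(p_2019, n_2019, p_2020, n_2020)

# Two-tailed P-value from the standard normal distribution.
p_value = math.erfc(abs(z_calc) / math.sqrt(2))

Z_TAB = 2.58  # two-tailed critical value for alpha = 0.01, as quoted on the slide

print("z calc:", round(z_calc, 2))
print("two-tailed p-value:", p_value)
print("reject Ho" if abs(z_calc) > Z_TAB else "fail to reject Ho")
```

A pooled standard error is another common choice under Ho; it would change the statistic somewhat but not the decision for these data.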
  • 129. Example
      - A study was conducted to look at the effects of oral contraceptives (OC) on heart disease in women 40–44 years of age. It is found that among n1 = 500 current OC users, 13 develop a myocardial infarction (MI) over a three-year period, while among n2 = 1000 non-OC users, seven develop an MI over a three-year period. Can you conclude that the rate of MI is significantly greater among OC users? (Report the P-value for your test.)
  • 130. Measures of Association
      - While a test of hypothesis can be used to determine whether an association exists between two random variables, it cannot provide a measure of the strength of the association.
      - Several methods are available for estimating the magnitude of the effect given the categorical data in a 2 × 2 contingency table:
      1. Chi-Square Test
      2. Relative Risk (RR)
      3. Odds Ratio (OR)
  • 131. Chi-Square Test
      - Chi-square (χ²) is a probability distribution used to make statistical inferences about categorical data (proportions) in which the number of categories is two or more.
      - Widely used in the analysis of contingency tables.
      - The chi-square test allows us to test for association between two categorical variables.
      - Ho: no association between the variables; HA: there is an association.
      - Consequently, a significant p-value implies association.
  • 132. χ² Distribution
      - Indexed by the degrees of freedom (n).
      - Unlike the z and t distributions, which are always symmetric about 0, the χ² distribution only takes on positive values and is always skewed to the right.
      - The skewness diminishes as n increases.
      - [Figure: χ² distribution with 10 degrees of freedom; acceptance region (area 0.95) to the left of the critical value 18.307, rejection region (area 0.05) to the right]
  • 133. χ² Distribution
      - As with t distributions, there is a different χ² distribution for each possible value of df.
      - χ² distributions with a small number of df are highly skewed; however, this skewness is attenuated as the number of df increases.
      - The χ² distribution is concentrated over nonnegative values.
      - It has mean equal to its degrees of freedom (df), and its standard deviation equals √(2df).
      - As df increases, the distribution concentrates around larger values and is more spread out.
      - The distribution is skewed to the right, but it becomes more bell-shaped (normal) as df increases.
  • 134. χ² Distribution
      - As df increases it becomes more bell-shaped (normal).
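The stated mean (df) and standard deviation (√(2df)) can be checked by simulation. This is an illustrative sketch, not part of the original slides: it relies on the standard fact that a χ² variable with df degrees of freedom is the sum of df squared independent standard normal variables, and the function name, seed, and number of draws are arbitrary choices.

```python
import math
import random

def simulate_chi_square(df, draws, seed=0):
    """Draw chi-square values as sums of df squared standard normal variates."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(draws)]

df = 10
sample = simulate_chi_square(df, draws=20_000)

mean = sum(sample) / len(sample)
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (len(sample) - 1))

print("simulated mean:", round(mean, 2), "| theoretical:", df)
print("simulated sd:  ", round(sd, 2), "| theoretical:", round(math.sqrt(2 * df), 2))
print("smallest value:", round(min(sample), 3))  # never negative
```

Rerunning with a larger `df` shows the simulated values centring on larger numbers with a wider spread, in line with the properties listed above.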
  • 135. Chi-Square Test
      - It is a statistic which measures the discrepancy between k observed frequencies O1, O2, …, Ok and the corresponding expected frequencies E1, E2, …, Ek.
      - If the value of χ² is zero, there is no discrepancy between the observed and the expected frequencies.
      - The greater the discrepancy, the larger the value of χ².
      - The calculated value of χ² is compared with the tabulated value for the given df.
      - The chi-square test is based on the table of χ² for the given df. For an R × C table the df is given by (rows − 1)(columns − 1), or (R − 1)(C − 1).
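  • 136. Chi-Square Test
      - Counts in the chi-square test of a 2 × 2 table are represented as "a", "b", "c" and "d", with n = a + b + c + d.
      - The general formula is $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$
      - For a 2 × 2 table we can also use $\chi^2 = \frac{n(ad - bc)^2}{(a + c)(b + d)(a + b)(c + d)}$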
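The general sum over cells, Σ(O − E)²/E, and the 2 × 2 shortcut, n(ad − bc)²/[(a + c)(b + d)(a + b)(c + d)], give the same value. The sketch below illustrates this; the cell counts are hypothetical, chosen only for illustration, and both function names are assumptions of this sketch.

```python
def chi2_shortcut(a, b, c, d):
    """2x2 shortcut: n(ad - bc)^2 / [(a+c)(b+d)(a+b)(c+d)]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

def chi2_general(a, b, c, d):
    """General form: sum over cells of (observed - expected)^2 / expected."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total

# Hypothetical counts (not from the slides), laid out as
#   a b
#   c d
a, b, c, d = 20, 30, 10, 40

shortcut = chi2_shortcut(a, b, c, d)
general = chi2_general(a, b, c, d)
print(round(shortcut, 4), round(general, 4))
```

The shortcut avoids computing expected counts, which is convenient by hand; the general form extends unchanged to larger R × C tables.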
  • 137. Chi-Square Test: Expected Value
      - The expected value is the product of the row total and the column total, divided by the grand total.
      - The expected numbers must be computed for each cell.
  • 138. Chi-Square Test
      - Assumptions:
      - Data must be categorical.
      - The data should be frequency data.
      - The numbers in each cell are "not too small". No expected frequency should be zero.
      - No more than 20% of the expected frequencies should be less than 5.
      - If this does not hold, combine (re-categorize) row or column categories to make the expected frequencies larger, or use Yates' continuity correction.
  • 139. Example
      - A study was conducted to investigate the possible cause of a gastroenteritis outbreak following a lunch served in a high school cafeteria. Among the 225 students who ate the sandwiches, 109 became ill, while among the 38 students who did not eat the sandwiches, 4 became ill.
      - Present the data in a 2 × 2 contingency table.
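  • 140. Example
      - With this method, data are arranged in the form of a contingency table.
      - This is a 2 × 2 table for two dichotomous random variables:

      | Sandwich    | Ill | Not ill | Total |
      |-------------|-----|---------|-------|
      | Ate         | 109 | 116     | 225   |
      | Did not eat | 4   | 34      | 38    |
      | Total       | 113 | 150     | 263   |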
  • 141. Solution
      - We again wish to know whether the proportions of students who became ill in each of the groups are identical.
      - To carry out the test, we first calculate the expected counts for the table assuming that Ho is true: Ho: p1 = p2; HA: p1 ≠ p2
  • 142. Example
      - The chi-square test compares the observed frequencies in each category with the expected frequencies given that Ho is true.
      - Are the deviations between observed and expected too large to be attributed to chance?
      - To determine this, deviations from all 4 cells must be combined.
      - Calculate the sum: $\chi^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}$
  • 143. Example
      - Ho is rejected at level α if χ² is too large, in particular if $\chi^2 > \chi^2_{1,\alpha}$.
      - If α = 0.05, we would reject Ho for χ² greater than $\chi^2_{1,0.05} = 3.84$.
      - Therefore, we reject Ho.
      - The p-value is given by the area under the χ² distribution to the right of the calculated χ².
      - P-value < 0.001
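The cafeteria example can be worked end to end from the counts given in the problem statement. This is a minimal sketch with no continuity correction; the variable names are illustrative, and the P-value line uses the fact that a χ² variable with 1 df is a squared standard normal, so it is valid for df = 1 only.

```python
import math

# Observed counts: rows = ate / did not eat the sandwiches, columns = ill / not ill.
observed = [[109, 225 - 109],
            [4, 38 - 4]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all four cells.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# Assumption check: no zero expected count, at most 20% of expected counts below 5.
cells = [e for row in expected for e in row]
share_small = sum(e < 5 for e in cells) / len(cells)
assumptions_ok = min(cells) > 0 and share_small <= 0.20

# Upper-tail area for df = 1.
p_value = math.erfc(math.sqrt(chi2 / 2))

CRITICAL = 3.84  # chi-square critical value for df = 1, alpha = 0.05 (from the slide)

print("expected counts:", [[round(e, 2) for e in row] for row in expected])
print("chi-square:", round(chi2, 2), "| assumptions ok:", assumptions_ok)
print("p-value:", p_value)
print("reject Ho" if chi2 > CRITICAL else "fail to reject Ho")
```

Applying Yates' continuity correction would subtract 0.5 from each absolute deviation before squaring, giving a somewhat smaller statistic.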
  • 144. Example
      - A study was conducted to look at the effects of oral contraceptives (OC) on heart disease in women 40 to 44 years of age. It is found that among 5000 current OC users at baseline, 13 women develop a myocardial infarction (MI) over a 3-year period, whereas among 10,000 non-OC users, 7 develop an MI over a 3-year period. Compare the chi-square test with the z-test.
      - P1 = 0.0026, P2 = 0.0007
      - Z-test = 2.77, P-value = 0.006
      - There is a highly significant association between MI and OC use.
  • 145. Solution
      - Display the above data in the form of a 2 × 2 contingency table (MI status over 3 years):

      | OC-use group | MI: Yes | MI: No | Total  |
      |--------------|---------|--------|--------|
      | OC users     | 13      | 4987   | 5000   |
      | Non-OC users | 7       | 9993   | 10,000 |
      | Total        | 20      | 14,980 | 15,000 |

      - Is the proportion of MI the same in OC users and non-OC users?
      - What can be said about the relationship between MI status and OC use?
  • 146. Solutions
      - Compute the expected frequency for the OC-MI data.
      - χ² ≈ 8, 0.001 < p-value < 0.005
      - The relationship between χ² and the Z test is χ² = Z².
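The identity χ² = Z² for a 2 × 2 table can be confirmed numerically on the OC-MI counts. This sketch makes two assumptions: the z statistic uses the pooled proportion under Ho, and neither statistic applies a continuity correction. The slide's figures (Z = 2.77, χ² ≈ 8) appear to include such a correction, so the values printed here come out somewhat larger, while the identity itself holds exactly.

```python
import math

# OC-MI counts laid out as
#   a b   (OC users:     MI yes / MI no)
#   c d   (non-OC users: MI yes / MI no)
a, b = 13, 4987
c, d = 7, 9993
n1, n2 = a + b, c + d
n = n1 + n2

# Two-proportion z statistic with the pooled proportion under Ho.
p1, p2 = a / n1, c / n2
p_pooled = (a + c) / n
z = (p1 - p2) / math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# Chi-square statistic from the 2x2 shortcut formula.
chi2 = n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

print("P1 =", p1, "P2 =", p2)
print("z squared: ", round(z ** 2, 4))
print("chi-square:", round(chi2, 4))
```

Because the two statistics carry the same information, the two-sided z-test and the 1-df chi-square test always lead to the same P-value when computed on the same footing.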
  • 147. Summary
      1. Every χ² distribution extends indefinitely to the right from 0.
      2. Every χ² distribution has only one (right) tail.
      3. As df increases, the χ² curves become more bell-shaped and approach the normal curve in appearance (but remember that a chi-square curve starts at 0, not at −∞).
      4. If the value of χ² is zero, then there is perfect agreement between the observed and the expected frequencies. The greater the discrepancy between the observed and expected frequencies, the larger the value of χ².