Two-sample tests
Binary or categorical outcomes (proportions)

Outcome variable: binary or categorical (e.g., fracture, yes/no)

Are the observations correlated?

Independent:
- Chi-square test: compares proportions between two or more groups
- Relative risks: odds ratios or risk ratios
- Logistic regression: multivariate technique used when outcome is binary; gives multivariate-adjusted odds ratios

Correlated:
- McNemar’s chi-square test: compares binary outcome between correlated groups (e.g., before and after)
- Conditional logistic regression: multivariate regression technique for a binary outcome when groups are correlated (e.g., matched data)
- GEE modeling: multivariate regression technique for a binary outcome when groups are correlated

Alternative to the chi-square test if sparse cells:
- Fisher’s exact test: compares proportions between independent groups when there are sparse data (some cells <5).
- McNemar’s exact test: compares proportions between correlated groups when there are sparse data (some cells <5).
Recall: The odds ratio (two samples = cases and controls)

                  Smoker (E)   Non-smoker (~E)   Total
Stroke (D)            15             35            50
No Stroke (~D)         8             42            50

$$OR = \frac{ad}{bc} = \frac{15 \times 42}{35 \times 8} = 2.25$$
Interpretation: there is a 2.25-fold higher odds of stroke
in smokers vs. non-smokers.
Inferences about the odds
ratio…
 Does the sampling distribution follow a
normal distribution?
 What is the standard error?
Simulation…
 1. In SAS, assume infinite population of cases
and controls with equal proportion of smokers
(exposure), p=.23 (UNDER THE NULL!)
 2. Use the random binomial function to randomly
select n=50 cases and n=50 controls each with
p=.23 chance of being a smoker.
 3. Calculate the observed odds ratio for the
resulting 2x2 table.
 4. Repeat this 1000 times (or some large number
of times).
 5. Observe the distribution of odds ratios under
the null hypothesis.
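The five steps above can be sketched in Python instead of SAS. This is only an illustrative sketch: the function name `simulate_null_odds_ratios`, the fixed seed, and the choice to skip any table with an empty cell are assumptions, not details of the original simulation.

```python
import math
import random
import statistics

def simulate_null_odds_ratios(n_cases=50, n_controls=50, p=0.23, reps=1000, seed=1):
    """Sample odds ratios when cases and controls share the same exposure probability p."""
    rng = random.Random(seed)
    odds_ratios = []
    for _ in range(reps):
        a = sum(rng.random() < p for _ in range(n_cases))     # exposed cases
        c = sum(rng.random() < p for _ in range(n_controls))  # exposed controls
        b, d = n_cases - a, n_controls - c                    # unexposed cases, unexposed controls
        if 0 in (a, b, c, d):
            continue  # assumption: skip tables where the OR is zero or undefined
        odds_ratios.append((a * d) / (b * c))
    return odds_ratios

odds_ratios = simulate_null_odds_ratios()
log_odds_ratios = [math.log(x) for x in odds_ratios]

print("mean OR:", round(statistics.mean(odds_ratios), 2))
print("median OR:", round(statistics.median(odds_ratios), 2))
print("SD of ln(OR):", round(statistics.stdev(log_odds_ratios), 2))
```

If the sketch behaves like the simulation described here, the mean OR should sit above the median (right skew) and the standard deviation of ln(OR) should land near 0.5.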
Properties of the OR (simulation)
(50 cases/50 controls/23% exposed)
Under the null, this is the expected variability of the sample OR; note the right skew.
Properties of the lnOR
Normal!
From the simulation, can get the empirical standard error (~0.5) and p-value (~.10).
Or, in general, standard error of lnOR =
$$\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$
Inferences about the ln(OR)
For the stroke/smoking table (a=15, b=35, c=8, d=42):
$$OR = 2.25, \qquad \ln(OR) = 0.81$$
$$Z = \frac{\ln(2.25) - 0}{\sqrt{\frac{1}{8} + \frac{1}{15} + \frac{1}{35} + \frac{1}{42}}} = \frac{0.81}{0.494} = 1.64, \qquad p = .10$$
Confidence interval…
For the stroke/smoking table (a=15, b=35, c=8, d=42):
$$95\%\ \text{CI for } \ln(OR) = 0.81 \pm 1.96 \times 0.494 = (-0.16,\ 1.78)$$
$$95\%\ \text{CI for } OR = (e^{-0.16},\ e^{1.78}) = (0.85,\ 5.92)$$
Final answer: 2.25 (0.85, 5.92)
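The Z test and confidence interval for the odds ratio can be reproduced with a few lines of Python. The helper name `odds_ratio_inference` and its dictionary keys are hypothetical; the formulas are the ones on these slides (standard error of ln(OR) from the four cell counts, normal reference distribution).

```python
import math
from statistics import NormalDist

def odds_ratio_inference(a, b, c, d, z_crit=1.96):
    """OR, ln(OR), its standard error, Z, two-sided p-value and 95% CI for a 2x2 table."""
    odds_ratio = (a * d) / (b * c)
    ln_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of ln(OR)
    z = (ln_or - 0) / se                            # null value of ln(OR) is 0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided
    ci = (math.exp(ln_or - z_crit * se), math.exp(ln_or + z_crit * se))
    return {"or": odds_ratio, "ln_or": ln_or, "se": se, "z": z, "p": p_value, "ci": ci}

# Stroke (rows) by smoking (columns): a=15, b=35, c=8, d=42
stroke = odds_ratio_inference(15, 35, 8, 42)
print({key: stroke[key] for key in ("or", "se", "z", "p", "ci")})
```

The same function can be pointed at any other 2x2 table, for example `odds_ratio_inference(20, 60, 10, 40)` for a case-control table with those four cell counts.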
Practice problem:
Suppose the following data were collected in a case-control study of brain tumor and
cell phone usage:
                         Brain tumor   No brain tumor
Own a cell phone             20              60
Don’t own a cell phone       10              40
Is there sufficient evidence for an association between cell phones and brain tumor?
Answer
1. What is your null hypothesis?
Null hypothesis: OR = 1.0; lnOR = 0
Alternative hypothesis: OR ≠ 1.0; lnOR ≠ 0
2. What is your null distribution?
lnOR ~ N(0, σ); σ = SD(lnOR) = $\sqrt{\frac{1}{20} + \frac{1}{60} + \frac{1}{10} + \frac{1}{40}}$ = .44
3. Empirical evidence: OR = (20*40)/(60*10) = 800/600 = 1.33
∴ lnOR = .288
4. Z = (.288 − 0)/.44 = .65
p-value = P(Z > .65 or Z < −.65) = .26*2 = .52
TWO-SIDED TEST: it would be just as extreme if the sample lnOR were .65 standard deviations or more below the null mean.
5. Not enough evidence to reject the null hypothesis of no association
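Key measures of relative risk: 95% CIs for OR and RR
For an odds ratio, 95% confidence limits:
$$\left( OR \times \exp\left[-1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right],\ \ OR \times \exp\left[+1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right] \right)$$
For a risk ratio, 95% confidence limits:
$$\left( RR \times \exp\left[-1.96\sqrt{\frac{1 - a/(a+b)}{a} + \frac{1 - c/(c+d)}{c}}\right],\ \ RR \times \exp\left[+1.96\sqrt{\frac{1 - a/(a+b)}{a} + \frac{1 - c/(c+d)}{c}}\right] \right)$$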
Continuous outcome (means)

Outcome variable: continuous (e.g., pain scale, cognitive function)

Are the observations independent or correlated?

Independent:
- Ttest: compares means between two independent groups
- ANOVA: compares means between more than two independent groups
- Pearson’s correlation coefficient (linear correlation): shows linear correlation between two continuous variables
- Linear regression

Correlated:
- Paired ttest: compares means between two related groups (e.g., the same subjects before and after)
- Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements)
- Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or more groups

Alternatives if the normality assumption is violated (and small sample size): non-parametric statistics
- Wilcoxon signed-rank test: non-parametric alternative to the paired ttest
- Wilcoxon rank-sum test (= Mann-Whitney U test): non-parametric alternative to the ttest
- Kruskal-Wallis test: non-parametric alternative to ANOVA
- Spearman rank correlation coefficient
The two-sample t-test
 Is the difference in means that we
observe between two groups more than
we’d expect to see based on chance
alone?
The standard error of the difference of two means
$$\sigma_{\bar{x}-\bar{y}} = \sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}$$
**First add the variances and then take the square root of the sum to get the standard error.
Recall, Var(A−B) = Var(A) + Var(B) if A and B are independent!
Shown by simulation:
One sample of 30 (with SD=5): $SE = \frac{5}{\sqrt{30}} = .91$
One sample of 30 (with SD=5): $SE = \frac{5}{\sqrt{30}} = .91$
Difference of the two samples: $SE(\text{diff}) = \sqrt{\frac{25}{30} + \frac{25}{30}} = 1.29$
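A simulation along these lines is easy to repeat. The sketch below is an assumption-laden stand-in for whatever produced the slide's histograms: it draws normal data with mean 0, uses 5000 repetitions and a fixed seed, and the name `simulate_mean_differences` is hypothetical.

```python
import math
import random
import statistics

def simulate_mean_differences(n=30, m=30, sd=5.0, reps=5000, seed=2):
    """Differences between the means of two independent samples with the same true mean."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        x_bar = statistics.fmean(rng.gauss(0, sd) for _ in range(n))
        y_bar = statistics.fmean(rng.gauss(0, sd) for _ in range(m))
        diffs.append(x_bar - y_bar)
    return diffs

diffs = simulate_mean_differences()
theoretical_se = math.sqrt(5.0 ** 2 / 30 + 5.0 ** 2 / 30)  # add variances, then take the root

print("simulated SE of the difference:", round(statistics.stdev(diffs), 2))
print("theoretical SE of the difference:", round(theoretical_se, 2))
```

The two printed values should agree closely, which is the point of the slide: the variances add, not the standard errors.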
Distribution of differences
If $\bar{X}_n$ and $\bar{Y}_m$ are the averages of n and m subjects, respectively:
$$\bar{X}_n - \bar{Y}_m \sim N\left(\mu_x - \mu_y,\ \sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}\right)$$
But…
 As before, you usually have to use the
sample SD, since you won’t know the
true SD ahead of time…
 So, again becomes a T-distribution...
Estimated standard error of the difference…
$$\sigma_{\bar{x}-\bar{y}} \approx \sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$
Just plug in the sample standard deviations for each group.
Case 1: un-pooled variance
Question: What are your degrees of freedom here?
Answer: Not obvious!
Case 1: ttest, unpooled variances
$$T = \frac{\bar{X}_n - \bar{Y}_m}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}} \sim t_\nu$$
It is complicated to figure out the degrees of freedom here! A good approximation is given as df ≈ harmonic mean (or SAS will tell you!):
$$\nu \approx \frac{2}{\frac{1}{n} + \frac{1}{m}}$$
Case 2: pooled variance
If you assume that the standard deviation of the characteristic (e.g., IQ) is the same in both groups, you can pool all the data to estimate a common standard deviation. This maximizes your degrees of freedom (and thus your power).
Pooling variances:
$$s_x^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x}_n)^2}{n-1} \quad\text{and}\quad (n-1)s_x^2 = \sum_{i=1}^{n}(x_i - \bar{x}_n)^2$$
$$s_y^2 = \frac{\sum_{i=1}^{m}(y_i - \bar{y}_m)^2}{m-1} \quad\text{and}\quad (m-1)s_y^2 = \sum_{i=1}^{m}(y_i - \bar{y}_m)^2$$
$$\therefore\ s_p^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x}_n)^2 + \sum_{i=1}^{m}(y_i - \bar{y}_m)^2}{n+m-2}$$
$$s_p^2 = \frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}$$
Degrees of freedom: the denominator, n+m−2.
Estimated standard error (using pooled variance estimate)
$$\sigma_{\bar{x}-\bar{y}} \approx \sqrt{\frac{s_p^2}{n} + \frac{s_p^2}{m}}$$
where:
$$s_p^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x}_n)^2 + \sum_{i=1}^{m}(y_i - \bar{y}_m)^2}{n+m-2}$$
The degrees of freedom are n+m−2.
Case 2: ttest, pooled variances
$$T = \frac{\bar{X}_n - \bar{Y}_m}{\sqrt{\frac{s_p^2}{n} + \frac{s_p^2}{m}}} \sim t_{n+m-2}$$
$$s_p^2 = \frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}$$
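Alternate calculation formula: ttest, pooled variance
$$T = \frac{\bar{X}_n - \bar{Y}_m}{s_p\sqrt{\frac{n+m}{nm}}} \sim t_{n+m-2}$$
because
$$\frac{s_p^2}{n} + \frac{s_p^2}{m} = s_p^2\left(\frac{1}{n} + \frac{1}{m}\right) = s_p^2\left(\frac{m}{mn} + \frac{n}{mn}\right) = s_p^2\left(\frac{n+m}{mn}\right)$$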
Pooled vs. unpooled variance
Rule of Thumb: Use pooled unless you have a
reason not to.
Pooled gives you more degrees of freedom.
Pooled has extra assumption: variances are
equal between the two groups.
SAS automatically tests this assumption for you (“Equality of Variances” test). If p<.05, this suggests unequal variances, and it is better to use the unpooled ttest.
Example: two-sample t-test
 In 1980, some researchers reported that
“men have more mathematical ability than
women” as evidenced by the 1979 SAT’s,
where a sample of 30 random male
adolescents had a mean score ± 1 standard
deviation of 436±77 and 30 random female
adolescents scored lower: 416±81 (genders
were similar in educational backgrounds,
socio-economic status, and age). Do you
agree with the authors’ conclusions?
Data Summary
                  n    Sample Mean   Sample Standard Deviation
Group 1: women    30   416           81
Group 2: men      30   436           77
Two-sample t-test
1. Define your hypotheses (null, alternative)
H0: ♂−♀ math SAT = 0
Ha: ♂−♀ math SAT ≠ 0 [two-sided]
2. Specify your null distribution:
F and M have similar standard deviations/variances, so make a “pooled” estimate of variance.
$$s_p^2 = \frac{(n-1)s_m^2 + (m-1)s_f^2}{n+m-2} = \frac{(29)77^2 + (29)81^2}{58} = 6245$$
$$\bar{M}_{30} - \bar{F}_{30} \sim T_{58}\left(0,\ \sqrt{\frac{6245}{30} + \frac{6245}{30}}\right), \qquad \sqrt{\frac{6245}{30} + \frac{6245}{30}} = 20.4$$
3. Observed difference in our experiment = 20 points
4. Calculate the p-value of what you observed
$$T_{58} = \frac{20 - 0}{20.4} = .98$$
data _null_;
pval=(1-probt(.98, 58))*2;
put pval;
run;
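For readers without SAS, the pooled calculation and the two-sided p-value can be approximated in plain Python. This is a sketch under stated assumptions: `pooled_t_test` and `t_two_sided_p` are hypothetical helper names, and the p-value comes from numerically integrating the t density with Simpson's rule rather than from a library routine.

```python
import math

def pooled_t_test(mean_x, sd_x, n, mean_y, sd_y, m):
    """Pooled variance, standard error, T statistic and degrees of freedom."""
    sp2 = ((n - 1) * sd_x ** 2 + (m - 1) * sd_y ** 2) / (n + m - 2)
    se = math.sqrt(sp2 / n + sp2 / m)
    return sp2, se, (mean_x - mean_y) / se, n + m - 2

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the t density (steps must be even)."""
    log_const = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    const = math.exp(log_const) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = pdf(0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3            # P(0 < T < |t|)
    return 2 * (0.5 - area)

# Men: 436 (SD 77), n=30; women: 416 (SD 81), n=30
sp2, se, t_stat, df = pooled_t_test(436, 77, 30, 416, 81, 30)
p_value = t_two_sided_p(t_stat, df)
print(sp2, round(se, 1), round(t_stat, 2), df, round(p_value, 2))
```

A p-value of roughly one in three means a 20-point difference is well within what chance alone produces with 30 students per group, so these data do not support the quoted claim.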
Example 2: Difference in means
- Example: Rosenthal, R. and Jacobson, L. (1966) Teachers’ expectancies: Determinants of pupils’ I.Q. gains. Psychological Reports, 19, 115-118.
The Experiment
(note: exact numbers have been altered)
 Grade 3 at Oak School were given an IQ test at
the beginning of the academic year (n=90).
 Classroom teachers were given a list of names of
students in their classes who had supposedly
scored in the top 20 percent; these students were
identified as “academic bloomers” (n=18).
 BUT: the children on the teachers lists had
actually been randomly assigned to the list.
 At the end of the year, the same I.Q. test was re-
administered.
Example 2
 Statistical question: Do students in the
treatment group have more improvement
in IQ than students in the control group?
What will we actually compare?
 One-year change in IQ score in the treatment
group vs. one-year change in IQ score in the
control group.
Results:
                       “Academic bloomers” (n=18)   Controls (n=72)
Change in IQ score:          12.2 (2.0)                8.2 (2.0)
Difference = 4 points
The standard deviation of change scores was 2.0 in both groups. This affects statistical significance…
What does a 4-point
difference mean?
 Before we perform any formal statistical
analysis on these data, we already
have a lot of information.
 Look at the basic numbers first; THEN
consider statistical significance as a
secondary guide.
Is the association statistically
significant?
 This 4-point difference could reflect a
true effect or it could be a fluke.
 The question: is a 4-point difference
bigger or smaller than the expected
sampling variability?
Hypothesis testing
Null hypothesis: There is no difference between “academic bloomers” and normal students (= the difference is 0 points)
Step 1: Assume the null hypothesis.
Hypothesis Testing
 These predictions can be made by
mathematical theory or by computer
simulation.
Step 2: Predict the sampling variability assuming the null
hypothesis is true
Hypothesis Testing
Step 2: Predict the sampling variability assuming the null hypothesis is true, using math theory:
$$s_p^2 = 4.0$$
$$\mu_{\text{“gifted”}} - \mu_{\text{control}} \sim T_{88}\left(0,\ \sqrt{\frac{4}{18} + \frac{4}{72}} = 0.52\right)$$
Hypothesis Testing
 In computer simulation, you simulate
taking repeated samples of the same
size from the same population and
observe the sampling variability.
 I used computer simulation to take 1000
samples of 18 treated and 72 controls
Step 2: Predict the sampling variability assuming the null
hypothesis is true—computer simulation:
Computer Simulation Results
Standard error is
about 0.52
3. Empirical data
Observed difference in our experiment =
12.2-8.2 = 4.0
4. P-value
$$t_{88} = \frac{12.2 - 8.2}{.52} = \frac{4}{.52} \approx 8$$
p-value <.0001
A t-curve with 88 df’s has slightly wider cut-offs for 95% area (t=1.99) than a normal curve (Z=1.96).
If we ran this study 1000 times, we wouldn’t expect to get 1 result as big as a difference of 4 (under the null hypothesis).
Visually…
5. Reject null!
 Conclusion: I.Q. scores can bias
expectancies in the teachers’ minds
and cause them to unintentionally treat
“bright” students differently from those
seen as less bright.
Confidence interval (more
information!!)
95% CI for the difference: 4.0±1.99(.52) =
(3.0 – 5.0)
What if our standard deviation
had been higher?
 The standard deviation for change
scores in treatment and control were
each 2.0. What if change scores had
been much more variable—say a
standard deviation of 10.0 (for both)?
Std. dev in change scores = 2.0: standard error is 0.52
Std. dev in change scores = 10.0: standard error is 2.58
With a std. dev. of 10.0… LESS STATISTICAL POWER!
If we ran this study 1000 times, we would expect to get ≥+4.0 or ≤−4.0 12% of the time.
P-value=.12
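The effect of the larger standard deviation can also be checked from the formula rather than by simulation. The sketch below is illustrative only: `se_of_difference` is a hypothetical helper, and the p-value uses a normal approximation instead of the t-distribution with 88 df. The formula gives standard errors of about 0.53 and 2.64, close to the simulated 0.52 and 2.58 quoted on the slides.

```python
import math
from statistics import NormalDist

def se_of_difference(sd, n1, n2):
    """Standard error of a difference in means when both groups share the same SD."""
    return math.sqrt(sd ** 2 / n1 + sd ** 2 / n2)

observed_difference = 4.0
results = {}
for sd in (2.0, 10.0):
    se = se_of_difference(sd, 18, 72)
    z = observed_difference / se
    p_value = 2 * (1 - NormalDist().cdf(z))   # normal approximation (assumption)
    results[sd] = (se, z, p_value)
    print(f"SD={sd}: SE={se:.2f}, Z={z:.2f}, two-sided p={p_value:.4f}")
```

The same 4-point difference is overwhelming evidence when change scores are tightly clustered and unconvincing when they are five times as variable.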
Don’t forget: The paired T-test
 Did the control group in the previous
experiment improve
at all during the year?
 Do not apply a two-sample ttest to answer
this question!
 After-Before yields a single sample of
differences…
 “within-group” rather than “between-group”
comparison…
Data Summary
                   n    Sample Mean   Sample Standard Deviation
Group 1: Change    72   +8.2          2.0
Did the control group in the previous experiment improve at all during the year?
$$t_{71} = \frac{8.2 - 0}{\sqrt{2^2/72}} = \frac{8.2}{.24} \approx 35$$
p-value <.0001
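The paired (one-sample) calculation is short enough to verify by hand or in code. In this sketch `one_sample_t` is a hypothetical helper applied to the summary statistics of the change scores. Evaluating 2.0/√72 directly gives a standard error of about 0.24, so t comes out near 35; the conclusion (p < .0001) is the same either way.

```python
import math

def one_sample_t(mean_diff, sd_diff, n, null_value=0.0):
    """t statistic for a single sample of within-subject differences."""
    se = sd_diff / math.sqrt(n)
    return (mean_diff - null_value) / se, se, n - 1

# Control group: mean change +8.2, SD of change 2.0, n=72
t_stat, se, df = one_sample_t(8.2, 2.0, 72)
print(f"SE={se:.3f}, t={t_stat:.1f}, df={df}")
```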
Normality assumption of ttest
- If the distribution of the trait is normal, fine to use a t-test.
- But if the underlying distribution is not normal and the sample size is small, the Central Limit Theorem has not yet kicked in, and you cannot rely on the ttest. Rule of thumb for a large enough sample: n>30 per group if not too skewed; n>100 if the distribution is really skewed.
- Note: ttest is very robust against the normality assumption!
Alternative tests when normality
is violated: Non-parametric tests
Non-parametric tests
 t-tests require your outcome variable
to be normally distributed (or close
enough), for small samples.
 Non-parametric tests are based on
RANKS instead of means and
standard deviations (=“population
parameters”).
Example: non-parametric tests
10 dieters following the Atkins diet vs. 10 dieters following Jenny Craig
Hypothetical RESULTS:
Atkins group loses an average of 34.5 lbs.
J. Craig group loses an average of 18.5 lbs.
Conclusion: Atkins is better?
BUT, take a closer look at the individual data…
Atkins, change in weight (lbs):
+4, +3, 0, -3, -4, -5, -11, -14, -15, -300
J. Craig, change in weight (lbs)
-8, -10, -12, -16, -18, -20, -21, -24, -26, -30
[Figure: histograms of weight change (percent of dieters by lbs changed). Jenny Craig panel: x-axis from -30 to 20. Atkins panel: x-axis from -300 to 20.]
t-test inappropriate…
 Comparing the mean weight loss of the
two groups is not appropriate here.
 The distributions do not appear to be
normally distributed.
 Moreover, there is an extreme outlier
(this outlier influences the mean a great
deal).
Wilcoxon rank-sum test
- RANK the values, 1 being the least weight loss and 20 being the most weight loss.
Atkins
Change: +4, +3, 0, -3, -4, -5, -11, -14, -15, -300
Rank:    1,  2, 3,  4,  5,  6,   9,  11,  12,   20
J. Craig
Change: -8, -10, -12, -16, -18, -20, -21, -24, -26, -30
Rank:    7,   8,  10,  13,  14,  15,  16,  17,  18,  19
- Sum of Atkins ranks: 1 + 2 + 3 + 4 + 5 + 6 + 9 + 11 + 12 + 20 = 73
- Sum of Jenny Craig’s ranks: 7 + 8 + 10 + 13 + 14 + 15 + 16 + 17 + 18 + 19 = 137
- Jenny Craig clearly ranked higher!
- P-value* (from computer) = .018
*For details of the statistical test, see appendix of these slides…
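The ranking and the large-sample Z approximation from the appendix can be sketched as follows. Assumptions: the helper name `rank_sums` is hypothetical, the data contain no ties, and no continuity correction is applied, so the p-value will differ slightly from the computer-generated .018 quoted above.

```python
import math
from statistics import NormalDist

atkins = [4, 3, 0, -3, -4, -5, -11, -14, -15, -300]
jenny_craig = [-8, -10, -12, -16, -18, -20, -21, -24, -26, -30]

def rank_sums(group1, group2):
    """Rank pooled values from 1 (least weight loss) upward; assumes no tied values."""
    pooled = sorted(group1 + group2, reverse=True)   # largest change = least loss first
    rank = {value: position + 1 for position, value in enumerate(pooled)}
    return sum(rank[v] for v in group1), sum(rank[v] for v in group2)

t1, t2 = rank_sums(atkins, jenny_craig)
n1, n2 = len(atkins), len(jenny_craig)
u1 = n1 * n2 + n1 * (n1 + 1) // 2 - t1
u2 = n1 * n2 + n2 * (n2 + 1) // 2 - t2
u0 = min(u1, u2)

# Normal approximation: E(U0) = n1*n2/2, Var(U0) = n1*n2*(n1+n2+1)/12
z = (u0 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
p_value = 2 * NormalDist().cdf(-abs(z))
print(t1, t2, u0, round(z, 2), round(p_value, 3))
```

Because only ranks enter the calculation, the single 300-lb loss counts as rank 20 and nothing more, which is why the outlier no longer dominates the comparison.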
Binary or categorical
outcomes (proportions)
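Difference in proportions (special case of chi-square test)
Standard error of a proportion =
$$\sqrt{\frac{p(1-p)}{n}}$$
Standard error of the difference of two proportions =
$$\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \quad\text{or}\quad \sqrt{\frac{\bar{p}(1-\bar{p})}{n_1} + \frac{\bar{p}(1-\bar{p})}{n_2}}, \quad\text{where } \bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$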
Null distribution of a difference in proportions
Standard error of a proportion can be estimated by $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ (still normally distributed).
The variance of a difference is the sum of variances (as with difference in means).
Using the average proportion is analogous to pooled variance in the ttest.
$$\text{Difference of proportions} \sim N\left(p_1 - p_2,\ \sqrt{\frac{\bar{p}(1-\bar{p})}{n_1} + \frac{\bar{p}(1-\bar{p})}{n_2}}\right)$$
Difference in proportions test
Null hypothesis: The difference in proportions is 0.
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\bar{p}(1-\bar{p})}{n_1} + \frac{\bar{p}(1-\bar{p})}{n_2}}}$$
$$\bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} \quad \text{(just average proportion)}$$
$p_1$ = proportion in group 1
$p_2$ = proportion in group 2
$n_1$ = number in group 1
$n_2$ = number in group 2
Recall, variance of a proportion is p(1-p)/n.
Use average (or pooled) proportion in standard error formula, because under the null hypothesis, groups have equal proportions.
Follows a normal because binomial can be approximated with normal.
Recall case-control example:

                  Smoker (E)   Non-smoker (~E)   Total
Stroke (D)            15             35            50
No Stroke (~D)         8             42            50

Absolute risk: Difference in proportions exposed
$$P(E|D) - P(E|{\sim}D) = 15/50 - 8/50 = 30\% - 16\% = 14\%$$
Difference in proportions exposed
$$Z = \frac{14\% - 0\%}{\sqrt{\frac{.23 \times .77}{50} + \frac{.23 \times .77}{50}}} = \frac{.14}{.084} = 1.67$$
$$95\%\ \text{CI}: 0.14 \pm 1.96 \times .084 = -0.03 \text{ to } .31$$
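The difference-in-proportions test above translates directly into code. The helper `two_proportion_z` below is a hypothetical name; it uses the pooled (average) proportion in the standard error, as the slides do, and a normal reference distribution.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2, z_crit=1.96):
    """Z test for a difference in proportions using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
    diff = p1 - p2
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    ci = (diff - z_crit * se, diff + z_crit * se)
    return {"diff": diff, "se": se, "z": z, "p": p_value, "ci": ci}

# Proportion exposed (smokers) among 50 stroke cases vs. 50 controls
smoking = two_proportion_z(15, 50, 8, 50)
print(smoking)
```

Note that this Z (about 1.66) and the Z for ln(OR) from the same table (1.64) lead to essentially the same p-value, as expected when two tests address the same 2x2 table.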
Example 2: Difference in proportions
- Research Question: Are antidepressants a risk factor for suicide attempts in children and adolescents?
Example modified from: “Antidepressant Drug Therapy and Suicide in Severely Depressed Children and Adults”; Olfson et al. Arch Gen Psychiatry. 2006;63:865-872.
Example 2: Difference in
Proportions
 Design: Case-control study
 Methods: Researchers used Medicaid
records to compare prescription histories
between 263 children and teenagers (6-18
years) who had attempted suicide and 1241
controls who had never attempted suicide (all
subjects suffered from depression).
 Statistical question: Is a history of use of
antidepressants more common among cases
than controls?
Example 2
- Statistical question: Is a history of use of antidepressants more common among cases than controls?
What will we actually compare?
- Proportion of cases who used antidepressants in the past vs. proportion of controls who did
Results
                               No. (%) of cases (n=263)   No. (%) of controls (n=1241)
Any antidepressant drug ever         120 (46%)                   448 (36%)
46% vs. 36%: Difference = 10%
Is the association statistically
significant?
 This 10% difference could reflect a true
association or it could be a fluke in this
particular sample.
 The question: is 10% bigger or smaller
than the expected sampling variability?
Hypothesis testing
Null hypothesis: There is no association
between antidepressant use and suicide
attempts in the target population (= the
difference is 0%)
Step 1: Assume the null hypothesis.
Hypothesis Testing
Step 2: Predict the sampling variability assuming the null hypothesis is true
$$\hat{p}_{cases} - \hat{p}_{controls} \sim N\left(0,\ \sigma = \sqrt{\frac{\frac{568}{1504}\left(1 - \frac{568}{1504}\right)}{263} + \frac{\frac{568}{1504}\left(1 - \frac{568}{1504}\right)}{1241}} = .033\right)$$
Also: Computer Simulation Results
Standard error is about 3.3%
Hypothesis Testing
Step 3: Do an experiment
We observed a difference of 10% between
cases and controls.
Hypothesis Testing
Step 4: Calculate a p-value
$$Z = \frac{.10}{.033} = 3.0; \quad p = .003$$
P-value from our simulation…
When we ran this study 1000 times, we got 1 result as big or bigger than 10%. We also got 3 results as small or smaller than −10%.
From our simulation, we estimate the p-value to be: 4/1000 or .004
Hypothesis Testing
Step 5: Reject or do not reject the null hypothesis.
Here we reject the null.
Alternative hypothesis: There is an association between antidepressant use and suicide in the target population.
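A simulation-based p-value of this kind can be sketched as follows. The function name `simulate_null_differences`, the seed, and the use of exactly 1000 repetitions are assumptions for illustration; the counts will not match the slide's 1 and 3 exactly, since every simulation run differs.

```python
import random
import statistics

def simulate_null_differences(n_cases=263, n_controls=1241, p=568 / 1504, reps=1000, seed=3):
    """Differences in sample proportions when cases and controls share the same true p."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        p_cases = sum(rng.random() < p for _ in range(n_cases)) / n_cases
        p_controls = sum(rng.random() < p for _ in range(n_controls)) / n_controls
        diffs.append(p_cases - p_controls)
    return diffs

null_diffs = simulate_null_differences()
as_extreme = sum(abs(d) >= 0.10 for d in null_diffs)   # both tails
simulated_p = as_extreme / len(null_diffs)

print("empirical standard error:", round(statistics.stdev(null_diffs), 3))
print("simulated two-sided p-value:", simulated_p)
```

With only 1000 repetitions the simulated p-value is coarse (it moves in steps of .001), so it should be read as agreeing with the theoretical value only in order of magnitude.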
What would a lack of
statistical significance mean?
 If this study had sampled only 50 cases
and 50 controls, the sampling variability
would have been much higher—as
shown in this computer simulation…
Standard error is about 10% with 50 cases and 50 controls.
Standard error is about 3.3% with 263 cases and 1241 controls.
With only 50 cases and 50 controls…
If we ran this study 1000 times, we would expect to get values of 10% or higher 170 times (or 17% of the time).
Two-tailed p-value = 17% x 2 = 34%
Practice problem…
An August 2003 research article in
Developmental and Behavioral Pediatrics
reported the following about a sample of UK
kids: when given a choice of a non-branded
chocolate cereal vs. CoCo Pops, 97% (36) of
37 girls and 71% (27) of 38 boys preferred
the CoCo Pops. Is this evidence that girls are
more likely to choose brand-named products?
Answer
1. Hypotheses:
H0: p♀ − p♂ = 0
Ha: p♀ − p♂ ≠ 0 [two-sided]
2. Null distribution of difference of two proportions:
$$\hat{p}_f - \hat{p}_m \sim N\left(0,\ \sigma = \sqrt{\frac{\frac{63}{75}\left(1 - \frac{63}{75}\right)}{37} + \frac{\frac{63}{75}\left(1 - \frac{63}{75}\right)}{38}}\right)$$
$$\sigma = \sqrt{\frac{.84(.16)}{37} + \frac{.84(.16)}{38}} = .085$$
Null says p’s are equal so estimate standard error using overall observed p.
3. Observed difference in our experiment = .97 − .71 = .26
4. Calculate the p-value of what you observed:
$$Z = \frac{.26 - 0}{.085} = 3.06$$
data _null_;
pval=(1-probnorm(3.06))*2;
put pval;
run;
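Key two-sample Hypothesis Tests…
Test for H0: μx − μy = 0 (σ² unknown, but roughly equal):
$$t_{n-2} = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}}; \qquad s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n - 2}, \quad n = n_x + n_y$$
Test for H0: p1 − p2 = 0:
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{(\bar{p})(1-\bar{p})}{n_1} + \frac{(\bar{p})(1-\bar{p})}{n_2}}}; \qquad \bar{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$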
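Corresponding confidence intervals…
For a difference in means, 2 independent samples (σ²’s unknown but roughly equal):
$$(\bar{x} - \bar{y}) \pm t_{n-2,\ \alpha/2} \times \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$$
For a difference in proportions, 2 independent samples:
$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2} \times \sqrt{\frac{(\bar{p})(1-\bar{p})}{n_1} + \frac{(\bar{p})(1-\bar{p})}{n_2}}$$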
Appendix: details of rank-sum test…
Wilcoxon Rank-sum test
Rank all of the observations in order from 1 to n.
$T_1$ is the sum of the ranks from the smaller population ($n_1$)
$T_2$ is the sum of the ranks from the larger population ($n_2$)
$$U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - T_1$$
$$U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - T_2$$
$$U_0 = \min(U_1, U_2)$$
For $n_1 > 10$, $n_2 > 10$:
$$Z = \frac{U_0 - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$
Find P(U ≤ U0) in Mann-Whitney U tables, with n2 = the bigger of the 2 populations.
Example
For example, if team 1 and team 2 (two gymnastic teams) are competing, and the judges rank all the individuals in the competition, how can you tell if team 1 has done significantly better than team 2 or vice versa?
Answer
Intuition: under the null hypothesis of no difference between the two groups…
- If n1=n2, T1 and T2 should be equal.
- But if n1≠n2, then T2 (n2=bigger group) should automatically be bigger. But how much bigger under the null?
For example, if team 1 has 3 people and team 2 has 10, we could rank all 13 participants from 1 to 13 on individual performance. If team1 (X) and team2 don’t differ in talent, the ranks ought to be spread evenly among the two groups, e.g.…
1 2 X 4 5 6 X 8 9 10 X 12 13 (exactly even distribution if team1 ranks 3rd, 7th, and 11th)
$T_1$ = sum of ranks of group 1 (smaller)
$T_2$ = sum of ranks of group 2 (larger)
$$T_1 + T_2 = \sum_{i=1}^{n_1+n_2} i = \frac{(n_1+n_2)(n_1+n_2+1)}{2} = \frac{n_1^2 + 2n_1n_2 + n_2^2 + n_1 + n_2}{2} = \frac{n_1(n_1+1)}{2} + \frac{n_2(n_2+1)}{2} + n_1 n_2$$
Remember this?
$$\sum_{i=1}^{n_1} i = \frac{n_1(n_1+1)}{2} \quad \text{(sum of within-group ranks for smaller group)}$$
$$\sum_{i=1}^{n_2} i = \frac{n_2(n_2+1)}{2} \quad \text{(sum of within-group ranks for larger group)}$$
e.g., here:
$$T_1 + T_2 = \sum_{i=1}^{13} i = \frac{(13)(14)}{2} = 91 = 55 + 6 + 30$$
Take-home point:
$$T_1 + T_2 = \frac{n_1(n_1+1)}{2} + \frac{n_2(n_2+1)}{2} + n_1 n_2$$
$$\sum_{i=1}^{10} i = \frac{10(11)}{2} = 55, \qquad \sum_{i=1}^{3} i = \frac{3(4)}{2} = 6, \qquad 55 - 6 = 49$$
T1 = 3 + 7 + 11 = 21
T2 = 1 + 2 + 4 + 5 + 6 + 8 + 9 + 10 + 12 + 13 = 70
70 − 21 = 49 Magic!
The difference between the sums of the within-group ranks (55 and 6) is 49.
The difference between the sum of the ranks of the two groups is also equal to 49 if ranks are evenly interspersed (null is true).
It turns out that, if the null hypothesis is true, the difference between the larger-group and smaller-group sums of within-group ranks is exactly equal to the difference between T2 and T1:
$$\text{Under the null,}\quad T_2 - T_1 = \frac{n_2(n_2+1)}{2} - \frac{n_1(n_1+1)}{2}$$
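$$T_2 + T_1 = \frac{n_2(n_2+1)}{2} + \frac{n_1(n_1+1)}{2} + n_1 n_2 \quad \text{(from slide 23)}$$
$$T_2 - T_1 = \frac{n_2(n_2+1)}{2} - \frac{n_1(n_1+1)}{2} \quad \text{(from slide 24)}$$
$$\therefore\ T_2 = \frac{n_2(n_2+1)}{2} + \frac{n_1 n_2}{2}, \qquad T_1 = \frac{n_1(n_1+1)}{2} + \frac{n_1 n_2}{2}$$
Define new statistics:
$$U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - T_2$$
$$U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - T_1$$
Their sum should equal $n_1 n_2$.
Here, under null:
U2 = 55 + 30 − 70 = 15
U1 = 6 + 30 − 21 = 15
U2 + U1 = 30
∴ under null hypothesis, U1 should equal U2:
$$E(U_2 - U_1) = E\left[\left(\frac{n_2(n_2+1)}{2} - \frac{n_1(n_1+1)}{2}\right) - (T_2 - T_1)\right] = 0$$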
The U’s should be equal to each other and will equal $n_1 n_2/2$:
$U_1 + U_2 = n_1 n_2$
Under null hypothesis, $U_1 = U_2 = U_0$
∴ $E(U_1 + U_2) = 2E(U_0) = n_1 n_2$
$E(U_1) = E(U_2) = E(U_0) = n_1 n_2/2$
So, the test statistic here is not quite the difference in the sum-of-ranks of the 2 groups.
It’s the smaller observed U value: $U_0$
For small n’s, take $U_0$, and get p-value directly from a U table.
For large enough n’s (>10 per group)…
$$Z = \frac{U_0 - E(U_0)}{\sqrt{Var(U_0)}} = \frac{U_0 - \frac{n_1 n_2}{2}}{\sqrt{Var(U_0)}}$$
$$E(U_0) = \frac{n_1 n_2}{2}$$
$$Var(U_0) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$$
Add observed data to the example…
Example: If the girls on the two gymnastics teams were ranked as follows:
Team 1: 1, 5, 7; observed T1 = 13
Team 2: 2, 3, 4, 6, 8, 9, 10, 11, 12, 13; observed T2 = 78
Are the teams significantly different?
Total sum of ranks = 13*14/2 = 91; n1n2 = 3*10 = 30
Under the null hypothesis: expect U1 − U2 = 0 and U1 + U2 = 30 (each should equal about 15 under the null) and U0 = 15
U1 = 30 + 6 − 13 = 23
U2 = 30 + 55 − 78 = 7
∴ U0 = 7
Not quite statistically significant in U table… p = .1084 (see attached); x2 for two-tailed test
Example problem 2
A study was done to compare the Atkins Diet (low-carb) vs. Jenny Craig (low-cal, low-fat). The following weight changes were obtained; note they are very skewed because someone lost 100 pounds; the mean loss for Atkins is going to look higher because of that one outlier, but does that mean the diet is better overall? Conduct a Mann-Whitney U test to compare ranks.

Weight change:
Atkins   Jenny Craig
-100     -11
-8       -15
-4       -5
+5       +6
+8       -20
+2

Answer (ranks, 1 = most weight lost):
Atkins   Jenny Craig
1        4
5        3
7        6
9        10
11       2
8

Sum of ranks for JC = 25 (n=5)
Sum of ranks for Atkins = 41 (n=6)
n1n2 = 5*6 = 30
Under the null hypothesis: expect U1 − U2 = 0 and U1 + U2 = 30 and U0 = 15
U1 = 30 + 15 − 25 = 20
U2 = 30 + 21 − 41 = 10
U0 = 10; n1 = 5, n2 = 6
Go to Mann-Whitney chart… p = .2143 x 2 = .42
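In place of a printed Mann-Whitney table, the one-tailed probability P(U ≤ U0) can be computed exactly for small groups by enumerating every way the smaller group's ranks could have fallen. This is a sketch, not the lecture's method: `exact_u_test` is a hypothetical helper, and it assumes no tied ranks.

```python
from itertools import combinations

def exact_u_test(smaller_ranks, larger_ranks):
    """U1, U2, U0 and the exact one-tailed P(U <= U0) under the null (no ties)."""
    n1, n2 = len(smaller_ranks), len(larger_ranks)
    offset = n1 * n2 + n1 * (n1 + 1) // 2
    u1 = offset - sum(smaller_ranks)
    u2 = n1 * n2 + n2 * (n2 + 1) // 2 - sum(larger_ranks)
    u0 = min(u1, u2)
    # Under the null, every set of n1 ranks out of n1+n2 is equally likely.
    total = as_small = 0
    for ranks in combinations(range(1, n1 + n2 + 1), n1):
        total += 1
        if offset - sum(ranks) <= u0:
            as_small += 1
    return u1, u2, u0, as_small / total

# Gymnastics example: team 1 (n=3) vs. team 2 (n=10)
gym = exact_u_test([1, 5, 7], [2, 3, 4, 6, 8, 9, 10, 11, 12, 13])
# Diet example: Jenny Craig (n=5) vs. Atkins (n=6)
diet = exact_u_test([4, 3, 6, 10, 2], [1, 5, 7, 9, 11, 8])
print(gym, diet)
```

Doubling the returned probability gives the two-tailed p-value, as in the worked answers.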

Contenu connexe

Tendances (20)

Chi square
Chi squareChi square
Chi square
 
Student's T-test, Paired T-Test, ANOVA & Proportionate Test
Student's T-test, Paired T-Test, ANOVA & Proportionate TestStudent's T-test, Paired T-Test, ANOVA & Proportionate Test
Student's T-test, Paired T-Test, ANOVA & Proportionate Test
 
Sign test
Sign testSign test
Sign test
 
Analysis of variance
Analysis of varianceAnalysis of variance
Analysis of variance
 
Kruskal-Wallis H test
Kruskal-Wallis H testKruskal-Wallis H test
Kruskal-Wallis H test
 
Sign Test
Sign TestSign Test
Sign Test
 
T test and types of t-test
T test and types of t-testT test and types of t-test
T test and types of t-test
 
Chi square test
Chi square testChi square test
Chi square test
 
Chi square test final
Chi square test finalChi square test final
Chi square test final
 
Test of significance
Test of significanceTest of significance
Test of significance
 
Analysis of variance
Analysis of varianceAnalysis of variance
Analysis of variance
 
T-Test
T-TestT-Test
T-Test
 
t test
t testt test
t test
 
Research method ch08 statistical methods 2 anova
Research method ch08 statistical methods 2 anovaResearch method ch08 statistical methods 2 anova
Research method ch08 statistical methods 2 anova
 
Student's T-Test
Student's T-TestStudent's T-Test
Student's T-Test
 
F Distribution
F  DistributionF  Distribution
F Distribution
 
Measures of Variability
Measures of VariabilityMeasures of Variability
Measures of Variability
 
Analysis of variance (ANOVA)
Analysis of variance (ANOVA)Analysis of variance (ANOVA)
Analysis of variance (ANOVA)
 
T test
T testT test
T test
 
Chi square
Chi squareChi square
Chi square
 

The two sample t-test

  • 2. Binary or categorical outcomes (proportions)
      Outcome variable: binary or categorical (e.g., fracture, yes/no)
      Are the observations correlated?
      Independent:
        Chi-square test: compares proportions between two or more groups
        Relative risks: odds ratios or risk ratios
        Logistic regression: multivariate technique used when outcome is binary; gives multivariate-adjusted odds ratios
      Correlated:
        McNemar’s chi-square test: compares binary outcome between correlated groups (e.g., before and after)
        Conditional logistic regression: multivariate regression technique for a binary outcome when groups are correlated (e.g., matched data)
        GEE modeling: multivariate regression technique for a binary outcome when groups are
      Alternative to the chi-square test if sparse cells:
        Fisher’s exact test: compares proportions between independent groups when there are sparse data (some cells <5).
        McNemar’s exact test: compares proportions between correlated groups when there are sparse data (some cells <5).
  • 3. Recall: The odds ratio (two samples = cases and controls)
                          Smoker (E)   Non-smoker (~E)   Total
      Stroke (D)              15             35            50
      No Stroke (~D)           8             42            50
      $\mathrm{OR} = \frac{ad}{bc} = \frac{15 \times 42}{35 \times 8} = 2.25$
      Interpretation: the odds of stroke are 2.25-fold higher in smokers than in non-smokers.
  • 4. Inferences about the odds ratio…  Does the sampling distribution follow a normal distribution?  What is the standard error?
  • 5. Simulation…
      1. In SAS, assume an infinite population of cases and controls with equal proportion of smokers (exposure), p=.23 (UNDER THE NULL!)
      2. Use the random binomial function to randomly select n=50 cases and n=50 controls, each with p=.23 chance of being a smoker.
      3. Calculate the observed odds ratio for the resulting 2x2 table.
      4. Repeat this 1000 times (or some large number of times).
      5. Observe the distribution of odds ratios under the null hypothesis.
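The five steps can be sketched in Python rather than SAS. This is a minimal illustration under stated assumptions, not the author's program: the function name, the seed, and the choice to skip any table with an empty cell (where the odds ratio is undefined) are all hypothetical.

```python
import math
import random
import statistics

def simulate_null_odds_ratios(n_cases=50, n_controls=50, p_exposed=0.23,
                              n_reps=1000, seed=1):
    """Odds ratios from repeated samples in which cases and controls
    share the same exposure probability (the null hypothesis)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_reps):
        # Binomial draws: number of smokers among cases and among controls.
        a = sum(rng.random() < p_exposed for _ in range(n_cases))     # exposed cases
        c = sum(rng.random() < p_exposed for _ in range(n_controls))  # exposed controls
        b = n_cases - a      # unexposed cases
        d = n_controls - c   # unexposed controls
        if min(a, b, c, d) == 0:
            continue  # assumption of this sketch: drop tables with an empty cell
        results.append((a * d) / (b * c))
    return results

odds_ratios = simulate_null_odds_ratios()
log_ors = [math.log(x) for x in odds_ratios]
sd_log_or = statistics.stdev(log_ors)

print("median OR under the null:", statistics.median(odds_ratios))
print("mean ln(OR):", round(statistics.mean(log_ors), 3))
print("empirical SE of ln(OR):", round(sd_log_or, 3))
```

The odds ratios themselves are right-skewed, while their logarithms are roughly symmetric around 0, which is why the inference that follows works on the ln(OR) scale.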
  • 8. Properties of the lnOR
      From the simulation, we can get the empirical standard error (~0.5) and p-value (~.10).
  • 10. Inferences about the ln(OR)
                          Smoker (E)   Non-smoker (~E)   Total
      Stroke (D)              15             35            50
      No Stroke (~D)           8             42            50
      $\mathrm{OR} = 2.25$, $\ln(\mathrm{OR}) = 0.81$
      $Z = \frac{\ln(2.25) - 0}{\sqrt{\frac{1}{8} + \frac{1}{15} + \frac{1}{35} + \frac{1}{42}}} = \frac{0.81}{0.494} = 1.64$
      p=.10
  • 11. Confidence interval…
                          Smoker (E)   Non-smoker (~E)   Total
      Stroke (D)              15             35            50
      No Stroke (~D)           8             42            50
      95% CI for $\ln(\mathrm{OR})$: $0.81 \pm 1.96 \times 0.494 = (-0.16, 1.78)$
      95% CI for OR: $(e^{-0.16}, e^{1.78}) = (0.85, 5.92)$
      Final answer: 2.25 (0.85, 5.92)
  • 12. Practice problem: Suppose the following data were collected in a case-control study of brain tumor and cell phone usage:
                                Brain tumor   No brain tumor
      Own a cell phone              20              60
      Don’t own a cell phone        10              40
      Is there sufficient evidence for an association between cell phones and brain tumor?
  • 13. Answer
      1. What is your null hypothesis?
         Null hypothesis: OR = 1.0; lnOR = 0
         Alternative hypothesis: OR ≠ 1.0; lnOR ≠ 0
      2. What is your null distribution?
         $\ln\mathrm{OR} \sim N(0, \sigma)$, where $\sigma = \mathrm{SD}(\ln\mathrm{OR}) = \sqrt{\frac{1}{10} + \frac{1}{20} + \frac{1}{60} + \frac{1}{40}} = .44$
      3. Empirical evidence: $\mathrm{OR} = \frac{20 \times 40}{60 \times 10} = \frac{800}{600} = 1.33$, ∴ lnOR = .288
      4. Z = (.288 − 0)/.44 = .65; p-value = P(Z > .65 or Z < −.65) = .26 × 2 = .52
         TWO-SIDED TEST: it would be just as extreme if the sample lnOR were .65 standard deviations or more below the null mean.
      5. Not enough evidence to reject the null hypothesis of no association.
  • 14. Key measures of relative risk: 95% CIs for OR and RR
      For an odds ratio, 95% confidence limits:
      $\left(\mathrm{OR} \times \exp\left(-1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right),\ \mathrm{OR} \times \exp\left(+1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right)\right)$
      For a risk ratio, 95% confidence limits:
      $\left(\mathrm{RR} \times \exp\left(-1.96\sqrt{\frac{1 - a/(a+b)}{a} + \frac{1 - c/(c+d)}{c}}\right),\ \mathrm{RR} \times \exp\left(+1.96\sqrt{\frac{1 - a/(a+b)}{a} + \frac{1 - c/(c+d)}{c}}\right)\right)$
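The Z test and confidence interval for the smoking and stroke table can be reproduced with a few lines of Python. This is a sketch, assuming the cell labels a, b, c, d read across the table rows (a=15, b=35, c=8, d=42); the function name is hypothetical.

```python
import math
from statistics import NormalDist

def odds_ratio_inference(a, b, c, d, z_crit=1.96):
    """Odds ratio, SE of ln(OR), Z statistic, two-sided p-value and 95% CI."""
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = (log_or - 0) / se_log_or                      # null value of ln(OR) is 0
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    # Build the interval on the log scale, then exponentiate the limits.
    ci = (math.exp(log_or - z_crit * se_log_or),
          math.exp(log_or + z_crit * se_log_or))
    return odds_ratio, se_log_or, z, p_two_sided, ci

odds_ratio, se_log_or, z, p_value, ci = odds_ratio_inference(15, 35, 8, 42)
print(f"OR = {odds_ratio:.2f}, SE(lnOR) = {se_log_or:.3f}")
print(f"Z = {z:.2f}, two-sided p = {p_value:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around 2.25 on the odds-ratio scale, and it includes 1.0, in line with p=.10.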
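Both sets of confidence limits follow the same pattern: compute a standard error on the log scale, then multiply the point estimate by exp(±1.96 × SE). The sketch below applies them to a hypothetical cohort table (20 of 100 exposed and 10 of 100 unexposed subjects develop disease); these counts are invented for illustration and do not come from the slides.

```python
import math

def or_and_rr_with_ci(a, b, c, d, z_crit=1.96):
    """a, b = exposed with/without disease; c, d = unexposed with/without disease."""
    odds_ratio = (a * d) / (b * c)
    se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    risk_ratio = risk_exposed / risk_unexposed
    se_ln_rr = math.sqrt((1 - risk_exposed) / a + (1 - risk_unexposed) / c)

    def limits(estimate, se):
        return (estimate * math.exp(-z_crit * se),
                estimate * math.exp(+z_crit * se))

    return (odds_ratio, limits(odds_ratio, se_ln_or),
            risk_ratio, limits(risk_ratio, se_ln_rr))

# Hypothetical cohort counts, for illustration only.
odds_ratio, or_ci, risk_ratio, rr_ci = or_and_rr_with_ci(20, 80, 10, 90)
print(f"OR = {odds_ratio:.2f}, 95% CI ({or_ci[0]:.2f}, {or_ci[1]:.2f})")
print(f"RR = {risk_ratio:.2f}, 95% CI ({rr_ci[0]:.2f}, {rr_ci[1]:.2f})")
```

A risk ratio is only meaningful when risks can be estimated, as in a cohort study; for case-control data such as the two tables in this deck, the odds ratio is the measure to report.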
  • 15. Continuous outcome (means)
      Outcome variable: continuous (e.g., pain scale, cognitive function)
      Are the observations independent or correlated?
      Independent:
        Ttest: compares means between two independent groups
        ANOVA: compares means between more than two independent groups
        Pearson’s correlation coefficient (linear correlation): shows linear correlation between two continuous variables
        Linear regression:
      Correlated:
        Paired ttest: compares means between two related groups (e.g., the same subjects before and after)
        Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements)
        Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or
      Alternatives if the normality assumption is violated (and small sample size): non-parametric statistics
        Wilcoxon signed-rank test: non-parametric alternative to the paired ttest
        Wilcoxon rank-sum test (= Mann-Whitney U test): non-parametric alternative to the ttest
        Kruskal-Wallis test: non-parametric alternative to ANOVA
        Spearman rank correlation coefficient:
  • 17. The two-sample T-test  Is the difference in means that we observe between two groups more than we’d expect to see based on chance alone?
  • 18. The standard error of the difference of two means
      $\sigma_{\bar{x} - \bar{y}} = \sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}$
      First add the variances, then take the square root of the sum to get the standard error.
      Recall: Var(A − B) = Var(A) + Var(B) if A and B are independent!
  • 19. Shown by simulation:
      One sample of 30 (with SD=5): $SE = \frac{5}{\sqrt{30}} = .91$
      One sample of 30 (with SD=5): $SE = \frac{5}{\sqrt{30}} = .91$
      Difference of the two samples: $SE(\text{diff}) = \sqrt{\frac{25}{30} + \frac{25}{30}} = 1.29$
  • 20. Distribution of differences
      If $\bar{X}_n$ and $\bar{Y}_m$ are the averages of n and m subjects, respectively:
      $\bar{X}_n - \bar{Y}_m \sim N\left(\mu_x - \mu_y,\ \sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}\right)$
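The same demonstration can be sketched in Python. The population is assumed to be normal with mean 0 and SD 5; the number of repetitions, the seed, and the function name are arbitrary choices for this illustration.

```python
import math
import random
import statistics

def simulate_standard_errors(n=30, m=30, sd=5.0, n_reps=5000, seed=1):
    """Empirical SD of two sample means and of their difference."""
    rng = random.Random(seed)
    means_x, means_y, diffs = [], [], []
    for _ in range(n_reps):
        x_bar = statistics.fmean([rng.gauss(0, sd) for _ in range(n)])
        y_bar = statistics.fmean([rng.gauss(0, sd) for _ in range(m)])
        means_x.append(x_bar)
        means_y.append(y_bar)
        diffs.append(x_bar - y_bar)
    return (statistics.stdev(means_x),
            statistics.stdev(means_y),
            statistics.stdev(diffs))

se_x, se_y, se_diff = simulate_standard_errors()
print("empirical SE of each mean:", round(se_x, 2), round(se_y, 2))
print("empirical SE of the difference:", round(se_diff, 2))
print("theory:", round(5 / math.sqrt(30), 2), "and",
      round(math.sqrt(25 / 30 + 25 / 30), 2))
```

The empirical values should land close to the theoretical .91 and 1.29: the variances add, so the standard error of the difference is larger than either single standard error but smaller than their sum.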
  • 21. But…  As before, you usually have to use the sample SD, since you won’t know the true SD ahead of time…  So, again becomes a T-distribution...
  • 22. Estimated standard error of the difference….
      $\sigma_{\bar{x} - \bar{y}} \approx \sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$
      Just plug in the sample standard deviations for each group.
  • 23. Case 1: un-pooled variance Question: What are your degrees of freedom here? Answer: Not obvious!
  • 24. Case 1: ttest, unpooled variances
      $T = \frac{\bar{X}_n - \bar{Y}_m}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}} \sim t_\nu$
      It is complicated to figure out the degrees of freedom here! A good approximation is given as df ≈ harmonic mean (or SAS will tell you!):
      $\nu \approx \frac{2}{\frac{1}{n} + \frac{1}{m}}$
  • 25. Case 2: pooled variance
      If you assume that the standard deviation of the characteristic (e.g., IQ) is the same in both groups, you can pool all the data to estimate a common standard deviation. This maximizes your degrees of freedom (and thus your power).
      Pooling variances:
      $s_x^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x}_n)^2}{n-1}$ and $(n-1)s_x^2 = \sum_{i=1}^{n}(x_i - \bar{x}_n)^2$
      $s_y^2 = \frac{\sum_{i=1}^{m}(y_i - \bar{y}_m)^2}{m-1}$ and $(m-1)s_y^2 = \sum_{i=1}^{m}(y_i - \bar{y}_m)^2$
      $\therefore s_p^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x}_n)^2 + \sum_{i=1}^{m}(y_i - \bar{y}_m)^2}{n+m-2} = \frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}$
      The denominator, n+m−2, is the degrees of freedom!
  • 26. Estimated standard error (using pooled variance estimate)
      $\sigma_{\bar{x} - \bar{y}} \approx \sqrt{\frac{s_p^2}{n} + \frac{s_p^2}{m}}$
      where: $s_p^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x}_n)^2 + \sum_{i=1}^{m}(y_i - \bar{y}_m)^2}{n+m-2}$
      The degrees of freedom are n+m−2.
  • 27. Case 2: ttest, pooled variances
      $T = \frac{\bar{X}_n - \bar{Y}_m}{\sqrt{\frac{s_p^2}{n} + \frac{s_p^2}{m}}} \sim t_{n+m-2}$
      $s_p^2 = \frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}$
  • 28. Alternate calculation formula: ttest, pooled variance
      $T = \frac{\bar{X}_n - \bar{Y}_m}{s_p\sqrt{\frac{n+m}{mn}}} \sim t_{n+m-2}$
      since $\frac{s_p^2}{m} + \frac{s_p^2}{n} = s_p^2\left(\frac{1}{m} + \frac{1}{n}\right) = s_p^2\left(\frac{n}{mn} + \frac{m}{mn}\right) = s_p^2\left(\frac{n+m}{mn}\right)$
  • 29. Pooled vs. unpooled variance
      Rule of Thumb: Use pooled unless you have a reason not to.
      Pooled gives you more degrees of freedom.
      Pooled has an extra assumption: variances are equal between the two groups.
      SAS automatically tests this assumption for you (“Equality of Variances” test). If p<.05, this suggests unequal variances, and it is better to use the unpooled ttest.
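As a rough illustration, the sketch below computes the unpooled statistic together with two candidate degrees of freedom: the harmonic-mean approximation quoted on the slide, and the Welch-Satterthwaite formula, which is a different technique swapped in here because it is what SAS reports for the unequal-variance test. The inputs are the SAT summary statistics from the example later in the deck; the function name is hypothetical.

```python
import math

def unpooled_t(mean_x, s_x, n, mean_y, s_y, m):
    """Unpooled two-sample t statistic with two candidate degrees of freedom."""
    var_x = s_x ** 2 / n   # squared standard error of the first mean
    var_y = s_y ** 2 / m   # squared standard error of the second mean
    t = (mean_x - mean_y) / math.sqrt(var_x + var_y)
    # Welch-Satterthwaite approximation (not the slide's formula).
    df_satterthwaite = (var_x + var_y) ** 2 / (
        var_x ** 2 / (n - 1) + var_y ** 2 / (m - 1))
    # Harmonic mean of the two sample sizes, as quoted on the slide.
    df_harmonic = 2 / (1 / n + 1 / m)
    return t, df_satterthwaite, df_harmonic

t_stat, df_sat, df_harm = unpooled_t(436, 77, 30, 416, 81, 30)
print(f"t = {t_stat:.2f}")
print(f"Welch-Satterthwaite df = {df_sat:.1f}")
print(f"harmonic-mean df = {df_harm:.1f}")
```

With equal group sizes and similar SDs, the Welch-Satterthwaite value sits just below n+m−2, whereas the harmonic mean gives a smaller and therefore more conservative number.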
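A quick numerical check that the alternate formula agrees with the standard pooled formula. The summary statistics below are hypothetical values chosen only to exercise the algebra with unequal group sizes.

```python
import math

def pooled_variance(s_x, n, s_y, m):
    """Weighted average of the two sample variances."""
    return ((n - 1) * s_x ** 2 + (m - 1) * s_y ** 2) / (n + m - 2)

def pooled_t_two_ways(mean_x, s_x, n, mean_y, s_y, m):
    sp2 = pooled_variance(s_x, n, s_y, m)
    diff = mean_x - mean_y
    t_standard = diff / math.sqrt(sp2 / n + sp2 / m)
    t_alternate = diff / (math.sqrt(sp2) * math.sqrt((n + m) / (n * m)))
    return t_standard, t_alternate, n + m - 2

# Hypothetical summary statistics, for illustration only.
t_standard, t_alternate, df = pooled_t_two_ways(10.0, 3.0, 12, 8.0, 4.0, 20)
print(f"standard form:  t = {t_standard:.4f}")
print(f"alternate form: t = {t_alternate:.4f}")
print(f"degrees of freedom = {df}")
```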
  • 30. Example: two-sample t-test
      In 1980, some researchers reported that “men have more mathematical ability than women” as evidenced by the 1979 SATs, where a sample of 30 random male adolescents had a mean score ± 1 standard deviation of 436±77 and 30 random female adolescents scored lower: 416±81 (genders were similar in educational backgrounds, socio-economic status, and age). Do you agree with the authors’ conclusions?
  • 31. Data Summary
                          n    Sample Mean   Sample Standard Deviation
      Group 1: women     30        416                  81
      Group 2: men       30        436                  77
  • 32. Two-sample t-test 1. Define your hypotheses (null, alternative) H0 : ♂-♀ math SAT = 0 Ha: ♂-♀ math SAT ≠ 0 [two-sided]
  • 33. Two-sample t-test
      2. Specify your null distribution: F and M have similar standard deviations/variances, so make a “pooled” estimate of variance.
      $s_p^2 = \frac{(n-1)s_m^2 + (m-1)s_f^2}{n+m-2} = \frac{(29)77^2 + (29)81^2}{58} = 6245$
      $\bar{M}_{30} - \bar{F}_{30} \sim T_{58}\left(0,\ \sqrt{\frac{6245}{30} + \frac{6245}{30}}\right)$, where $\sqrt{\frac{6245}{30} + \frac{6245}{30}} = 20.4$
  • 34. Two-sample t-test 3. Observed difference in our experiment = 20 points
  • 35. Two-sample t-test
      4. Calculate the p-value of what you observed
      $T_{58} = \frac{20 - 0}{20.4} = .98$
      data _null_;
        pval = (1 - probt(.98, 58)) * 2;
        put pval=;
      run;
  • 36. Example 2: Difference in means
      Example: Rosenthal, R. and Jacobson, L. (1966) Teachers’ expectancies: Determinants of pupils’ I.Q. gains. Psychological Reports, 19, 115-118.
  • 37. The Experiment (note: exact numbers have been altered)
      Grade 3 students at Oak School were given an IQ test at the beginning of the academic year (n=90).
      Classroom teachers were given a list of names of students in their classes who had supposedly scored in the top 20 percent; these students were identified as “academic bloomers” (n=18).
      BUT: the children on the teachers’ lists had actually been randomly assigned to the list.
      At the end of the year, the same I.Q. test was re-administered.
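For readers without SAS, the whole SAT example can be sketched with the Python standard library. The library has no t-distribution function, so this sketch assumes a numerical approach: it integrates the t density with Simpson's rule. The helper names `t_pdf` and `two_sided_p` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # the density is symmetric about 0
    return max(0.0, 1.0 - central_area)

# Summary statistics from the SAT example.
n, mean_m, sd_m = 30, 436, 77   # men
m, mean_f, sd_f = 30, 416, 81   # women

sp2 = ((n - 1) * sd_m ** 2 + (m - 1) * sd_f ** 2) / (n + m - 2)
se = math.sqrt(sp2 / n + sp2 / m)
t_stat = (mean_m - mean_f - 0) / se
p_value = two_sided_p(t_stat, n + m - 2)

print(f"pooled variance = {sp2:.0f}, SE = {se:.1f}")
print(f"t = {t_stat:.2f} on {n + m - 2} df, two-sided p = {p_value:.2f}")
```

A p-value of this size means a 20-point difference is well within what sampling variability alone would produce with 30 subjects per group.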
  • 38. Example 2  Statistical question: Do students in the treatment group have more improvement in IQ than students in the control group? What will we actually compare?  One-year change in IQ score in the treatment group vs. one-year change in IQ score in the control group.
  • 39. Results:
                                         “Academic bloomers” (n=18)   Controls (n=72)
      Change in IQ score, mean (SD):            12.2 (2.0)               8.2 (2.0)
      Difference = 12.2 points − 8.2 points = 4 points
      The standard deviation of change scores was 2.0 in both groups. This affects statistical significance…
  • 40. What does a 4-point difference mean?  Before we perform any formal statistical analysis on these data, we already have a lot of information.  Look at the basic numbers first; THEN consider statistical significance as a secondary guide.
  • 41. Is the association statistically significant?  This 4-point difference could reflect a true effect or it could be a fluke.  The question: is a 4-point difference bigger or smaller than the expected sampling variability?
  • 42. Hypothesis testing
      Step 1: Assume the null hypothesis.
      Null hypothesis: There is no difference between “academic bloomers” and normal students (= the difference is 0).
  • 43. Hypothesis Testing
      Step 2: Predict the sampling variability assuming the null hypothesis is true.
      These predictions can be made by mathematical theory or by computer simulation.
  • 44. Hypothesis Testing
      Step 2: Predict the sampling variability assuming the null hypothesis is true (math theory):
      $s_p^2 = 4.0$
      $\mu_{\text{gifted}} - \mu_{\text{control}} \sim T_{88}\left(0,\ \sqrt{\frac{4}{18} + \frac{4}{72}} = 0.52\right)$
  • 45. Hypothesis Testing
      Step 2: Predict the sampling variability assuming the null hypothesis is true (computer simulation):
      In computer simulation, you simulate taking repeated samples of the same size from the same population and observe the sampling variability.
      I used computer simulation to take 1000 samples of 18 treated and 72 controls.
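A simulation of this kind might look like the following Python sketch. It assumes change scores in both groups come from one normal population with SD 2.0; the population mean, the seed, and the function name are arbitrary choices for the illustration.

```python
import random
import statistics

def simulate_null_differences(n_treated=18, n_control=72, sd=2.0,
                              n_reps=1000, seed=1):
    """Differences in mean change score when both groups are drawn
    from the same population (the null hypothesis)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_reps):
        treated = [rng.gauss(8.2, sd) for _ in range(n_treated)]
        control = [rng.gauss(8.2, sd) for _ in range(n_control)]
        diffs.append(statistics.fmean(treated) - statistics.fmean(control))
    return diffs

null_diffs = simulate_null_differences()
empirical_se = statistics.stdev(null_diffs)
n_extreme = sum(abs(d) >= 4.0 for d in null_diffs)

print("empirical SE of the difference:", round(empirical_se, 2))
print("simulated differences at least as extreme as 4 points:", n_extreme)
```

The spread of the simulated differences should agree with the standard error from the math theory, and none of the 1000 simulated studies should come close to a 4-point difference.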
  • 47. 3. Empirical data Observed difference in our experiment = 12.2-8.2 = 4.0
  • 48. 4. P-value
      $t_{88} = \frac{12.2 - 8.2}{.52} = \frac{4}{.52} \approx 8$
      p-value <.0001
      A t-curve with 88 df has slightly wider cut-offs for 95% area (t=1.99) than a normal curve (Z=1.96).
  • 49. If we ran this study 1000 times we wouldn’t expect to get 1 result as big as a difference of 4 (under the null hypothesis). Visually…
  • 50. 5. Reject null!  Conclusion: I.Q. scores can bias expectancies in the teachers’ minds and cause them to unintentionally treat “bright” students differently from those seen as less bright.
  • 51. Confidence interval (more information!!)
      95% CI for the difference: 4.0 ± 1.99(.52) = (3.0, 5.0)
      A t-curve with 88 df has slightly wider cut-offs for 95% area (t=1.99) than a normal curve (Z=1.96).
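The interval can be reproduced from the summary statistics. This sketch takes the critical value 1.99 from the slide rather than computing it; the function name is hypothetical.

```python
import math

def pooled_ci_for_difference(mean_x, s_x, n, mean_y, s_y, m, t_crit):
    """Confidence interval for a difference in means, pooled variance."""
    sp2 = ((n - 1) * s_x ** 2 + (m - 1) * s_y ** 2) / (n + m - 2)
    se = math.sqrt(sp2 / n + sp2 / m)
    diff = mean_x - mean_y
    return diff - t_crit * se, diff + t_crit * se

# "Academic bloomers" vs. controls; t_crit for 88 df is the slide's value.
lower, upper = pooled_ci_for_difference(12.2, 2.0, 18, 8.2, 2.0, 72, t_crit=1.99)
print(f"95% CI for the difference: ({lower:.1f}, {upper:.1f})")
```

The interval excludes 0, which matches the hypothesis test, and it also conveys the plausible size of the effect, which the p-value alone does not.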
  • 52. What if our standard deviation had been higher?  The standard deviation for change scores in treatment and control were each 2.0. What if change scores had been much more variable—say a standard deviation of 10.0 (for both)?
  • 53. Std. dev. in change scores = 2.0: standard error is 0.52. Std. dev. in change scores = 10.0: standard error is 2.58.
  • 54. With a std. dev. of 10.0… LESS STATISTICAL POWER! Standard error is 2.58. If we ran this study 1000 times, we would expect to get ≥+4.0 or ≤–4.0 12% of the time. P-value=.12
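To see how the standard deviation drives the result, the sketch below recomputes the standard error and t statistic for both scenarios and compares each t with the critical value 1.99 quoted for 88 df. With SD = 10.0 the formula evaluates to about 2.64, slightly above the 2.58 shown on the slide, and the conclusion is unchanged. The function name is hypothetical.

```python
import math

def se_and_t(diff, sd, n, m):
    """Standard error and t statistic when both groups share the same SD."""
    se = math.sqrt(sd ** 2 / n + sd ** 2 / m)
    return se, diff / se

T_CRIT_88_DF = 1.99  # two-sided 5% cut-off for 88 df, as quoted in the slides

for sd in (2.0, 10.0):
    se, t = se_and_t(diff=4.0, sd=sd, n=18, m=72)
    verdict = "significant" if abs(t) > T_CRIT_88_DF else "not significant"
    print(f"SD = {sd:4.1f}: SE = {se:.2f}, t = {t:.2f} ({verdict} at .05)")
```

The same 4-point difference goes from overwhelming evidence to not significant purely because the change scores became noisier.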
  • 55. Don’t forget: The paired T-test  Did the control group in the previous experiment improve at all during the year?  Do not apply a two-sample ttest to answer this question!  After-Before yields a single sample of differences…  “within-group” rather than “between-group” comparison…
  • 56. Continuous outcome (means); Outcome Variable Are the observations independent or correlated? Alternatives if the normality assumption is violated (and small sample size): independent correlated Continuous (e.g. pain scale, cognitive function) Ttest: compares means between two independent groups ANOVA: compares means between more than two independent groups Pearson’s correlation coefficient (linear correlation): shows linear correlation between two continuous variables Linear regression: Paired ttest: compares means between two related groups (e.g., the same subjects before and after) Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements) Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or Non-parametric statistics Wilcoxon sign-rank test: non-parametric alternative to the paired ttest Wilcoxon sum-rank test (=Mann-Whitney U test): non- parametric alternative to the ttest Kruskal-Wallis test: non- parametric alternative to ANOVA Spearman rank correlation coefficient:
  • 57. Data summary. Group 1 (change): n = 72, sample mean = +8.2, sample standard deviation = 2.0
  • 58. Did the control group in the previous experiment improve at all during the year? $t_{71} = \frac{8.2 - 0}{\sqrt{2^2/72}} = \frac{8.2}{0.236} \approx 34.8$; p-value <.0001
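The within-group question reduces to a one-sample t statistic on the differences. The sketch below is only an illustration, not the slides' own code: the function names are made up for this example, the first call uses the summary values from the data-summary slide (n = 72, mean change +8.2, SD 2.0), and the before/after lists are hypothetical numbers added to show the paired form.

```python
import math
import statistics


def one_sample_t(mean_diff, sd_diff, n, null_value=0.0):
    """Return (t statistic, degrees of freedom) for a single sample of differences."""
    standard_error = sd_diff / math.sqrt(n)
    return (mean_diff - null_value) / standard_error, n - 1


def paired_t(before, after):
    """Paired t-test statistic: take After - Before, then run a one-sample t on the differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(statistics.mean(diffs), statistics.stdev(diffs), len(diffs))


# Summary statistics taken from the slide: n = 72, mean change = +8.2, SD = 2.0
t_summary, df_summary = one_sample_t(8.2, 2.0, 72)

# Hypothetical raw scores (not from the slides), only to show the paired workflow
before = [20, 22, 19, 24, 25]
after = [23, 25, 20, 27, 30]
t_paired, df_paired = paired_t(before, after)

print(f"summary data: t = {t_summary:.1f} on {df_summary} df")
print(f"hypothetical paired data: t = {t_paired:.2f} on {df_paired} df")
```

With SD 2.0 and n = 72 the standard error is about 0.24, so the statistic is very large and the p-value is far below .0001 either way.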
  • 59. Normality assumption of the t-test. If the distribution of the trait is normal, it is fine to use a t-test. But if the underlying distribution is not normal and the sample size is small, the Central Limit Theorem has not yet kicked in, and the t-test cannot be used. Rule of thumb for a large enough sample: n>30 per group if not too skewed; n>100 if the distribution is really skewed. Note: the t-test is very robust to departures from the normality assumption!
  • 60. Alternative tests when normality is violated: Non-parametric tests
  • 61. Continuous outcome (means): overview table repeated from slide 56.
  • 62. Non-parametric tests. t-tests require your outcome variable to be normally distributed (or close enough) for small samples. Non-parametric tests are based on RANKS instead of means and standard deviations (=“population parameters”).
  • 63. Example: non-parametric tests. 10 dieters following the Atkins diet vs. 10 dieters following Jenny Craig. Hypothetical RESULTS: Atkins group loses an average of 34.5 lbs. J. Craig group loses an average of 18.5 lbs. Conclusion: Atkins is better?
  • 64. Example: non-parametric tests. BUT, take a closer look at the individual data… Atkins, change in weight (lbs): +4, +3, 0, -3, -4, -5, -11, -14, -15, -300. J. Craig, change in weight (lbs): -8, -10, -12, -16, -18, -20, -21, -24, -26, -30
  • 65. Jenny Craig [histogram: Percent vs. Weight Change]
  • 66. Atkins [histogram: Percent vs. Weight Change]
  • 67. t-test inappropriate… Comparing the mean weight loss of the two groups is not appropriate here. The distributions do not appear to be normal. Moreover, there is an extreme outlier (this outlier influences the mean a great deal).
  • 68. Wilcoxon rank-sum test. RANK the values, 1 being the least weight loss and 20 being the most weight loss. Atkins: +4, +3, 0, -3, -4, -5, -11, -14, -15, -300; ranks: 1, 2, 3, 4, 5, 6, 9, 11, 12, 20. J. Craig: -8, -10, -12, -16, -18, -20, -21, -24, -26, -30; ranks: 7, 8, 10, 13, 14, 15, 16, 17, 18, 19.
  • 69. Wilcoxon rank-sum test. Sum of Atkins ranks: 1 + 2 + 3 + 4 + 5 + 6 + 9 + 11 + 12 + 20 = 73. Sum of Jenny Craig’s ranks: 7 + 8 + 10 + 13 + 14 + 15 + 16 + 17 + 18 + 19 = 137. Jenny Craig clearly ranked higher! P-value* (from computer) = .018. *For details of the statistical test, see appendix of these slides…
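The ranking step can be reproduced in a few lines. This is a minimal sketch rather than the software behind the slide's p-value: `rank_sums` is a name invented here, rank 1 goes to the least weight loss as on the slide, and tied values (none occur in this data) would share an average rank.

```python
def rank_sums(group_a, group_b):
    """Rank the pooled values (rank 1 = largest value, i.e. least weight loss).

    Tied values share the average of the ranks they occupy.
    Returns the rank sum of each group.
    """
    pooled = sorted(group_a + group_b, reverse=True)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # 0-based positions i..j-1 hold ranks i+1..j; their average is (i + 1 + j) / 2
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    return sum(rank_of[v] for v in group_a), sum(rank_of[v] for v in group_b)


# Weight changes (lbs) from the slides
atkins = [4, 3, 0, -3, -4, -5, -11, -14, -15, -300]
jenny_craig = [-8, -10, -12, -16, -18, -20, -21, -24, -26, -30]

atkins_sum, jenny_craig_sum = rank_sums(atkins, jenny_craig)
print(atkins_sum, jenny_craig_sum)
```

The single -300 value contributes only rank 20 to the Atkins total, which is why the rank-based comparison is not dominated by the outlier the way the means are.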
  • 70. Binary or categorical outcomes (proportions). Outcome variable: binary or categorical (e.g., fracture, yes/no). Are the observations correlated? Independent: chi-square test (compares proportions between two or more groups); relative risks (odds ratios or risk ratios); logistic regression (multivariate technique used when outcome is binary; gives multivariate-adjusted odds ratios). Correlated: McNemar’s chi-square test (compares binary outcome between two correlated groups, e.g., before and after); conditional logistic regression (multivariate regression technique for a binary outcome when groups are correlated, e.g., matched data); GEE modeling (multivariate regression technique for a binary outcome when groups are …). Alternative to the chi-square test if sparse cells: Fisher’s exact test (compares proportions between independent groups when there are sparse data, some cells <5); McNemar’s exact test (compares proportions between correlated groups when there are sparse data, some cells <5).
  • 71. Difference in proportions (special case of chi-square test)
  • 72. Null distribution of a difference in proportions. Standard error of a proportion $= \sqrt{\frac{p(1-p)}{n}}$; it can be estimated by $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ (still normally distributed). The variance of a difference is the sum of variances (as with difference in means), so the standard error of the difference of two proportions $= \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$ or $\sqrt{\frac{\bar{p}(1-\bar{p})}{n_1} + \frac{\bar{p}(1-\bar{p})}{n_2}}$, where $\bar{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1+n_2}$. The second form is analogous to the pooled variance in the t-test.
  • 73. Null distribution of a difference in proportions. Difference of proportions: $\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\ \sqrt{\frac{\bar{p}(1-\bar{p})}{n_1} + \frac{\bar{p}(1-\bar{p})}{n_2}}\right)$
  • 74. Difference in proportions test. Null hypothesis: The difference in proportions is 0. $Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\bar{p}(1-\bar{p})}{n_1} + \frac{\bar{p}(1-\bar{p})}{n_2}}}$, where $\bar{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1+n_2}$ (just the average proportion), $\hat{p}_1$ = proportion in group 1, $\hat{p}_2$ = proportion in group 2, $n_1$ = number in group 1, $n_2$ = number in group 2. Recall, the variance of a proportion is p(1-p)/n. Use the average (or pooled) proportion in the standard error formula because, under the null hypothesis, groups have equal proportions. Z follows a normal distribution because the binomial can be approximated with a normal.
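As a rough illustration of the test just described, the sketch below computes the pooled proportion, the standard error, Z, and a two-sided p-value. The function name `two_proportion_z_test` is an assumption of this example, and the counts are the smokers among cases and controls from the stroke case-control table (15 of 50 and 8 of 50).

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Z test for H0: p1 - p2 = 0, using the pooled proportion in the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # (n1*p1 + n2*p2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return z, p_value


# Smokers among stroke cases (15/50) vs. smokers among controls (8/50)
z, p_value = two_proportion_z_test(15, 50, 8, 50)
print(f"Z = {z:.2f}, two-sided p = {p_value:.3f}")
```

Because the pooled proportion is used, the standard error reflects the null hypothesis of equal proportions rather than the two observed proportions separately.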
  • 75. Recall case-control example: Stroke (D): 15 smokers (E), 35 non-smokers (~E), total 50. No stroke (~D): 8 smokers (E), 42 non-smokers (~E), total 50.
  • 76. Absolute risk: Difference in proportions exposed. $P(E \mid D) - P(E \mid {\sim}D) = 15/50 - 8/50 = 30\% - 16\% = 14\%$. Stroke (D): 15 smokers (E), 35 non-smokers (~E), total 50. No stroke (~D): 8 smokers (E), 42 non-smokers (~E), total 50.
  • 78. Example 2: Difference in proportions. Research Question: Are antidepressants a risk factor for suicide attempts in children and adolescents? Example modified from: “Antidepressant Drug Therapy and Suicide in Severely Depressed Children and Adults”; Olfson et al. Arch Gen Psychiatry. 2006;63:865-872.
  • 79. Example 2: Difference in Proportions. Design: Case-control study. Methods: Researchers used Medicaid records to compare prescription histories between 263 children and teenagers (6-18 years) who had attempted suicide and 1241 controls who had never attempted suicide (all subjects suffered from depression). Statistical question: Is a history of use of antidepressants more common among cases than controls?
  • 80. Example 2. Statistical question: Is a history of use of antidepressants more common among suicide-attempt cases than controls? What will we actually compare? Proportion of cases who used antidepressants in the past vs. proportion of controls who did
  • 81. Results. Any antidepressant drug ever, No. (%): cases (n=263): 120 (46%); controls (n=1241): 448 (36%). 46% vs. 36%: Difference=10%
  • 82. Is the association statistically significant?  This 10% difference could reflect a true association or it could be a fluke in this particular sample.  The question: is 10% bigger or smaller than the expected sampling variability?
  • 83. Hypothesis testing Null hypothesis: There is no association between antidepressant use and suicide attempts in the target population (= the difference is 0%) Step 1: Assume the null hypothesis.
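  • 84. Hypothesis Testing Step 2: Predict the sampling variability assuming the null hypothesis is true. $\hat{p}_{cases} - \hat{p}_{controls} \sim N\left(0,\ \sigma = \sqrt{\frac{\frac{568}{1504}\left(1-\frac{568}{1504}\right)}{263} + \frac{\frac{568}{1504}\left(1-\frac{568}{1504}\right)}{1241}} = .033\right)$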
  • 85. Also: Computer Simulation Results Standard error is about 3.3%
  • 86. Hypothesis Testing Step 3: Do an experiment We observed a difference of 10% between cases and controls.
  • 87. Hypothesis Testing Step 4: Calculate a p-value. $Z = \frac{.10}{.033} = 3.0$; p = .003
  • 88. P-value from our simulation… When we ran this study 1000 times, we got 1 result as big or bigger than 10%. We also got 3 results as small or smaller than –10%.
  • 89. P-value. From our simulation, we estimate the p-value to be: 4/1000 or .004
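The slides do not show the simulation code, so the following is only a stand-in sketch of the idea: draw 1000 studies under the null (both groups share the overall exposure proportion 568/1504) and see how often the difference is at least as extreme as 10%. The function name, the fixed seed, and the use of Python's `random` module are assumptions of this example; exact counts will differ from the slide's run.

```python
import random
import statistics


def simulate_null_differences(n_cases, n_controls, p_null, n_sims, seed=0):
    """Simulate differences in exposure proportions when both groups share p_null."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_sims):
        # Each subject is exposed with probability p_null (a binomial draw per group)
        exposed_cases = sum(rng.random() < p_null for _ in range(n_cases))
        exposed_controls = sum(rng.random() < p_null for _ in range(n_controls))
        diffs.append(exposed_cases / n_cases - exposed_controls / n_controls)
    return diffs


p_null = 568 / 1504  # overall proportion exposed, as used in Step 2
diffs = simulate_null_differences(263, 1241, p_null, n_sims=1000)

observed = 0.10
sim_se = statistics.stdev(diffs)  # empirical standard error under the null
sim_p = sum(abs(d) >= observed for d in diffs) / len(diffs)  # two-sided

print(f"simulated standard error: {sim_se:.3f}")
print(f"simulated two-sided p-value: {sim_p:.3f}")
```

The simulated standard error should land near the 3.3% from the formula, and only a handful of the 1000 simulated studies should reach a 10% difference in either direction.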
  • 90. Hypothesis Testing Step 5: Reject or do not reject the null hypothesis. Here we reject the null. Alternative hypothesis: There is an association between antidepressant use and suicide attempts in the target population.
  • 91. What would a lack of statistical significance mean? If this study had sampled only 50 cases and 50 controls, the sampling variability would have been much higher, as shown in this computer simulation…
  • 92. 50 cases and 50 controls: standard error is about 10%. 263 cases and 1241 controls: standard error is about 3.3%.
  • 93. With only 50 cases and 50 controls… Standard error is about 10%. If we ran this study 1000 times, we would expect to get values of 10% or higher 170 times (or 17% of the time).
  • 95. Practice problem… An August 2003 research article in Developmental and Behavioral Pediatrics reported the following about a sample of UK kids: when given a choice of a non-branded chocolate cereal vs. CoCo Pops, 97% (36) of 37 girls and 71% (27) of 38 boys preferred the CoCo Pops. Is this evidence that girls are more likely to choose brand-name products?
  • 96. Answer 1. Hypotheses: H0: p♀ − p♂ = 0; Ha: p♀ − p♂ ≠ 0 [two-sided]. 2. Null distribution of difference of two proportions: $\hat{p}_f - \hat{p}_m \sim N\left(0,\ \sigma = \sqrt{\frac{\frac{63}{75}\left(1-\frac{63}{75}\right)}{37} + \frac{\frac{63}{75}\left(1-\frac{63}{75}\right)}{38}} = \sqrt{\frac{.84(.16)}{37} + \frac{.84(.16)}{38}} = .085\right)$. The null says the p’s are equal, so estimate the standard error using the overall observed p. 3. Observed difference in our experiment = .97 − .71 = .26. 4. Calculate the p-value of what you observed: $Z = \frac{.26 - 0}{.085} = 3.06$, giving a two-sided p of about .002. In SAS: data _null_; pval=(1-probnorm(3.06))*2; put pval; run;
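  • 97. Key two-sample Hypothesis Tests… Test for Ho: μx − μy = 0 (σ² unknown, but roughly equal): $t_{n-2} = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}};\ s_p^2 = \frac{(n_x-1)s_x^2 + (n_y-1)s_y^2}{n-2}$, where $n = n_x + n_y$. Test for Ho: p1 − p2 = 0: $Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{(\bar{p})(1-\bar{p})}{n_1} + \frac{(\bar{p})(1-\bar{p})}{n_2}}};\ \bar{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1+n_2}$
  • 98. Corresponding confidence intervals… For a difference in means, 2 independent samples (σ²’s unknown but roughly equal): $(\bar{x} - \bar{y}) \pm t_{n-2,\alpha/2} \ast \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$. For a difference in proportions, 2 independent samples: $(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2} \ast \sqrt{\frac{(\bar{p})(1-\bar{p})}{n_1} + \frac{(\bar{p})(1-\bar{p})}{n_2}}$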
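To make the interval for a difference in proportions concrete, here is a small sketch applied to the antidepressant example (120 of 263 cases, 448 of 1241 controls). The function name and the `pooled` switch are assumptions of this example: the default follows the pooled-proportion standard error used in these slides, while `pooled=False` uses each group's own proportion, which many textbooks prefer for confidence intervals.

```python
import math
from statistics import NormalDist


def diff_in_proportions_ci(x1, n1, x2, n2, confidence=0.95, pooled=True):
    """Confidence interval for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        p_bar = (x1 + x2) / (n1 + n2)
        se = math.sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
    else:
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se


# Antidepressant use: 120 of 263 cases vs. 448 of 1241 controls
low, high = diff_in_proportions_ci(120, 263, 448, 1241)
print(f"95% CI for the difference: ({low:.3f}, {high:.3f})")
```

The interval excludes 0, which agrees with rejecting the null hypothesis in the worked example.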
  • 99. Appendix: details of rank-sum test…
  • 101. Example. For example, if team 1 and team 2 (two gymnastic teams) are competing, and the judges rank all the individuals in the competition, how can you tell if team 1 has done significantly better than team 2 or vice versa?
  • 102. Answer. Let T1 = sum of ranks of group 1 (smaller) and T2 = sum of ranks of group 2 (larger). Intuition: under the null hypothesis of no difference between the two groups… If n1=n2, T1 and T2 should be equal. But if n1≠n2, then T2 (n2=bigger group) should automatically be bigger. But how much bigger under the null? For example, if team 1 has 3 people and team 2 has 10, we could rank all 13 participants from 1 to 13 on individual performance. If team 1 (X) and team 2 don’t differ in talent, the ranks ought to be spread evenly among the two groups, e.g.: 1 2 X 4 5 6 X 8 9 10 X 12 13 (exactly even distribution if team 1 ranks 3rd, 7th, and 11th).
  • 103. Remember this? $T_1 + T_2 = \sum_{i=1}^{n_1+n_2} i = \frac{(n_1+n_2)(n_1+n_2+1)}{2} = \frac{n_1^2 + 2n_1n_2 + n_2^2 + n_1 + n_2}{2} = \frac{n_1(n_1+1)}{2} + \frac{n_2(n_2+1)}{2} + n_1n_2$. Sum of within-group ranks for smaller group: $\sum_{i=1}^{n_1} i = \frac{n_1(n_1+1)}{2}$. Sum of within-group ranks for larger group: $\sum_{i=1}^{n_2} i = \frac{n_2(n_2+1)}{2}$. E.g., here: $T_1 + T_2 = \sum_{i=1}^{13} i = \frac{(13)(14)}{2} = 91 = 55 + 6 + 30$. Take-home point: $T_1 + T_2 = \frac{n_1(n_1+1)}{2} + \frac{n_2(n_2+1)}{2} + n_1n_2$
  • 104. $\sum_{i=1}^{10} i = \frac{10(11)}{2} = 55$; $\sum_{i=1}^{3} i = \frac{3(4)}{2} = 6$; $55 - 6 = 49$. T1 = 3 + 7 + 11 = 21; T2 = 1 + 2 + 4 + 5 + 6 + 8 + 9 + 10 + 12 + 13 = 70; 70 − 21 = 49. Magic! The difference between the sums of the within-group ranks is 49. The difference between the sums of the ranks of the two groups is also equal to 49 if ranks are evenly interspersed (null is true). It turns out that, if the null hypothesis is true, the difference between the larger-group and smaller-group sums of within-group ranks equals the expected difference between T2 and T1. Under the null, $T_2 - T_1 = \frac{n_2(n_2+1)}{2} - \frac{n_1(n_1+1)}{2}$
  • 106. ∴ under the null hypothesis, U1 should equal U2: $E(U_2 - U_1) = E\left[\left(\frac{n_2(n_2+1)}{2} - \frac{n_1(n_1+1)}{2}\right) - (T_2 - T_1)\right] = 0$. The U’s should be equal to each other and will equal n1n2/2: U1 + U2 = n1n2. Under the null hypothesis, U1 = U2 = U0, ∴ E(U1 + U2) = 2E(U0) = n1n2 and E(U1 = U2 = U0) = n1n2/2. So, the test statistic here is not quite the difference in the sum-of-ranks of the 2 groups. It’s the smaller observed U value: U0. For small n’s, take U0, and get the p-value directly from a U table.
  • 107. For large enough n’s (>10 per group)… $Z = \frac{U_0 - E(U_0)}{\sqrt{Var(U_0)}} = \frac{U_0 - \frac{n_1n_2}{2}}{\sqrt{Var(U_0)}}$, with $E(U_0) = \frac{n_1n_2}{2}$ and $Var(U_0) = \frac{n_1n_2(n_1+n_2+1)}{12}$
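The large-sample formula can be tried on the diet example from the main slides (rank sums 73 and 137, 10 dieters per group). This is a sketch under two assumptions: the U definition is the one used in the worked examples of this appendix (U = n1n2 + n(n+1)/2 − T for each group), and 10 per group is treated as large enough even though it sits right at the edge of the rule of thumb. No continuity or tie correction is applied, so the result will not exactly match the computer p-value quoted earlier.

```python
import math
from statistics import NormalDist


def mann_whitney_z(t1, n1, t2, n2):
    """Normal approximation for the Mann-Whitney U test from the two rank sums."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - t1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - t2
    u0 = min(u1, u2)  # test statistic: the smaller observed U
    expected = n1 * n2 / 2
    variance = n1 * n2 * (n1 + n2 + 1) / 12
    z = (u0 - expected) / math.sqrt(variance)
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return u0, z, p_two_sided


# Rank sums from the diet example: Atkins = 73, Jenny Craig = 137 (n = 10 each)
u0, z, p_two_sided = mann_whitney_z(73, 10, 137, 10)
print(f"U0 = {u0}, Z = {z:.2f}, two-sided p = {p_two_sided:.3f}")
```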
  • 108. Add observed data to the example… Example: If the girls on the two gymnastics teams were ranked as follows: Team 1: 1, 5, 7 (observed T1 = 13). Team 2: 2, 3, 4, 6, 8, 9, 10, 11, 12, 13 (observed T2 = 78). Are the teams significantly different? Total sum of ranks = 13*14/2 = 91; n1n2 = 3*10 = 30. Under the null hypothesis: expect U1 − U2 = 0 and U1 + U2 = 30 (each should equal about 15 under the null), and U0 = 15. Observed: U1 = 30 + 6 − 13 = 23; U2 = 30 + 55 − 78 = 7; ∴ U0 = 7. Not quite statistically significant in U table… p = .1084 (see attached), ×2 for two-tailed test.
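The arithmetic on this slide can be checked with a short helper. The function name `u_statistics` is an assumption of this sketch; the U formula is the one shown in the worked numbers above.

```python
def u_statistics(ranks_1, ranks_2):
    """Return (T1, T2, U1, U2) from the observed ranks of each group."""
    n1, n2 = len(ranks_1), len(ranks_2)
    t1, t2 = sum(ranks_1), sum(ranks_2)
    u1 = n1 * n2 + n1 * (n1 + 1) // 2 - t1
    u2 = n1 * n2 + n2 * (n2 + 1) // 2 - t2
    return t1, t2, u1, u2


team_1 = [1, 5, 7]
team_2 = [2, 3, 4, 6, 8, 9, 10, 11, 12, 13]

t1, t2, u1, u2 = u_statistics(team_1, team_2)
u0 = min(u1, u2)  # the smaller U is the test statistic
print(t1, t2, u1, u2, u0)
```

A quick sanity check is that U1 + U2 always equals n1n2 and T1 + T2 always equals the sum of all ranks.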
  • 109. Example problem 2. A study was done to compare the Atkins Diet (low-carb) vs. Jenny Craig (low-cal, low-fat). The following weight changes were obtained; note they are very skewed because someone lost 100 pounds. The mean loss for Atkins is going to look higher because of that outlier, but does that mean the diet is better overall? Conduct a Mann-Whitney U test to compare ranks. Atkins: -100, -8, -4, +5, +8, +2. Jenny Craig: -11, -15, -5, +6, -20.
  • 110. Answer. Ranks (1 = greatest weight loss): Atkins: 1, 5, 7, 9, 11, 8; Jenny Craig: 4, 3, 6, 10, 2. Sum of ranks for JC = 25 (n=5); sum of ranks for Atkins = 41 (n=6). n1n2 = 5*6 = 30. Under the null hypothesis: expect U1 − U2 = 0 and U1 + U2 = 30, and U0 = 15. U1 = 30 + 15 − 25 = 20; U2 = 30 + 21 − 41 = 10; U0 = 10; n1 = 5, n2 = 6. Go to Mann-Whitney chart… p = .2143 × 2 = .43
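For samples this small, the table lookup can be replaced by direct enumeration: under the null hypothesis every choice of ranks for the smaller group is equally likely, so the one-sided p-value is the share of choices whose rank sum is at least as low as the observed one. The sketch below assumes no tied ranks and a lower-tail test on the group whose rank sum falls below its expectation; the function name is made up for this illustration.

```python
from fractions import Fraction
from itertools import combinations


def exact_lower_tail_p(observed_rank_sum, n_group, n_total):
    """P(rank sum <= observed) when n_group ranks are drawn at random from 1..n_total."""
    hits = 0
    total = 0
    for subset in combinations(range(1, n_total + 1), n_group):
        total += 1
        if sum(subset) <= observed_rank_sum:
            hits += 1
    return Fraction(hits, total)


# Gymnastics example: team 1 has 3 of 13 ranks, observed T1 = 13
p_gymnastics = exact_lower_tail_p(13, 3, 13)

# Diet example 2: Jenny Craig has 5 of 11 ranks, observed rank sum = 25
p_diet = exact_lower_tail_p(25, 5, 11)

print(float(p_gymnastics), float(p_diet))
```

The enumeration gives 31/286 and 99/462, which round to the one-sided table values .1084 and .2143 quoted in the two examples; doubling gives the two-tailed p-values.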