This document discusses power analysis and sample size determination. It explains key concepts like power, effect size, significance level, and how changing these factors impacts the required sample size. Sample size is important to correctly power a study to detect clinically meaningful effects without excessive subjects. The document provides formulas and examples for calculating sample sizes for various study designs including randomized trials, pre-post, and equivalence studies. Researchers must consider these factors before collecting data to ensure their study is appropriately powered.
2. Researchers differ
A researcher conducted a study comparing the
effect of an intervention vs placebo on reducing
body weight, and found 5 kg reduction among
the intervention group with P=0.01.
Another researcher conducted a similar study
comparing the effect of the same intervention vs
the same placebo on reducing body weight, and
found the same 5 kg reduction with the
intervention group but could not claim that the
intervention was effective because P=0.35.
4. Power is Affected by…
Variation in the outcome (σ²)
↓ σ² → power ↑
Significance level (α)
↑α → power ↑
Difference (effect) to be detected (δ)
↑δ → power ↑
One-tailed vs. two-tailed tests
Power is greater in one-tailed tests than in
comparable two-tailed tests
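The directions listed above can be checked numerically. The sketch below uses a normal-approximation (z-test) power formula for two equal-sized groups; the function name `power_two_sample` and all input values are illustrative assumptions, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def power_two_sample(delta, sigma, n_per_group, alpha=0.05, two_sided=True):
    """Approximate power of a two-sample z-test with equal group sizes.

    Hypothetical helper: ignores the negligible chance of rejecting
    in the wrong direction.
    """
    tail = alpha / 2 if two_sided else alpha
    z_crit = STD_NORMAL.inv_cdf(1 - tail)
    # Standard error of the difference between two group means.
    se = sigma * sqrt(2 / n_per_group)
    return STD_NORMAL.cdf(delta / se - z_crit)


# Illustrative baseline: delta = 1, sigma = 2, 50 subjects per group.
base = power_two_sample(delta=1.0, sigma=2.0, n_per_group=50)
print(f"baseline              {base:.3f}")
print(f"smaller sigma (1.5)   {power_two_sample(1.0, 1.5, 50):.3f}")
print(f"larger alpha (0.10)   {power_two_sample(1.0, 2.0, 50, alpha=0.10):.3f}")
print(f"larger delta (1.5)    {power_two_sample(1.5, 2.0, 50):.3f}")
print(f"one-tailed test       {power_two_sample(1.0, 2.0, 50, two_sided=False):.3f}")
```

Each of the four variations should print a power higher than the baseline, matching the arrows on the slide.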
7. Power Formula
Depends on study design
Not hard, but can be VERY algebra
intensive
May want to use a computer program or
statistician
8. How Big a Sample Do We Need?
Fundamental research question
Should be addressed after determining the primary
objective and study design
Too Few Patients in a clinical study
– May fail to detect a clinically important difference
Too Many
– Involve extra patients
– Therapy may have risks
– Cost more
9. How Big a Sample Do We Need?
Fundamental research question
How Big?
18
180
1,800
18,000
180,000
10. Sample Size Formula Information
Variables of interest
type of data e.g. continuous, categorical
Desired power
Desired significance level
Effect/difference of clinical importance
Standard deviations of continuous outcome
variables
One or two-sided tests
11. Sample Size & Study Design
Randomized controlled trial (RCT)
Block/stratified-block randomized trial
Equivalence trial
Non-randomized intervention study
Observational study
Prevalence study
Measuring sensitivity and specificity
12. Sample Size & Data Structure
Paired data
Repeated measures
Groups of equal sizes
Hierarchical data
13. Sample Size
Non-randomized studies looking for differences or
associations
require larger sample to allow adjustment for confounding factors
Absolute sample size is of interest
surveys sometimes take % of population approach
Study’s primary outcome is the variable you do the sample
size calculation for
If secondary outcome variables considered important make sure
sample size is sufficient
Increase the ‘real’ sample size to reflect loss to follow up,
expected response rate, lack of compliance, etc.
Show explicitly how the calculated sample size leads to the increased recruitment target
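One common way to make that link explicit is to divide the calculated sample size by the fraction of subjects expected to remain in the study. This is a minimal sketch under that assumption; the helper name `inflate_for_attrition` and the numbers are hypothetical.

```python
from math import ceil


def inflate_for_attrition(n_required, expected_loss):
    """Recruitment target so that about n_required subjects remain.

    expected_loss is the anticipated fraction lost to follow-up,
    non-response, or non-compliance (an assumed planning value).
    """
    if not 0 <= expected_loss < 1:
        raise ValueError("expected_loss must be in [0, 1)")
    return ceil(n_required / (1 - expected_loss))


# Hypothetical example: 200 subjects needed, 15% expected loss.
print(inflate_for_attrition(200, 0.15))  # 236
```

Reporting both numbers (200 calculated, 236 recruited) lets a reader see where the increase came from.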
14. Steps
Step 1. Define Primary Objective
To see if feeding milk to 5 year old kids enhances
growth.
Step 2. Study Design
Extra Milk Diet
5 yr olds
Normal Milk Diet
Outcome: height (cm)
Step 3. Define clinically significant difference
one wishes to detect
Difference (∆) of 0.5 cm
15. Steps
Step 4. Define degree of certainty of finding this
difference
beta (β) or type II error : The probability of NOT detecting a
significant difference when there really is one.
Risk of a false-negative finding, i.e. risk of declaring no significant
difference in height between the milk diets when a difference
really does exist.
Set at ≤ 20%
Power of the Test: Probability of detecting a predefined clinically
significant difference.
Power = (1- β) = 1 -20% = 80%
16. Steps
Step 5. Define significance level
Alpha (α) or type I error: The probability of detecting a significant difference
when the treatments are really equally effective
Risk of a false-positive finding
Set at 5% :
One has a 5% chance or 1 in 20 odds of declaring a significant difference
between the milk diets when in fact they are really equal.
We are willing to accept that 1 time out of 20 we will produce a false
positive finding
17. For the Milk Study
Type I error (α) = 0.05
Type II error (β) = 0.20
Power = (1- β) = 0.80
Clinically significant diff (∆) = 0.5cm
Measure of variation (SD) = 2.0 cm
– Exists in literature or “Guesstimate”
Formula
$$N = \frac{2\,(SD)^2 \times f(\alpha, \beta)}{\Delta^2} = \frac{2\,(2)^2 \times 7.9}{0.5^2} = 252.8 \text{ (each group)}$$
Values of f(α, β):
Alpha    Beta = 0.05    0.10    0.20    0.50
0.10            10.8     8.6     6.2     2.7
0.05            13.0    10.5     7.9     3.8
0.02            15.8    13.0    10.0     5.4
0.01            17.8    14.9    11.7     6.6
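The tabulated multiplier can be reproduced in code. The sketch below assumes that f(α, β) is the two-sided normal-approximation quantity (z for 1 − α/2 plus z for 1 − β), squared, which agrees with the table to rounding; the function names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def f(alpha, beta):
    """Assumed definition: (z_{1-alpha/2} + z_{1-beta})^2, two-sided test."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(1 - beta)) ** 2


def n_per_group(sd, delta, alpha, beta):
    """Per-group size for comparing two means: 2 * SD^2 * f / delta^2."""
    return 2 * sd**2 * f(alpha, beta) / delta**2


# Milk study inputs from the slide: SD = 2.0 cm, difference = 0.5 cm.
print(f"f(0.05, 0.20) = {f(0.05, 0.20):.2f}")
n = n_per_group(sd=2.0, delta=0.5, alpha=0.05, beta=0.20)
print(f"n per group = {n:.1f}, rounded up to {ceil(n)}")
```

The unrounded multiplier is about 7.85, so this gives roughly 251.2 per group (252 after rounding up), slightly below the 252.8 obtained with the tabulated 7.9.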
18. Simple Method
Nomogram
Standardized difference
= smallest medically relevant difference / estimated standard deviation
= 0.5 / 2.0 = 0.25
Assumptions:
1. 2 sample comparison only
2. Same number of subjects
per group
3. Variable is a continuous
measure that is normally
distributed
500
19. 1 sample test
Study Objective : Study effect of new sleep aid
Compare baseline sleep time with sleep time after taking the medication for one week
Two-sided test, α = 0.05, power = 1 − β = 90%
Difference (δ) = 1 hr (from 4 hours of sleep to 5)
Standard deviation (σ) = 2 hr
$$n = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2 \sigma^2}{\delta^2} = \frac{(1.960 + 1.282)^2 \, 2^2}{1^2} = 42.04 \rightarrow 43$$
Changing δ from 1 hr to 2 hr takes n from 43 to 11
$$n = \frac{(1.960 + 1.282)^2 \, 2^2}{2^2} = 10.51 \rightarrow 11$$
20. 1 sample test
Changing power from 90% to 80% takes n from 11 to 8
(Small sample: start thinking about using the t distribution)
$$n = \frac{(1.960 + 0.841)^2 \, 2^2}{2^2} = 7.85 \rightarrow 8$$
Changing the standard deviation from 2 to 3 takes n from 8 to 18
$$n = \frac{(1.960 + 0.841)^2 \, 3^2}{2^2} = 17.65 \rightarrow 18$$
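The one-sample calculations above can be reproduced with a short function. This is a sketch of the normal-approximation formula n = (Z for 1 − α/2 plus Z for 1 − β)² σ² / δ², rounded up; the name `n_one_sample` is an illustrative choice.

```python
from math import ceil
from statistics import NormalDist


def n_one_sample(delta, sigma, alpha=0.05, power=0.90):
    """One-sample, two-sided z-approximation, rounded up to a whole subject."""
    z = NormalDist().inv_cdf
    n = (z(1 - alpha / 2) + z(power)) ** 2 * sigma**2 / delta**2
    return ceil(n)


print(n_one_sample(delta=1, sigma=2))              # 43
print(n_one_sample(delta=2, sigma=2))              # 11
print(n_one_sample(delta=2, sigma=2, power=0.80))  # 8
print(n_one_sample(delta=2, sigma=3, power=0.80))  # 18
```

For results as small as 8, the z-approximation understates the requirement somewhat, which is why the slide suggests moving to the t distribution.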
21. Sleep Aid Example: 2 Sample
Original design (2-sided test, α = 0.05, 1-β = 90%, σ = 2hr, δ = 1 hr)
Two sample randomized parallel design
Needed 43 in the one-sample design
In 2-sample need twice that, in each group!
4 times as many people are needed in this design
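$$n = \frac{2\,(Z_{1-\alpha/2} + Z_{1-\beta})^2 \sigma^2}{\delta^2} = \frac{2\,(1.960 + 1.282)^2 \, 2^2}{1^2} = 84.1 \rightarrow 85 \text{ per group, 170 total!}$$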
Changing δ from 1 hr to 2 hr takes the total from 170 to 44
$$n = \frac{2\,(1.960 + 1.282)^2 \, 2^2}{2^2} = 21.02 \rightarrow 22 \text{ per group, 44 total}$$
22. Sleep Aid Example: 2 Sample
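Changing power from 90% to 80% takes the total from 44 to 32
$$n = \frac{2\,(1.960 + 0.841)^2 \, 2^2}{2^2} = 15.69 \rightarrow 16 \text{ per group, 32 total}$$
Changing the standard deviation from 2 to 3 takes the total from 32 to 72
$$n = \frac{2\,(1.960 + 0.841)^2 \, 3^2}{2^2} = 35.31 \rightarrow 36 \text{ per group, 72 total}$$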
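The two-sample numbers follow from doubling the variance term in the one-sample formula. The sketch below applies the normal approximation with equal group sizes and a common σ; the name `n_two_sample` is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_two_sample(delta, sigma, alpha=0.05, power=0.90):
    """Per-group size for a two-sided, two-sample comparison of means."""
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma**2 / delta**2
    return ceil(n)


scenarios = [
    ("original design", dict(delta=1, sigma=2)),
    ("delta = 2 hr", dict(delta=2, sigma=2)),
    ("power = 80%", dict(delta=2, sigma=2, power=0.80)),
    ("sigma = 3 hr", dict(delta=2, sigma=3, power=0.80)),
]
for label, kwargs in scenarios:
    per_group = n_two_sample(**kwargs)
    print(f"{label}: {per_group} per group, {2 * per_group} total")
```

Run in order, the scenarios give 170, 44, 32, and 72 subjects in total.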
23. Summary
Changes in the detectable difference have
HUGE impacts on sample size
20 point difference → 25 patients/group
10 point difference → 100 patients/group
5 point difference → 400 patients/group
Changes in α, β, σ, the number of samples, and whether the test is 1-
or 2-sided can all have a large impact on your
sample size calculation
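The pattern in the patient counts above follows from the sample size being inversely proportional to the square of the detectable difference, with everything else held fixed. As a worked check using the slide's own numbers:

```latex
n \propto \frac{\sigma^2}{\delta^2}
\quad\Longrightarrow\quad
n(\delta/2) = 4\,n(\delta),
\qquad
25 \times \left(\tfrac{20}{10}\right)^2 = 100,
\qquad
25 \times \left(\tfrac{20}{5}\right)^2 = 400 .
```

Halving the difference to be detected therefore quadruples the required number of patients per group.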
24. Matched Pair Designs
Similar to 1-sample formula
Means (paired t-test)
Mean difference from paired data
Variance of differences
Proportions
Based on discordant pairs
25. Difference in Proportion
Study Objective
To increase survival by 5% with a new cancer drug
P1 = % survival (std) = 85%
P2 = % survival (new) = 90%
Power = 90%
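$$N = \frac{P_1(100 - P_1) + P_2(100 - P_2)}{(P_2 - P_1)^2} \times f(\alpha, \beta) = \frac{85 \times 15 + 90 \times 10}{5^2} \times 10.5 = 913.5 \text{ (each group)}$$
where f(α, β) = 10.5 corresponds to α = 0.05, β = 0.10
= 1827 Total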
A very large study has the power to demonstrate statistical
significance for very small, even clinically inconsequential
differences.
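The survival example can be reproduced directly. The sketch below uses the slide's formula with proportions expressed as percentages; the multiplier 10.5 is the value that yields 913.5 and matches f(α, β) for α = 0.05, β = 0.10, and the function name is illustrative.

```python
from math import ceil


def n_per_group_proportions(p1_pct, p2_pct, f_value):
    """Per-group size for comparing two proportions given as percentages."""
    variance_term = p1_pct * (100 - p1_pct) + p2_pct * (100 - p2_pct)
    return variance_term / (p2_pct - p1_pct) ** 2 * f_value


n = n_per_group_proportions(85, 90, f_value=10.5)
print(n)            # 913.5
print(ceil(n) * 2)  # 1828 when each group is rounded up to 914
```

The slide's total of 1827 is twice the unrounded 913.5; rounding each group up first gives 1828.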
26. Changes in basic formulae
Unequal #s in Each Group
Ratio of cases to controls
Use if you want λ patients randomized to the treatment arm for every patient randomized
to the placebo arm
Take no more than 4-5 controls/case
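$$n_2 = \lambda n_1 \quad (\lambda \text{ controls for every case})$$
$$n_1 = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2 \left(\sigma_1^2 + \sigma_2^2 / \lambda\right)}{\delta^2}$$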
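A short sketch shows how the allocation ratio changes the group sizes. It takes the formula to be n1 = (Z for 1 − α/2 plus Z for 1 − β)² (σ1² + σ2²/λ) / δ² with n2 = λ·n1, and reuses the sleep-aid inputs (σ = 2 hr, δ = 1 hr, 90% power) purely for illustration; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_unequal(delta, sigma1, sigma2, ratio, alpha=0.05, power=0.90):
    """Return (n1, n2) where group 2 has `ratio` subjects per group-1 subject."""
    z = NormalDist().inv_cdf
    z_sum_sq = (z(1 - alpha / 2) + z(power)) ** 2
    n1 = ceil(z_sum_sq * (sigma1**2 + sigma2**2 / ratio) / delta**2)
    n2 = ceil(ratio * n1)
    return n1, n2


for ratio in (1, 2, 4):
    n1, n2 = n_unequal(delta=1, sigma1=2, sigma2=2, ratio=ratio)
    print(f"ratio {ratio}: n1 = {n1}, n2 = {n2}, total = {n1 + n2}")
```

The smaller group shrinks as the ratio grows, but the total rises (170, 192, 265 here), which illustrates why the slide advises against going beyond 4 to 5 controls per case.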
27. # of Covariates & # of Subjects
At least 10 subjects for every variable investigated
In logistic regression
No general justification
This is stability, not power
Peduzzi et al., (1985) biased regression coefficients and
variance estimates
Principal component analysis (PCA) (Thorndike
1978, p. 184): N ≥ 10m + 50 or even N ≥ m² + 50
28. Balanced Designs: Easier
Equal numbers in two groups is the easiest
to handle
If you have more than two groups, still,
equal sample sizes easiest
Complicated design = simulations
Done by the statistician
29. Multiple Comparisons
If you have 4 groups
All 2 way comparisons of means
6 different tests
Bonferroni: divide α by # of tests
0.025/6 ≈ 0.0042
High-throughput laboratory tests
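The count of pairwise tests and the adjusted threshold can be computed as below. This is a minimal sketch; the starting level of 0.025 is the value used on the slide, and the function name is illustrative.

```python
from math import comb


def bonferroni_threshold(alpha, n_groups):
    """Number of pairwise comparisons and the per-test significance level."""
    n_tests = comb(n_groups, 2)  # all 2-way comparisons among the groups
    return n_tests, alpha / n_tests


n_tests, threshold = bonferroni_threshold(alpha=0.025, n_groups=4)
print(n_tests)              # 6
print(round(threshold, 4))  # 0.0042
```

The stricter per-test level should then replace α in the sample size formula, which increases the required n.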
30. Flaws in Statements
"A previous study in this area recruited 150 subjects and found highly
significant results (p=0.014), and therefore a similar sample size should
be sufficient here."
Previous studies may have been 'lucky' to find significant results, due to random
sampling variation.
"Sample sizes are not provided because there is no prior information on
which to base them."
Find previously published information
Conduct small pre-study
If a very preliminary pilot study, sample size calculations not usually necessary
No prior information on standard deviations
Give the size of difference that may be detected in terms of number of standard
deviations
31. Roadmap
1. Do a sample size calculation before you start
collecting data
2. Collect data
3. Perform statistical test : IF p value < 0.05, declare
statistical significance
4. Consider clinical significance by looking at the size of
the difference
32. References
“Sample Size Estimation”, Phil Hahn, Queen’s
University
“Sample Size and Power”, Laura Lee Johnson,
Ph.D., Statistician, National Center for
Complementary and Alternative Medicine
“Sample Size Estimation and Power Analysis”,
Ayumi Shintani, PhD, MPH, Department of
Biostatistics, Vanderbilt University