1. Overview of the statistical analysis
Jonas Ranstam, PhD,
National Musculoskeletal Competence Centre, Lund, Sweden
2. Explanations and points of reference
1. Methodological background
2. International guidelines
3. Multiplicity issues
4. Study population definitions
5. Statistical models
4. Clinical research
Before 1948
Unclear validity, unknown statistical precision
- Prof A's patients better than Prof B's
- Small series of patients or even single cases
5. Streptomycin in Tuberculosis Trials Committee.
Streptomycin treatment of pulmonary tuberculosis.
BMJ 1948;2:769-83.
The Control Scheme
Determination of whether a patient would be treated by streptomycin
and bed-rest (S case) or by bed-rest alone (C case) was made by
reference to a statistical series based on random sampling numbers
drawn up for each sex at each centre by Professor Bradford Hill; the
details of the series were unknown to any of the investigators or to the
co-ordinator and were contained in a set of sealed envelopes, each
bearing on the outside only the name of the hospital and a number.
6. Clinical research
From 1948
Elimination/reduction of bias, assessment of
statistical precision
- Randomization and blinding (intervention studies)
- Effect modeling (observational studies)
- P-values and confidence intervals
7. Quantitative principles I
Randomized allocation of patients to treatment groups
(and blinding when possible) guarantees that:
1. All differences between treatment groups at
baseline are random (not systematic).
Complete absence of baseline imbalance is not
the aim. Stratification on prognostic factors is
used to make the groups less imbalanced.
2. Treatment effect estimates are unaffected by
selection and confounding bias (and with
blinding, differential misclassification bias).
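As an illustration of randomized allocation with stratification, the following is a minimal sketch of stratified permuted-block randomization. The block size, the arm labels "S" and "C", the strata and the function names are all assumptions made for the example; the slides do not prescribe a particular scheme.

```python
import random


def block_randomize(n_blocks, block=("S", "S", "C", "C"), seed=None):
    """Return an allocation list built from randomly permuted blocks."""
    rng = random.Random(seed)
    allocations = []
    for _ in range(n_blocks):
        permuted = list(block)
        rng.shuffle(permuted)  # random order within each block
        allocations.extend(permuted)
    return allocations


def stratified_lists(strata, n_blocks, seed=0):
    """Build one independent allocation list per stratum."""
    return {
        stratum: block_randomize(n_blocks, seed=seed + i)
        for i, stratum in enumerate(strata)
    }


# Hypothetical strata: sex within centre.
strata = [("centre 1", "F"), ("centre 1", "M"),
          ("centre 2", "F"), ("centre 2", "M")]
lists = stratified_lists(strata, n_blocks=3)

for stratum, allocation in lists.items():
    print(stratum, "".join(allocation))
```

Within every stratum the two arms end up equally sized, while the order of allocations stays unpredictable. Remaining baseline differences between the arms are then random rather than systematic.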
8. Quantitative principles II
1. Individual effects vary between subjects.
Different samples of subjects will yield
different observed mean effects.
2. The subject variation can be estimated
using the observations in a random sample.
3. A universal mean effect can be estimated,
and the reliability of this estimate can be
described with p-values and confidence
intervals.
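The three points above can be sketched with a small simulation. The population values (true mean effect, standard deviation, sample size) are hypothetical, and the confidence interval uses a normal approximation, which is an assumption that is reasonable only for fairly large samples.

```python
import random
import statistics


def mean_with_ci(sample, level=0.95):
    """Sample mean with a normal-approximation confidence interval."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5  # subject variation -> standard error
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean, mean - z * se, mean + z * se


rng = random.Random(1)
TRUE_MEAN, SD, N = 5.0, 10.0, 100  # hypothetical population values

results = []
for _ in range(5):
    # Each iteration draws a new random sample of subjects.
    sample = [rng.gauss(TRUE_MEAN, SD) for _ in range(N)]
    results.append(mean_with_ci(sample))

for mean, lower, upper in results:
    print(f"mean {mean:5.2f}  95% CI {lower:5.2f} to {upper:5.2f}")
```

Each sample gives a different observed mean, and each interval is computed from the variation seen within that sample alone.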
9. P-values are often misunderstood
They do
- describe the reliability of findings. P < 0.05 is usually
considered reliable.
They do not
- describe clinical relevance (they depend on sample
size).
- show that a difference “does not exist” (“n.s.” is
absence of evidence, not evidence of absence).
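The dependence on sample size can be illustrated with a short calculation. This is a sketch only: it assumes a two-sample z-test with a known common standard deviation, and the observed difference and standard deviation are hypothetical numbers.

```python
from statistics import NormalDist


def two_sample_p(diff, sd, n_per_group):
    """Two-sided p-value of a z-test for a difference between two means."""
    se = sd * (2 / n_per_group) ** 0.5  # standard error of the difference
    z = abs(diff) / se
    return 2 * (1 - NormalDist().cdf(z))


DIFF, SD = 2.0, 10.0  # hypothetical observed difference and common SD

p_values = {n: two_sample_p(DIFF, SD, n) for n in (20, 200, 2000)}
for n, p in p_values.items():
    print(f"n per group = {n:4d}  p = {p:.4g}")
```

The observed difference is identical in all three cases, yet the p-value moves from clearly non-significant to very small as the groups grow. The p-value therefore says nothing by itself about clinical relevance.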
12. ICMJE – the Vancouver group
Results
“Avoid relying solely on statistical hypothesis testing,
such as the use of P values, which fails to convey
important information about effect size.”
“When possible, quantify findings and present them
with appropriate indicators of measurement error or
uncertainty (such as confidence intervals).”
14. P-values vs. confidence intervals
A p-value has 2 possible outcomes; a confidence interval has 5:
P-value    Confidence interval
p < 0.05   Statistically and clinically significant effect
p < 0.05   Statistically, but not necessarily clinically, significant effect
n.s.       Inconclusive
n.s.       Neither statistically nor clinically significant effect
p < 0.05   Statistically significant reversed effect
[Figure: the five confidence intervals on an effect axis marked with 0 and the clinically significant effect; labels "Bad" and "Good"]
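One possible reading of the five confidence interval outcomes can be written as a small classifier. The decision rules below are an interpretation, not taken from the slide: they assume that a larger effect is better, that 0 means no effect, and that `clinically_significant` is a hypothetical threshold for the smallest clinically relevant effect.

```python
def classify(lower, upper, clinically_significant):
    """Interpret a confidence interval for an effect where larger is better."""
    if upper < 0:
        return "statistically significant reversed effect"
    if lower > 0:
        if lower >= clinically_significant:
            return "statistically and clinically significant effect"
        return "statistically, but not necessarily clinically, significant effect"
    # The interval includes 0, so the result is not statistically significant.
    if upper >= clinically_significant:
        return "inconclusive"
    return "neither statistically nor clinically significant effect"


# Hypothetical intervals, with a clinically significant effect of 5 units.
examples = [(6, 9), (1, 7), (-2, 8), (-1, 3), (-8, -2)]
for lower, upper in examples:
    print((lower, upper), "->", classify(lower, upper, 5))
```

The two non-significant cases show what a p-value alone cannot: a wide interval that still covers clinically relevant effects is inconclusive, while a narrow one below the threshold rules them out.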
15. Clinical trials
International regulatory guidelines
ICH Topic E9 - Statistical Principles for Clinical Trials
EMEA Points to consider:
- baseline covariates
- missing data
- multiplicity issues
- etc.
and similar documents from the FDA
These guidelines can all be found on the internet.
17. Multiplicity
Multiplicity of inferences is present in almost all trials.
If not properly handled, unsubstantiated claims for
effectiveness may be made as a consequence of an
inflated rate of false positive conclusions.
18. Multiplicity
The chance of at least one false positive finding (FPR) is
$\mathrm{FPR} = 1 - (1 - \alpha)^k$
where $k$ is the number of performed comparisons and
$\alpha$ the significance level (usually 0.05).
k = 1 => FPR = 0.05
k = 2 => FPR = 0.0975
k = 10 => FPR = 0.4013
Bonferroni method: divide the significance level by the
number of comparisons. This reduces the statistical
power and should be avoided.
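The figures on this slide can be reproduced with a few lines of code. The sketch assumes independent comparisons, which the formula itself requires; the function names are chosen for the example.

```python
def false_positive_rate(k, alpha=0.05):
    """Chance of at least one false positive among k independent comparisons."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(k, alpha=0.05):
    """Per-comparison significance level under the Bonferroni method."""
    return alpha / k


for k in (1, 2, 10):
    adjusted = bonferroni_alpha(k)
    print(
        f"k = {k:2d}  FPR = {false_positive_rate(k):.4f}  "
        f"Bonferroni level = {adjusted:.4f}  "
        f"FPR after adjustment = {false_positive_rate(k, adjusted):.4f}"
    )
```

The adjustment holds the overall false positive rate at or below 0.05, but each single comparison must then pass a much stricter level, which is where the loss of power comes from.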
19. Endpoints
Primary: The variable capable of providing the
most clinically relevant evidence
directly related to the primary objective
of the trial
Secondary: Either measurements supporting the
primary endpoint or effects related to
secondary objectives
20. Statistical analyses
Confirmatory: The result concerns a primary endpoint
and the p-value or confidence interval
accounts for potential multiplicity.
The result can support a claim of
superiority, equivalence or non-inferiority.
Exploratory: All other analyses.
The result is either supporting or
explanatory, or simply just a new
hypothesis.
22. Study populations
Intention-to-treat (ITT) principle: Analyze all randomized subjects
according to planned treatment regimen.
Full analysis set (FAS): The set of subjects that is as close
as possible to the ideal implied by the ITT-principle.
Per protocol (PP) set: The set of subjects who complied
with the protocol sufficiently to ensure
that they are likely to exhibit the
effects of treatment according to the
underlying scientific model.
23. FAS vs. PP-set
FAS
+ no selection bias
- misclassification problem (effect dilution)
PP-set
+ no contamination problem
- possible selection bias (confounding)
When the FAS and PP-set lead to essentially the same
conclusions, confidence in the trial is supported.
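How the two analysis sets might be derived from trial records is sketched below. The record fields, the subjects and the inclusion rule for the FAS (at least one outcome measurement) are hypothetical; a real trial defines these criteria in its protocol.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    subject_id: int
    randomized_arm: str      # arm assigned at randomization
    has_outcome: bool        # assumed FAS criterion: outcome was measured
    followed_protocol: bool  # complied sufficiently with the protocol


# Hypothetical trial records.
subjects = [
    Subject(1, "active", True, True),
    Subject(2, "active", True, False),   # e.g. stopped treatment early
    Subject(3, "active", False, False),  # no outcome data at all
    Subject(4, "control", True, True),
    Subject(5, "control", True, True),
    Subject(6, "control", True, False),  # e.g. received the active treatment
]

# FAS: as close as possible to all randomized subjects, analyzed by assigned arm.
full_analysis_set = [s for s in subjects if s.has_outcome]

# PP-set: only those who also complied with the protocol.
per_protocol_set = [s for s in full_analysis_set if s.followed_protocol]

print("FAS:", [s.subject_id for s in full_analysis_set])
print("PP: ", [s.subject_id for s in per_protocol_set])
```

Subjects 2 and 6 stay in the FAS, where they dilute the estimated effect, but drop out of the PP-set, where their exclusion is no longer protected by randomization.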
25. Fixed and random effects
Fixed effects: when the levels of an effect
constitute the entire population
about which you are interested.
Random effects: when the levels in your experiment
represent only a sample from that
population.
Random effects models can be used to analyze data with
multiple observations per patient.
26. Mixed effects model
If all the effects in a statistical model (ANOVA) are
considered random effects, then the model is called a
random effects model; likewise, a model with only
fixed effects is called a fixed effects model. When
some factors are fixed and others are random, the
model is called a mixed model.
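For repeated observations on patients, one common way to write such a mixed model is shown below. The notation is an assumption introduced here for illustration: $y_{ij}$ is the observation on subject $i$ at visit $j$, $\beta_j$ is a fixed visit effect, and $u_i$ is a random subject effect.

```latex
y_{ij} = \mu + \beta_j + u_i + \varepsilon_{ij},
\qquad u_i \sim N(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma_e^2)
```

Here $\sigma_u^2$ describes the between-subject variation and $\sigma_e^2$ the within-subject variation, with $u_i$ and $\varepsilon_{ij}$ assumed independent.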
(R.A. Fisher 1926: Type-1 and type-2 ANOVA)
27. Data from 3 subjects:
Messrs. Green, Blue and Red
[Figure: effect plotted against time (baseline, 1st visit, 2nd visit) for each subject]
29. 1. Assume independence between subjects'
repeated observations and use ANOVA
[Figure: effect plotted against time (baseline, 1st visit, 2nd visit)]
Bad idea: Within-subject variation is confused with
between-subject variation. Statistical precision will be
incorrectly calculated.
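Why assuming independence misstates precision can be seen in a toy calculation. The numbers for the three subjects are invented for the example, and the comparison of two standard errors is a simplified sketch rather than a full ANOVA.

```python
import statistics

# Hypothetical observations at baseline, 1st visit and 2nd visit.
data = {
    "Green": [10, 12, 14],
    "Blue": [20, 22, 24],
    "Red": [30, 32, 34],
}

# Naive approach: treat all 9 observations as independent.
all_obs = [v for values in data.values() for v in values]
naive_se = statistics.stdev(all_obs) / len(all_obs) ** 0.5

# Subject-level approach: only the 3 subjects are independent units.
subject_means = [statistics.fmean(values) for values in data.values()]
subject_se = statistics.stdev(subject_means) / len(subject_means) ** 0.5

print(f"naive SE of the mean:         {naive_se:.2f}")
print(f"subject-level SE of the mean: {subject_se:.2f}")
```

With these numbers the naive standard error is about half the subject-level one. Counting repeated observations as independent subjects makes the overall mean look more precise than three subjects can justify.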
31. 2. Repeated fixed effects comparisons
e.g. Student's t-tests (no FAS)
[Figure: effect plotted against time (baseline, 1st visit, 2nd visit)]
34. 3. Fixed effects RM-model
(no FAS)
[Figure: effect plotted against time (baseline, 1st visit, 2nd visit)]
35. 4. Fixed effects RM-model with LOCF
[Figure: effect plotted against time (baseline, 1st visit, 2nd visit)]
LOCF-imputation is not necessarily conservative, and
under-estimates variability.
Not the best alternative!
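For readers unfamiliar with LOCF (last observation carried forward), the imputation step itself is simple, as the sketch below shows. Missing visits are represented by `None`, and the series are hypothetical.

```python
def locf(values):
    """Replace each missing value (None) with the last observed value."""
    filled = []
    last = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)  # stays None until a first observation exists
    return filled


# Hypothetical series: baseline, 1st visit, 2nd visit.
print(locf([10, 12, None]))    # dropout after the 1st visit
print(locf([10, None, None]))  # dropout after baseline
```

Each imputed value is an exact copy of an earlier one, so the filled-in data contain no new measurement variation and freeze the subject's course at the time of dropout.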
38. 5. Mixed effects (subject random) ANOVA
[Figure: effect plotted against time (baseline, 1st visit, 2nd visit)]
Within- and between-subject variation are
separated in the model.
Statistical precision is correctly calculated.
A number of publications reporting Monte Carlo
simulation studies show that this is the
best alternative, both in terms of precision and
validity!
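The separation of within- and between-subject variation can be sketched for a small balanced data set. This is a simplified method-of-moments illustration with subject as a random effect and visit as a fixed effect, not a full mixed model fit; the data and the function name are hypothetical.

```python
from statistics import fmean


def variance_components(data):
    """Estimate between- and within-subject variance from balanced data."""
    subjects = list(data)
    n = len(subjects)            # number of subjects
    m = len(data[subjects[0]])   # number of visits per subject
    grand = fmean(v for s in subjects for v in data[s])
    subject_means = {s: fmean(data[s]) for s in subjects}
    visit_means = [fmean(data[s][j] for s in subjects) for j in range(m)]

    # Sum of squares between subjects.
    ss_subject = m * sum((subject_means[s] - grand) ** 2 for s in subjects)
    # Residual sum of squares after removing subject and visit effects.
    ss_resid = sum(
        (data[s][j] - subject_means[s] - visit_means[j] + grand) ** 2
        for s in subjects
        for j in range(m)
    )
    ms_subject = ss_subject / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (m - 1))

    # E[MS_subject] = within + m * between, so solve for the between part.
    between = max((ms_subject - ms_resid) / m, 0.0)
    return between, ms_resid


# Hypothetical observations at baseline, 1st visit and 2nd visit.
data = {"Green": [10, 13, 15], "Blue": [20, 22, 25], "Red": [30, 31, 36]}
between, within = variance_components(data)
print(f"between-subject variance: {between:.2f}")
print(f"within-subject variance:  {within:.2f}")
```

In this example most of the variation lies between subjects. Keeping the two components apart is what allows the precision of within-subject changes over time to be calculated correctly.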
39. Example: FREE SF36-PCS
Estimated treatment effect difference at 1 month
Method              Difference   p-value
ITT-analysis
  ME ANOVA          5.5          <0.0001
PP-analysis
  FE ANOVA Compl.   5.2          <0.0001
  FE ANOVA LOCF     4.9          <0.0001