Prague 2008

Overview of the statistical analysis
Jonas Ranstam, PhD,
National Musculoskeletal Competence Centre, Lund, Sweden

Explanations and points of reference

1. Methodological background
2. International guidelines
3. Multiplicity issues
4. Study population definitions
5. Statistical models

Clinical research
Before 1948

Unclear validity, unknown statistical precision

- Prof A's patients better than Prof B's
- Small series of patients or even single cases

Streptomycin in Tuberculosis Trials Committee.
Streptomycin treatment of pulmonary tuberculosis.
BMJ 1948;2:769-83.

The Control Scheme

Determination of whether a patient would be treated by streptomycin
and bed-rest (S case) or by bed-rest alone (C case) was made by
reference to a statistical series based on random sampling numbers
drawn up for each sex at each centre by Professor Bradford Hill; the
details of the series were unknown to any of the investigators or to the
co-ordinator and were contained in a set of sealed envelopes, each
bearing on the outside only the name of the hospital and a number.

Clinical research
From 1948

Elimination/reduction of bias, assessment of
statistical precision

- Randomization and blinding (intervention studies)
- Effect modeling (observational studies)
- P-values and confidence intervals

Quantitative principles I

Randomized allocation of patients to treatment groups
(and blinding when possible) guarantees that:

1. All differences between treatment groups at
baseline are random (not systematic).

Complete absence of baseline imbalance is not
the aim. Stratification on prognostic factors is
used to make the groups less imbalanced.

2. Treatment effect estimates are unaffected by
selection and confounding bias (and with
blinding, differential misclassification bias).
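
Randomized allocation with stratification can be sketched in a few lines of Python. This is only an illustration under stated assumptions: permuted blocks within each stratum are one common way to stratify, not necessarily the scheme used in any trial mentioned here, and the function name, block size, seed and strata are all hypothetical.

```python
import random

def stratified_block_allocation(subjects, block_size=4, seed=2008):
    """Allocate subjects to arm 'S' or 'C' in permuted blocks within each stratum.

    subjects: list of (subject_id, stratum) pairs in order of enrolment.
    Returns a dict mapping subject_id -> arm.
    """
    if block_size % 2:
        raise ValueError("block_size must be even for two equally sized arms")
    rng = random.Random(seed)   # fixed seed only to make the sketch reproducible
    pending = {}                # stratum -> arms not yet used in the current block
    allocation = {}
    for subject_id, stratum in subjects:
        if not pending.get(stratum):
            block = ["S", "C"] * (block_size // 2)
            rng.shuffle(block)  # random order within the block
            pending[stratum] = block
        allocation[subject_id] = pending[stratum].pop()
    return allocation

# Hypothetical enrolment list: 16 subjects, stratified by sex at one centre.
subjects = [(i, ("male" if i % 2 else "female", "centre A")) for i in range(1, 17)]
allocation = stratified_block_allocation(subjects)
```

Within each completed block the two arms are equally represented, so the strata stay balanced while the order of assignment remains random.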

Quantitative principles II

1. Individual effects vary between subjects.
Different samples of subjects will yield
different observed mean effects.

2. The subject variation can be estimated
using the observations in a random sample.

3. A universal mean effect can be estimated,
and the reliability of this estimate can be
described with p-values and confidence
intervals.
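
As a rough illustration of point 3, the sketch below estimates a mean effect and a 95% confidence interval from a sample. The data are hypothetical, and the interval uses a normal approximation; with a sample this small a t-based interval would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_with_ci(effects, level=0.95):
    """Return (estimate, lower, upper) using a normal approximation."""
    n = len(effects)
    estimate = mean(effects)
    standard_error = stdev(effects) / sqrt(n)   # subject variation / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return estimate, estimate - z * standard_error, estimate + z * standard_error

# Hypothetical individual effects observed in a random sample of 8 subjects.
sample = [4.0, 7.5, 5.0, 9.0, 3.5, 6.0, 8.0, 5.0]
estimate, lower, upper = mean_with_ci(sample)
print(f"mean effect {estimate:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```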

P-values are often misunderstood
They do

- describe the reliability of findings. P < 0.05 is usually
considered reliable.

They do not

- describe clinical relevance (they depend on sample
size).

- show that a difference “does not exist” (“n.s.” is
absence of evidence, not evidence of absence).
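
The dependence on sample size can be made concrete with a small sketch. It assumes a two-group comparison of means with a known, common standard deviation (a z-test); the difference, standard deviation and group sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(difference, sd, n_per_group):
    """Two-sided p-value for a difference in means (z-test, equal group sizes)."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = abs(difference) / standard_error
    return 2 * (1 - NormalDist().cdf(z))

# The same observed difference (2 units, SD 10) at three sample sizes.
for n in (10, 50, 250):
    print(n, round(two_sided_p(2.0, 10.0, n), 4))
```

The observed difference is identical in all three cases; only the p-value changes, so it cannot by itself say anything about clinical relevance.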

ICMJE – the Vancouver group
Results

“Avoid relying solely on statistical hypothesis testing,
such as the use of P values, which fails to convey
important information about effect size.”

“When possible, quantify findings and present them
with appropriate indicators of measurement error or
uncertainty (such as confidence intervals).”

Example: FREE SF36-PCS

Estimated treatment effect difference at baseline

Difference (95% CI)    p-value
0.4 (-1.7 to 2.6)      0.7

Estimated treatment effect difference at 1 month

Difference (95% CI)    p-value
5.9 (3.7 to 8.2)       <0.0001

P-values vs. confidence intervals

P-value: 2 possible outcomes (p < 0.05 or n.s.)
Confidence intervals: 5 possible outcomes

[Figure: five confidence intervals on an effect axis running from bad to good, with 0 and the clinically significant effect marked]

p-value    Interpretation of the confidence interval
p < 0.05   Statistically and clinically significant effect
p < 0.05   Statistically, but not necessarily clinically, significant effect
n.s.       Inconclusive
n.s.       Neither statistically nor clinically significant effect
p < 0.05   Statistically significant reversed effect
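
The five outcomes can be written down as a simple decision rule. The sketch below is one possible reading of the slide, assuming a positive effect is good and that a clinically significant effect size has been defined beforehand. The threshold of 3.0 used in the example is purely hypothetical and is not taken from the FREE trial.

```python
def interpret_interval(lower, upper, clinical_threshold):
    """Classify a confidence interval for an effect where positive is good."""
    if clinical_threshold <= 0 or lower > upper:
        raise ValueError("need clinical_threshold > 0 and lower <= upper")
    if upper < 0:
        return "statistically significant reversed effect"
    if lower >= clinical_threshold:
        return "statistically and clinically significant effect"
    if lower > 0:
        return "statistically, but not necessarily clinically, significant effect"
    if upper >= clinical_threshold:
        return "inconclusive"
    return "neither statistically nor clinically significant effect"

# Intervals from the FREE SF36-PCS example, with a hypothetical threshold of 3.0.
print(interpret_interval(-1.7, 2.6, 3.0))   # baseline
print(interpret_interval(3.7, 8.2, 3.0))    # 1 month
```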

Clinical trials
International regulatory guidelines
ICH Topic E9 - Statistical Principles for Clinical Trials

EMEA Points to consider:
- baseline covariates
- missing data
- multiplicity issues
- etc.

and similar documents from the FDA

These guidelines can all be found on the internet.

Multiplicity
Multiplicity of inferences is present in almost all trials.
If not properly handled, unsubstantiated claims for
effectiveness may be made as a consequence of an
inflated rate of false positive conclusions.

Multiplicity
The chance of at least one
false positive finding (FPR) = 1 - (1 - α)^k

where k is the number of performed comparisons and
α the significance level (usually 0.05).

k = 1 => FPR = 0.05
k = 2 => FPR = 0.0975
k = 10 => FPR = 0.4013

Bonferroni method: divide the significance level by the
number of comparisons. This reduces the statistical
power and should be avoided.
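
The numbers above are easy to reproduce. The sketch below evaluates the FPR formula, which holds for independent comparisons, and the Bonferroni-adjusted significance level; the function names are illustrative only.

```python
def family_wise_fpr(alpha, k):
    """Chance of at least one false positive among k independent comparisons."""
    return 1 - (1 - alpha) ** k

def bonferroni_level(alpha, k):
    """Per-comparison significance level after Bonferroni adjustment."""
    return alpha / k

for k in (1, 2, 10):
    adjusted = bonferroni_level(0.05, k)
    print(k,
          round(family_wise_fpr(0.05, k), 4),      # unadjusted FPR
          adjusted,                                # Bonferroni level per test
          round(family_wise_fpr(adjusted, k), 4))  # FPR after adjustment
```

With the adjusted level the overall FPR stays at or below 0.05, but each single comparison must now reach a much stricter threshold, which is where the loss of power comes from.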

Endpoints
Primary The variable capable of providing the
most clinically relevant evidence
directly related to the primary objective
of the trial

Secondary Either measurements supporting the
primary endpoint or effects related to
secondary objectives

Statistical analyses
Confirmatory The result concerns a primary endpoint
and the p-value or confidence interval
accounts for potential multiplicity.

The result can support a claim of
superiority, equivalence or non-
inferiority.

Exploratory All other analyses.

The result is either supporting or
explanatory, or simply just a new
hypothesis.

4. Study population definitions

Study populations
Intention-to-treat (ITT) principle
    Analyze all randomized subjects according to planned
    treatment regimen.

Full analysis set (FAS)
    The set of subjects that is as close as possible to the
    ideal implied by the ITT principle.

Per protocol (PP) set
    The set of subjects who complied with the protocol
    sufficiently to ensure that they are likely to exhibit the
    effects of treatment according to the underlying
    scientific model.

FAS vs. PP-set
FAS + no selection bias
- misclassification problem (effect dilution)

PP-set + no contamination problem
- possible selection bias (confounding)

When the FAS and PP-set lead to essentially the same
conclusions, confidence in the trial is supported.
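
A toy example may help to see the trade-off. The records below are entirely hypothetical, and the FAS is simplified to "all randomized subjects, analyzed as randomized"; in a real trial the definition of each set belongs in the protocol.

```python
from statistics import mean

# Hypothetical subjects: randomized arm, protocol compliance, outcome.
subjects = [
    {"id": 1, "randomized": "A", "compliant": True,  "outcome": 8.0},
    {"id": 2, "randomized": "A", "compliant": True,  "outcome": 7.0},
    {"id": 3, "randomized": "A", "compliant": False, "outcome": 1.0},
    {"id": 4, "randomized": "B", "compliant": True,  "outcome": 2.0},
    {"id": 5, "randomized": "B", "compliant": True,  "outcome": 3.0},
    {"id": 6, "randomized": "B", "compliant": True,  "outcome": 1.0},
]

def full_analysis_set(records):
    """Simplified FAS: every randomized subject, kept in the randomized arm."""
    return list(records)

def per_protocol_set(records):
    """PP set: only subjects who complied with the protocol."""
    return [r for r in records if r["compliant"]]

def arm_difference(records):
    """Mean outcome in arm A minus mean outcome in arm B."""
    a = [r["outcome"] for r in records if r["randomized"] == "A"]
    b = [r["outcome"] for r in records if r["randomized"] == "B"]
    return mean(a) - mean(b)

fas_difference = arm_difference(full_analysis_set(subjects))
pp_difference = arm_difference(per_protocol_set(subjects))
print(round(fas_difference, 2), round(pp_difference, 2))
```

In this made-up data set the non-compliant subject dilutes the effect in the FAS, while the PP set shows a larger difference at the cost of no longer comparing the groups as randomized.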

Fixed and random effects
Fixed effects when the levels of an effect
constitute the entire population
about which you are interested.

Random effects when the levels in your experiment
represent only a sample from that
population.

Random effects models can be used to analyze data with
multiple observations per patient.

Mixed effects model
If all the effects in a statistical model (ANOVA) are
considered random effects, then the model is called a
random effects model; likewise, a model with only
fixed effects is called a fixed effects model. When
some factors are fixed and others are random, the
model is called a mixed model.

(R.A. Fisher 1926: Type-1 and type-2 ANOVA)
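
For repeated measurements on patients, one common way to write such a mixed model is shown below. The notation is an assumption made for illustration (it does not come from the slides): subject i in treatment group g(i) is measured at visit j, treatment and visit are fixed effects, and the subject is a random effect.

```latex
y_{ij} = \mu + \alpha_{g(i)} + \beta_j + (\alpha\beta)_{g(i)j} + u_i + \varepsilon_{ij},
\qquad u_i \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)

\operatorname{Var}(y_{ij}) = \sigma_u^2 + \sigma_\varepsilon^2,
\qquad \operatorname{Cov}(y_{ij}, y_{ij'}) = \sigma_u^2 \quad (j \neq j')
```

The random subject term u_i is what makes observations from the same patient correlated, while observations from different patients remain independent.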

Data from 3 subjects:
Messrs. Green, Blue and Red

[Figure: effect plotted against time (baseline, 1st visit, 2nd visit) for each subject]

Analysis requirement: FAS


1. Assume independence between subjects'
repeated observations and use ANOVA

Bad idea:
Within-subject variation is confused with
between-subject variation. Statistical
precision will be incorrectly calculated.


2. Repeated fixed effects comparisons,
e.g. Student's t-tests (no FAS)


3. Fixed effects repeated measures (RM) model
(no FAS)


4. Fixed effects RM-model with LOCF
(last observation carried forward)

LOCF-imputation is not necessarily
conservative, and under-estimates
variability.

Not the best alternative!
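
For readers unfamiliar with LOCF, a minimal sketch of the imputation step follows. The subject names come from the slides, but the values are hypothetical, with None marking a missed visit.

```python
def locf(values):
    """Replace each missing value (None) with the last observed value before it."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)   # stays None if nothing has been observed yet
    return filled

# Hypothetical scores at baseline, 1st visit and 2nd visit.
observed = {
    "Green": [40.0, 45.0, 47.0],
    "Blue":  [38.0, 44.0, None],   # missed the 2nd visit
    "Red":   [42.0, None, None],   # dropped out after baseline
}
imputed = {name: locf(series) for name, series in observed.items()}
print(imputed)
```

The carried-forward values are then analyzed as if they had been observed, so the uncertainty about what would really have been measured is not reflected in the analysis.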


5. Mixed effects (subject random) ANOVA

Within- and between-subject variation are
separated in the model. Statistical precision is
correctly calculated.

A number of publications reporting Monte Carlo
simulation studies show that this is the
best alternative, both in terms of precision and
validity!
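
The separation of within- and between-subject variation can be illustrated without any special software. The sketch below uses the classical one-way random effects ANOVA estimators for balanced data; it ignores treatment and visit effects and is therefore much simpler than the mixed model on the slide. The data are hypothetical.

```python
from statistics import mean

def variance_components(data):
    """Estimate (between-subject, within-subject) variance for balanced data.

    data: dict mapping subject -> list of repeated observations.
    """
    groups = list(data.values())
    n_subjects, n_repeats = len(groups), len(groups[0])
    if any(len(g) != n_repeats for g in groups):
        raise ValueError("this sketch requires the same number of observations per subject")
    grand_mean = mean(v for g in groups for v in g)
    ss_between = n_repeats * sum((mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (n_subjects - 1)
    ms_within = ss_within / (n_subjects * (n_repeats - 1))
    var_between = max(0.0, (ms_between - ms_within) / n_repeats)
    return var_between, ms_within

# Hypothetical repeated observations for the three subjects.
data = {"Green": [10.0, 11.0, 12.0], "Blue": [20.0, 21.0, 22.0], "Red": [30.0, 31.0, 32.0]}
var_between, var_within = variance_components(data)
print(round(var_between, 2), round(var_within, 2))
```

Here almost all variation lies between subjects; an analysis that pools the nine observations as if they were independent would mix the two sources and misstate the precision.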


Example: FREE SF36-PCS

Estimated treatment effect difference at 1 month

Method                        Difference   p-value

ITT-analysis
  ME ANOVA                    5.5          <0.0001

PP-analysis
  FE ANOVA, complete cases    5.2          <0.0001
  FE ANOVA, LOCF              4.9          <0.0001
