In Search of Lost Infinities: Understanding the Block Structure in Clinical Trials

In Search of Lost Infinities
What is the “n” in big data?
Stephen Senn, Edinburgh
(c) Stephen Senn 2019

Acknowledgements
My thanks for the kind invitation
This work is partly supported by the European Union’s 7th Framework Programme
for research, technological development and demonstration under grant
agreement no. 602552. “IDEAL”.
Work on historical controls is joint with Olivier Collignon, Anna Schritz and Riccardo
Spezia

Outline
• Why block structure matters
• An example analysed with the help of the Rothamsted School and Genstat®
• The TARGET study
• Historical controls
• Lord’s paradox
• Conclusions and lessons

The Rothamsted School
• RA Fisher (1890-1962): variance, ANOVA, randomisation, design, significance tests
• Frank Yates (1902-1994): factorials, recovering inter-block information
• John Nelder (1924-2010): general balance, computing, Genstat®
• and Frank Anscombe, David Finney, Rosemary Bailey, Roger Payne etc.

Trial in asthma
Basic situation
• Two beta-agonists compared
• Zephyr(Z) and Mistral(M)
• Block structure has several levels
• Different designs will be investigated
  • Cluster
  • Parallel group
  • Cross-over trial
• Each design will be blocked at a different level
• NB Each design will collect 336 measurements
Block structure
Level          Number within higher level   Total number
Centre                      6                      6
Patient                     4                     24
Episodes                    2                     48
Measurements                7                    336

Block structure
• Patients are nested within centres
• Episodes are nested within patients
• Measurements are nested within episodes
• Centres/Patients/Episodes/Measurements
[Block-structure diagram; measurements not shown]

Possible designs
• Cluster randomised
  • In each centre all the patients receive either Zephyr (Z) or Mistral (M) in both episodes
  • Three centres are chosen at random to receive Z and three to receive M
• Parallel group trial
  • In each centre half the patients receive Z and half M in both episodes
  • Two patients per centre are randomly chosen to receive Z and two to receive M
• Cross-over trial
  • Each patient receives M in one episode and Z in the other
  • The order of allocation, ZM or MZ, is random
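The three randomisation schemes can be sketched in a few lines of Python. This is only an illustration of the allocation rules listed above, not the code used for the talk; the function name `allocate`, the seed handling and the dictionary layout are all assumptions made for the example.

```python
import random

CENTRES, PATIENTS, EPISODES = 6, 4, 2  # block sizes from the asthma example


def allocate(design, seed=0):
    """Return {(centre, patient, episode): 'Z' or 'M'} for one design.

    design is 'cluster', 'parallel' or 'crossover' (hypothetical labels).
    """
    if design not in ("cluster", "parallel", "crossover"):
        raise ValueError(f"unknown design: {design}")
    rng = random.Random(seed)
    allocation = {}
    # Cluster trial: randomise at the centre level (three centres get Z).
    z_centres = set(rng.sample(range(CENTRES), CENTRES // 2))
    for c in range(CENTRES):
        # Parallel trial: randomise patients within each centre (two get Z).
        z_patients = set(rng.sample(range(PATIENTS), PATIENTS // 2))
        for p in range(PATIENTS):
            if design == "cluster":
                sequence = "ZZ" if c in z_centres else "MM"
            elif design == "parallel":
                sequence = "ZZ" if p in z_patients else "MM"
            else:
                # Cross-over trial: randomise the order within each patient.
                sequence = rng.choice(["ZM", "MZ"])
            for e in range(EPISODES):
                allocation[(c, p, e)] = sequence[e]
    return allocation


for design in ("cluster", "parallel", "crossover"):
    alloc = allocate(design, seed=2019)
    n_z = sum(1 for t in alloc.values() if t == "Z")
    print(design, "episodes on Z:", n_z, "of", len(alloc))
```

Every design gives Z to the same number of episodes; what differs is the level of the block structure at which the treatment varies.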

Null (skeleton) analysis of variance with Genstat®
Code (output not shown)
BLOCKSTRUCTURE Centre/Patient/Episode/Measurement
ANOVA
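The degrees of freedom that a null analysis of variance assigns to each stratum of a fully nested, balanced block structure can be worked out by hand. The Python sketch below does that arithmetic for Centre/Patient/Episode/Measurement. It is not Genstat and does not reproduce Genstat's output format; the function name `nested_strata_df` is made up for the example.

```python
def nested_strata_df(levels):
    """Return (stratum name, d.f.) pairs for a fully nested block structure.

    levels: list of (name, number within the next-higher level),
    outermost level first. Balanced nesting is assumed.
    """
    strata = []
    units_above = 1  # the grand mean accounts for one degree of freedom
    name = ""
    for level_name, n_within in levels:
        units_here = units_above * n_within
        name = level_name if not name else f"{name}.{level_name}"
        # Each stratum holds the variation among its units within the level above.
        strata.append((name, units_here - units_above))
        units_above = units_here
    return strata


asthma = [("Centre", 6), ("Patient", 4), ("Episode", 2), ("Measurement", 7)]
rows = nested_strata_df(asthma)
for stratum, df in rows:
    print(f"{stratum:40s} {df:4d}")
print(f"{'Total':40s} {sum(df for _, df in rows):4d}")
```

The total is one less than the 336 measurements, and most of the degrees of freedom sit in the measurement stratum, the one level at which no randomisation takes place.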

Full (skeleton) analysis of variance with Genstat®
Additional code (output not shown)
TREATMENTSTRUCTURE Design[]
ANOVA
(Here Design[] is a pointer with values corresponding to each of the three designs.)

Variance matters
Points
• Which variances apply depends on the design
  • All three for cluster trial
  • Last two (2 and 3) for parallel trial
  • Third only for cross-over trial
• It is possible for the number of observations to go to infinity without the variance going to zero
• There is no ‘design-free’ n
Variances
Notation: $n_C$ centres, $n_P$ patients per centre, $n_E$ episodes per patient.
1) $\sigma_C^2 / n_C$: between-centre contribution
2) $\sigma_P^2 / (n_C n_P)$: between-patient contribution
3) $\sigma_E^2 / (n_C n_P n_E)$: within-patient contribution
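A small numerical sketch makes the "no design-free n" point concrete. It assumes that the three contributions are a between-centre variance divided by the number of centres, a between-patient variance divided by the number of patients, and a within-patient variance divided by the number of episodes. It also assumes that the parallel group trial, being blocked by centre, escapes the between-centre term, and that the cross-over trial, being blocked by patient, escapes the between-patient term as well. The unit variance components are invented for illustration, and constant multipliers are ignored.

```python
def treatment_variance(design, n_c, n_p, n_e, var_c, var_p, var_e):
    """Sum the variance contributions that apply to a design (up to a constant)."""
    contributions = {
        "centre": var_c / n_c,                 # between-centre contribution
        "patient": var_p / (n_c * n_p),        # between-patient contribution
        "episode": var_e / (n_c * n_p * n_e),  # within-patient contribution
    }
    applies = {
        "cluster": ("centre", "patient", "episode"),
        "parallel": ("patient", "episode"),
        "crossover": ("episode",),
    }
    return sum(contributions[term] for term in applies[design])


# Hypothetical variance components, all set to 1 for illustration.
for n_p in (4, 40, 400, 4000):
    row = [
        treatment_variance(d, n_c=6, n_p=n_p, n_e=2, var_c=1.0, var_p=1.0, var_e=1.0)
        for d in ("cluster", "parallel", "crossover")
    ]
    print(n_p, [round(v, 5) for v in row])
```

With six centres fixed, recruiting ever more patients per centre drives the parallel and cross-over variances towards zero, but the cluster trial variance never falls below the between-centre term.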




What about the measurement level?
• I put this in to remind us that not
everything you measure brings
exploitable information to the
same degree
• Randomisation between
measurements was not possible in
any of the schemes
• This makes it difficult to exploit
them except in a summary way
• By averaging
• Warning: some repeated measures
analyses are very strongly reliant
on assumed model structure
 
   
The summary measure is the mean of the seven measurements in an episode, $\bar{Y} = \frac{1}{7}\sum_{i=1}^{7} Y_i$.
[The slide's accompanying formula for the variance of $\bar{Y}$ is not recoverable from the source.]

The TARGET study
• One of the largest studies ever run in osteoarthritis
• 18,000 patients
• Randomisation took place in two sub-studies of equal
size
• Lumiracoxib versus ibuprofen
• Lumiracoxib versus naproxen
• Purpose: to investigate the cardiovascular (CV) and gastrointestinal (GI) tolerability of lumiracoxib

Baseline Demographics
Values are number (%)
                              Sub-Study 1                  Sub-Study 2
Demographic characteristic    Lumiracoxib   Ibuprofen      Lumiracoxib   Naproxen
                              n = 4376      n = 4397       n = 4741      n = 4730
Use of low-dose aspirin       975 (22.3)    966 (22.0)     1195 (25.1)   1193 (25.2)
History of vascular disease   393 (9.0)     340 (7.7)      588 (12.4)    559 (11.8)
Cerebrovascular disease       69 (1.6)      65 (1.5)       108 (2.3)     107 (2.3)
Dyslipidaemias                1030 (23.5)   1025 (23.3)    799 (16.9)    809 (17.1)
Nitrate use                   105 (2.4)     79 (1.8)       181 (3.8)     165 (3.5)

Baseline Deviances
                              Model term
Demographic characteristic    Sub-study   Treatment given Sub-study   Treatment
                              (DF=1)      (DF=2)                      (DF=2)
Use of low-dose aspirin        23.57      0.13                        13.40
History of vascular disease    70.14      5.23                        47.41
Cerebrovascular disease        13.54      0.14                         7.75
Dyslipidaemias                117.98      0.17                        54.72
Nitrate use                    39.83      4.62                        29.17

Baseline Chi-square P-values
                              Model term
Demographic characteristic    Sub-study   Treatment given Sub-study   Treatment
                              (DF=1)      (DF=2)                      (DF=2)
Use of low-dose aspirin       <0.0001     0.94                        0.0012
History of vascular disease   <0.0001     0.07                        <0.0001
Cerebrovascular disease       0.0002      0.93                        0.0208
Dyslipidaemias                <0.0001     0.92                        <0.0001
Nitrate use                   <0.0001     0.10                        <0.0001
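The P-values follow from the deviances by referring each to a chi-square distribution with the stated degrees of freedom. As a check, the sketch below uses the closed-form upper tail probabilities available for 1 and 2 degrees of freedom, so that only the Python standard library is needed. The helper name `chi2_upper_tail` is made up for the example, and it deliberately handles no other degrees of freedom.

```python
import math


def chi2_upper_tail(x, df):
    """Upper tail probability of a chi-square variate for df = 1 or 2 only."""
    if df == 1:
        # A chi-square on 1 d.f. is a squared standard Normal.
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # A chi-square on 2 d.f. is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


# Deviances for 'Use of low-dose aspirin' from the table of baseline deviances.
aspirin = {
    "Sub-study (DF=1)": (23.57, 1),
    "Treatment given Sub-study (DF=2)": (0.13, 2),
    "Treatment (DF=2)": (13.40, 2),
}
for term, (deviance, df) in aspirin.items():
    print(f"{term:35s} {chi2_upper_tail(deviance, df):.4f}")
```

Within sub-studies, where randomisation took place, the treatment arms are well balanced. Between sub-studies they are not, even for the same drug.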

Outcome Variables
Lumiracoxib only; values are number (%)
                            Sub-Study 1    Sub-Study 2
Outcome variable            Lumiracoxib    Lumiracoxib
                            n = 4376       n = 4741
Total of discontinuations   1751 (40.01)   1719 (36.26)
CV events                   33 (0.75)      52 (1.10)
At least one AE             699 (15.97)    710 (14.98)
Any GI                      1855 (42.39)   1785 (37.65)
Dyspepsia                   1230 (28.11)   1037 (21.87)

Deviances and P-Values
Lumiracoxib only, fitting Sub-study
                            Statistic
Outcome variable            Deviance   P-value
Total of discontinuations   37.43      <0.0001
CV events                    0.92      0.33
At least one AE              0.005     0.94
Any GI                       0.004     0.95
Dyspepsia                   16.85      <0.0001

Lessons from TARGET
• If you want to use historical controls you will have to work very hard
• You need at least two components of variation in your model
• Between centre
• Between trial
• And possibly a third
• Between eras
• What seems like a lot of information may not be much

Implications for historical controls
Variation between studies puts a severe
limit on what can be learned using
historical controls.
The fact that you have lots of patients in
your current one-armed study, the fact
that you had masses in all previous
studies and even the fact that there were
many previous studies cannot
compensate for the fact that there is only
one current study.
The most efficient way to deal with
between-study variation is nearly always
to have concurrent controls.
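Notation:
$k$: number of historical studies
$n_h$: patients per historical study
$n_c$: patients in current study
$\gamma^2$: between-study variance
$\sigma^2$: between-patient variance

$$\lim_{k,\, n_h,\, n_c \to \infty} \left[ \underbrace{\frac{\gamma^2}{k} + \frac{\sigma^2}{k\, n_h}}_{\text{control ‘group’ variance}} + \underbrace{\gamma^2 + \frac{\sigma^2}{n_c}}_{\text{experimental group variance}} \right] = \gamma^2$$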
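The limit can be checked numerically. The sketch below assumes that the historical control mean has variance equal to the between-study variance over k plus the between-patient variance over k times n_h, and that the single current study contributes one whole between-study variance plus the between-patient variance over n_c. That reading of the slide formula is a reconstruction, and the values chosen for the two variances are invented. The concurrent-control comparison assumes equal allocation and a study effect that cancels within the study.

```python
def historical_contrast_variance(k, n_h, n_c, gamma2, sigma2):
    """Variance of (current treated mean) minus (mean of k historical control studies)."""
    control = gamma2 / k + sigma2 / (k * n_h)  # historical control 'group'
    experimental = gamma2 + sigma2 / n_c       # one current study only
    return control + experimental


def concurrent_contrast_variance(n_per_arm, sigma2):
    """Variance of a within-study contrast; the study effect is assumed to cancel."""
    return 2.0 * sigma2 / n_per_arm


gamma2, sigma2 = 0.25, 1.0  # hypothetical variance components
for size in (10, 100, 1000, 10000):
    hist = historical_contrast_variance(k=size, n_h=size, n_c=size,
                                        gamma2=gamma2, sigma2=sigma2)
    conc = concurrent_contrast_variance(n_per_arm=size, sigma2=sigma2)
    print(size, round(hist, 5), round(conc, 5))
```

However large k, n_h and n_c become, the historical comparison is stuck near the between-study variance, while the concurrent comparison keeps improving.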

Lord’s Paradox
Lord, F.M. (1967) “A paradox in the interpretation of group comparisons”, Psychological Bulletin, 68, 304-305.
“A large university is interested in investigating the effects on the students
of the diet provided in the university dining halls….Various types of data
are gathered. In particular the weight of each student at the time of his
arrival in September and his weight in the following June are recorded”
We shall consider this in the Wainer and Brown version (also considered
by Pearl & Mackenzie) in which there are two halls, each assigned a
different one of the two diets being compared.

Two Statisticians
Statistician One (Say John)
• Calculates difference in weight
(outcome-baseline) for each hall
• No significant difference
between diets as regards this
‘change score’
• Concludes no evidence of
difference between diets
Statistician Two (Say Jane)
• Adjusts for initial weight as a
covariate
• Finds significant diet effect on
adjusted weight
• Concludes there is a difference
between diets

John’s analysis: comparing change scores

Jane’s analysis: comparing covariate-adjusted scores

Pearl & Mackenzie, 2018
[Diagram: causal graph with nodes D (Diet), W1 (initial weight) and WF (final weight)]
“In this diagram, W1, is a confounder of D and WF and not a mediator. Therefore, the second statistician would be unambiguously right here.”
The Book of Why p216
“However, for statisticians who are trained in ‘conventional’ (i.e. model-blind) methodology and avoid using causal lenses, it is deeply paradoxical”
The Book of Why p217
NB This diagram is adapted from theirs, which covers change rather than final weight.

Start with the randomised equivalent
• We suppose that the diets had been randomised to the two halls
• Let us suppose there are 100 students per hall
• Generate some data
• See what Genstat® says about analysis
• Note that (as we have seen) it is a particular feature of Genstat® that it does not have to have outcome data to do this
• Given the block and treatment structure alone it will give us a skeleton ANOVA
• We start by ignoring the covariate

Skeleton ANOVA
Code
BLOCKSTRUCTURE Hall/Student
TREATMENTSTRUCTURE Diet
ANOVA
Output
Analysis of variance
Source of variation      d.f.
Hall stratum
  Diet                      1
Hall.Student stratum      198
Total                     199
Genstat® points out the obvious (which, however, has been universally overlooked). There are no degrees of freedom to estimate the variability of the Diet estimate, which appears in the Hall and not the Hall.Student stratum.

Adding initial weight as a covariate
Code
BLOCKSTRUCTURE Hall/Student
TREATMENTSTRUCTURE Diet
COVARIATE Base
ANOVA
Output
Analysis of variance (adjusted for covariate)
Covariate: Base
Source of variation      d.f.
Hall stratum
  Diet                      0
  Covariate                 1
  Residual                  0
Hall.Student stratum
  Covariate                 1
  Residual                197
Total                     199
Again Genstat® points out the obvious (which, however, has been universally overlooked). There are no degrees of freedom to estimate the treatment effect because the single degree of freedom is needed to estimate the between-hall slope.
Conclusion: The Book of Why is far from being unambiguously right. It is only right if the strong but untestable assumption can be made that the between-hall regression is the same as the within-hall regression.
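A few lines of arithmetic show why the single between-hall degree of freedom cannot serve two purposes. With only two halls, the adjusted diet contrast is the difference in final-weight hall means minus an assumed between-hall slope times the difference in baseline hall means, and the two hall means cannot themselves determine that slope. The hall means and the within-hall slope below are invented for illustration; they are not the data generated for the talk.

```python
def adjusted_diet_contrast(final_means, base_means, between_hall_slope):
    """Diet contrast between two halls for an ASSUMED between-hall slope."""
    diff_final = final_means[1] - final_means[0]
    diff_base = base_means[1] - base_means[0]
    return diff_final - between_hall_slope * diff_base


# Hypothetical hall means (kg): no average change in either hall.
base_means = (60.0, 70.0)
final_means = (60.0, 70.0)
within_hall_slope = 0.5  # hypothetical slope of final on initial weight within halls

# John: a change-score analysis is equivalent to assuming a slope of 1.
john = adjusted_diet_contrast(final_means, base_means, 1.0)
# Jane: ANCOVA carries the within-hall slope over to the between-hall comparison.
jane = adjusted_diet_contrast(final_means, base_means, within_hall_slope)
print("John (slope 1):", john)
print("Jane (within-hall slope):", jane)

# Any other assumed slope gives yet another answer; two halls cannot arbitrate.
for slope in (0.0, 0.25, 0.75):
    print("assumed slope", slope, "->",
          adjusted_diet_contrast(final_means, base_means, slope))
```

Each answer is the consequence of an assumption about the between-hall slope, which is exactly the quantity the design leaves no information to estimate separately from the diet effect.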

A warning for epidemiology
Things that are a problem for controlled clinical trials are very rarely less of a
problem for observational analyses.
Propensity score, Mendelian randomisation, causal analysis blah, blah, blah are
all very well but if you aren’t thinking about components of variation you should
be.

Conclusions
• Local control is valuable
• Design matters
• Components of variation matter
• The Rothamsted approach brings insight
• Causal analysis needs to be developed further to include components
of variation
• Just because you are rich in data does not mean you are rich in
information
• Be sceptical about “big data”

Finally, I leave you with this thought
A big data-analyst is an expert at producing misleading
conclusions from huge datasets.
It is much more efficient to use a statistician, who can do
the same with small ones.

References
1. Nelder JA. The analysis of randomised experiments with orthogonal block structure I. Block
structure and the null analysis of variance. Proceedings of the Royal Society of London Series A.
1965;283:147-62.
2. Nelder JA. The analysis of randomised experiments with orthogonal block structure II.
Treatment structure and the general analysis of variance. Proceedings of the Royal Society of London
Series A. 1965;283:163-78.
3. Lord FM. A paradox in the interpretation of group comparisons. Psychological Bulletin.
1967;68:304-5.
4. Holland PW, Rubin DB. On Lord's Paradox. In: Wainer H, Messick S, editors. Principals of
Modern Psychological Measurement. Hillsdale, NJ: Lawrence Erlbaum Associates; 1983.
5. Liang KY, Zeger SL. Longitudinal data analysis of continuous and discrete responses for pre-post
designs. Sankhya-the Indian Journal of Statistics Series B. 2000;62:134-48.
6. Wainer H, Brown LM. Two statistical paradoxes in the interpretation of group differences:
Illustrated with medical school admission and licensing data. American Statistician. 2004;58(2):117-23.
7. Senn SJ. Change from baseline and analysis of covariance revisited. Statistics in Medicine.
2006;25(24):4334–44.
8. Senn SJ, Graf E, Caputo A. Stratification for the propensity score compared with linear regression
techniques to assess the effect of treatment or exposure. Statistics in Medicine. 2007;26(30):5529-44.
9. Van Breukelen GJ. ANCOVA versus change from baseline had more power in randomized studies
and more bias in nonrandomized studies. Journal of clinical epidemiology. 2006;59(9):920-5.
10. Pearl J, Mackenzie D. The Book of Why: Basic Books; 2018.

Blogpost with the first part of the talk
https://errorstatistics.com/2019/03/09/s-senn-to-infinity-and-beyond-how-big-are-your-data-really-guest-post/
To infinity and beyond: how big are your data, really?
