Presidents' invited lecture ISCB Vigo 2017
Discusses various issues to do with how randomised clinical trials should be analysed. See also https://errorstatistics.com/2017/07/01/s-senn-fishing-for-fakes-with-fisher-guest-post/
1. The Revenge of RA Fisher
Thoughts on Randomgate and its Wider Implications
Stephen Senn
The revenge of RA Fisher 1
2. Acknowledgements & and an apology
Acknowledgments
• Thanks to KyungMann and ISCB for inviting
me
• I appreciate that I am a late replacement for
Butch (to whom we all wish the best)
• I apologise to the very many of you who will be
disappointed
• Thanks to Bill Rosenberger & Dieter Hilgers
for information on randomisation
• Thanks to Dr Helen Senn for reminding me
about publication bias
• This work is partly supported by the
European Union’s 7th Framework Programme
for research, technological development and
demonstration under grant agreement no.
602552. “IDeAL”
Apology
• I am going to talk about
randomisation
• I was the joint presidents’
invited lecturer at ISCB/SCT 2003
London (Grazia Valsecchi & John
Lachin)
• And I talked about randomisation
then!
The revenge of RA Fisher 2
3. The revenge of RA Fisher 3
A Quote from the Master
“Two things are necessary, however: (a) that a sharp
distinction should be drawn between those components of
error which are to be eliminated in the field, and those which
are not to be eliminated; and that while the elimination of the
one class shall be complete, no attempt shall be made to
eliminate the other; (b) that the statistical process of the
estimation of error shall be modified so as to take account of
the field arrangement, and so that the components of error
actually eliminated in the field shall equally be eliminated in
the statistical laboratory.” (My emphasis.)
R.A. Fisher, CP 48 pp507-508
A slide from 2003
4. Outline
• Carlisle’s bombshell
• Three explanations
• 1. Declaration of independence
• 3. Balancing act
• 2. Minding your Ps and Qs
• Conclusion and recommendations
The revenge of RA Fisher 4
Note that the order is 1,3,2
No I have not randomly
permuted them
1,2,3 is the natural order to
think about them but
1,3,2 is in increasing order of
difficulty
6. Randomgate
“I wanted to determine whether data distributions in trials
published in specialist anaesthetic journals have been different to
distributions in non-specialist medical journals. I analysed the
distribution of 72, 261 means o f 29, 789 variables in 5087
randomised , controlled trials published in eight journals between
January 2000 and December 2015: Anaesthesia (399); Anesthesia
and Analgesia (1288); Anesthesiology (541); British Journal of
Anaesthesia (618); Canadian Journal of Anesthesia (384); European
Journal of Anaesthesiology (404); Journal of the American Medical
Association (518) and New England Journal of Medicine (935). I
chose these journals as I had electronic access to the full text.”
The revenge of RA Fisher 6
7. Statistics of Carlisle’s Statistics
Identifier Description Number Statistic Calculation Description Result
V1 Journals 8
V2 Years 6.0 S1 V1xV2 Journal years 48.0
V3 Papers 5087 S2 V3/(V1xV2) Papers/Journal year 106.0
V4 Variables 29789 S3 V4/V3 Variables/Paper 5.9
V5 Means 72261 S4 V5/V4 Means/Variable 2.4
V6 S5 V4/V3 Means/Paper 14.2
The revenge of RA Fisher 7
8. Little Ps have further Ps upon their backs to
bite them
• Carlisle then proceeds to look at the P of the Ps
• That is to say he considers if their distribution is random
• Calculates the balance three different ways (but all assuming simple
randomisation)
• Chooses that P-value closest to 0.5
• Combines P-values for different variables by averaging the Z scores
• Calculates a one-sided version and subtracts from 1
• Closer to 1 the greater the imbalance
• Closer to 0 the greater the balance
• Then does a QQ plot
• Finds too many values close to 1 and too many close to 0
The revenge of RA Fisher 8
13. Three assumptions in Carlisle’s analysis
• Carlisle assumes his P-values would have a uniform distribution given
that the null hypothesis is true.
• This requires however
1. Independent covariates
2. From randomly chosen randomised trials
3. Analysed as randomised
• I shall take these in the order of increasing difficulty 1,3,2
The revenge of RA Fisher 13
15. Independent covariates?
• Many commentators have picked up on this
• Carlisle combines P-values using what he calls Stouffer’s method
• Weighted Z-statistics
• He calculates the variance of the weighted Z-statistic as if they were
independent
• Ok for combining different trials
• But covariates are not independent
• Example of asthma
• Sex & height
• Height and FEV1
• FEV1 and age
The revenge of RA Fisher 15
16. All outcome
measures are
presumed to be
equi-correlated
with correlation
𝜌 ≥ 0
This implies they
are conditionally
independent
given a latent
variable with
whom they have
correlation
√𝜌
The lines give Z-
inflation as a function
of the number of
measures combined
for 1 to 10 measures
Obviously for 1
measure there is no
inflation
The revenge of RA Fisher 16
19. The revenge of RA Fisher 19
NEJM Baseline P-
values
Expected variance
under H0 is 1/12
This shows that
the variance of
the combined P-
values is very
different from
that of the
individual P-
values
20. Balancing act
Being a statistician means never having to say you are certain
But are you certain how certain you are?
The revenge of RA Fisher 20
21. How are trials really randomised?
The revenge of RA Fisher 21
• Looked at BMJ, JAMA, Lancet, NEJM in 2010
• 258 trials
Method Number Percent
Permuted Blocks 125 49
Minimisation 29 11
Simple randomisation 4 2
Other 4 2
Unclear 96 37
TOTAL 258 100
22. An illustration of what Fisher warned us of
The master’s warning
• If you randomise one way and
analyse another you are in for
trouble
• Your estimate may be reasonable
• Consistent
• Precise
• You won’t know how precise it is
• It is not clear what relationship the
standard error as calculated has to
the variation the statistic would
have over all randomisations
Hills and Arrmitage 1979
• Trial of enuresis
• Patients randomised to one of two
sequences
• Active treatment in period 1 followed
by placebo in period 2
• Placebo in period 1 followed by active
treatment in period 2
• Treatment periods were 14 days
long
• Number of dry nights measured
The revenge of RA Fisher 22
23. The revenge of RA Fisher 23
Hills, M, Armitage, P. The two-
period cross-over clinical trial,
British Journal of Clinical
Pharmacology 1979; 8: 7-20.
24. Important points to note
• Because every patient acts as his or her own control all patient level
covariates (of which there could be thousands and thousands) are
perfectly balanced
• Differences in these covariates can have no effect on the difference
between results under treatment and the results under placebo
• However, period level covariates (changes within the lives of patients)
could have an effect
• My normal practice is to fit a period effect as well as patient effects,
however, I shall omit doing so to simplify
• I am going to analyse these data two ways
• One is reasonable (similar to H & A’s analysis) and the other is silly
The revenge of RA Fisher 24
25. 0.7
4
0.5
2
0.3
0
0.1
-2-4
0.6
0.2
0.4
0.0
Density
Permutatedtreatment effect
Blue diamond shows
treatment effect whether or
not we condition on patient
as a factor.
It is identical because the
trial is balanced by patient.
However the permutation
distribution is quite different
and our inferences are
different whether we
condition (red) or not
(black) and clearly
balancing the randomisation
by patient and not
conditioning the analysis by
patient is wrong
The revenge of RA Fisher 25
26. Not fitting patient effect
Estimate s.e. t(56) t pr.
2.172 0.964 2.25 0.0282
(P-value for permutation is 0.034)
Two Parametric Approaches
Fitting patient effect
Estimate s.e. t(28) t pr
.
2.172 0.616 3.53 0.00147
(P-value for Permutation is 0.0014)
26The revenge of RA Fisher
27. What happens if you balance but don’t
condition?
Approach Variance of estimated
treatment effect over all
randomisations*
Mean of variance of
estimated treatment
effect over all
randomisations*
Completely randomised
Analysed as such
0.987 0.996
Randomised within-
patient
Analysed as such
0.534 0.529
Randomised within-
patient Analysed as
completely randomised
0.534 1.005
*Based on 10000 random permutations
The revenge of RA Fisher 27
That is to say, permute values respecting the fact that they come from a cross-
over but analysing them as if they came from a parallel group trial
28. Fisher’s point of view
• Balancing and not conditioning is an unforgiveable sin
• Your estimated standard errors bear an unknown relationship to the
true standard errors
• It is like analysing a match pairs design as if it were a completely
randomised design
• If you do this, you deserve to fail Stat 1
The revenge of RA Fisher 28
29. Implications for Carlisle’s analysis
• Trials are ‘randomised’ using balancing approaches
• Permuted block
• Within centres
• Minimisation
• This eliminates various effects from the true variance at baseline
• It adds them to the error variance
• It will produce an excess of trials that are balanced ‘too good to be
true’
The revenge of RA Fisher 29
30. Minding your P’s and Q’s
Missing data matters
Except when studying missing data, apparently
The revenge of RA Fisher 30
31. Summary of Observational Studies
(Based on Song et al, 2009)
Favours negative Favours positive
Analysis produced using Guido Schwarzer’s meta package in R
The revenge of RA Fisher 31
37. Implications
• You don’t get to see a random set of randomised trials
• You get to see a selected set
• Although it is true that looking forward for a randomised trial all covariates
should have a null distribution, publication acts as a data-filter
• There are four things we need to know
• What effect does imbalance have on significance under the null
• What effect does imbalance have on significance under the alternative
• What effect does significance have on publication
• What is the distribution of true effects
• The first of these is ‘easy’ but the rest aren’t
The revenge of RA Fisher 37
40. We need to change our publication mode for
trials
• Trials are missing not at random
• Or at least not completely at
random
• Even where the trials are
present, they are poorly
documented
• In particular details of analysis
and randomisation are
frequently missing
• Self-publication is the solution
The revenge of RA Fisher 40
41. The elephant in the room
• We can all breathe a sign of relief
(probably)
• There are three reasons why P-values
for (naïve) baseline tests might not
have the supposed distribution
• This may let (most of) the trials off the
hook as regards dishonesty or
incompetence
• Problem is that, at least as regards
one aspect, all Carlisle was doing was
following a (regrettably) common
practice
• This has unpleasant consequences for
analyses of outcomes
A forgotten moment in statistical history
St Exupéry tries to fit a Normal Distribution to an
elephant using Python
The revenge of RA Fisher 41
42. The really worrying thing is we are analysing
clinical trials all wrong
• Fisher warned us what would happen
• We are balancing but not fitting
• We are failing stat 1
• The result is wasted resources, wasted lives, wasted time and
increased uncertainty
• We need to stop using simplicity as an excuse for not conditioning
• We should use models that reflect (at least) the complexity of the
randomisation procedure
• Finally, if it is fraud you are interested in
The revenge of RA Fisher 42
43. The revenge of RA Fisher 43
Presented at the International Society for Clinical Biostatistics
and the Society for Clinical Trials, Boston, 7-10 July, 1997.
45. The revenge of RA Fisher 45
Finally, I leave you with this thought
Those who don’t take
randomisation seriously
are condemned to
make conclusions at
random
RA Fisher
demonstrating
the theory of
the random
walk and
hitting an
absorbing
barrier
Notes de l'éditeur
Abstract: RA Fisher warned us years ago: as ye randomize so shall ye analyse. We ignored him and we haven't faithfully reflected the way we randomise in the way we analyse. In the pharmaceutical industry, we randomise in permuted blocks but we don't fit the block. Ex pharma researchers minimise but don't condition on the order of patients in the trial.A recent interesting and heroic analysis by John Carlisle has identified dozens of clinical trials in leading journals as being either suspiciously imbalanced at baseline or having balance that's too good to be true. However, he analyses the baseline distributions as if the trials were completely randomized but nearly all trial are not. I consider Carlisle's analysis not only as regards whether or not the whole world of clinical trials is awash with fraud but also as regards the implication for analysis of outcomes, even if it is not.
I conclude that because we have arrogantly assumed that in theory we can do better than Fisher, in practice we often do worse.
Reference:Carlisle, J. B. "Data fabrication and other reasons for non‐random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals." Anaesthesia (2017).
Means per variable gives a rough idea of the average number of arms per paper
NB not only is the posted variance for the hybrid approach (approach 3) not correct it actually increases compared to the completely randomised design!
The R Code to produce this is given below
#To carry out a meta-analysis of publication bias data
#Data taken from Song et al BMC Medical Research Methodology 2009, 9:79
library(meta)#load Guido Schwarzer's package
#Enter data from Figure 2 of Song et al
Study<-c('Lee 2006','Lynch 2007','Okike 2008', 'Olson 2002')
e_pos<-c(35,45,132,78)
n_pos<-c(718,148,620,383)
e_neg<-c(7,19,54,55)
n_neg<-c(109,61,235,362)
meta.fit<-metabin(e_pos,n_pos,e_neg,n_neg,Study,sm="OR",
label.e='Positive',label.c='Negative',Title='Analysi of publication bias')
meta.fit
forest(meta.fit)
#To carry out a meta-analysis of publication bias data
library(meta)#load Guido Schwarzer's package
#Fit data on non-randomised studies
#Figures taken from figure 2 of Song et al's 2009 meta-analysis in BMC
Study1<-c('Olson 2002','Lee 2006','Lynch 2007','Okike 2008')
e_pos1<-c(78,35,45,132)
n_pos1<-c(383,718,148,620)
e_neg1<-c(55,7,19,54)
n_neg1<-c(362,109,61,235)
meta.fit<-metabin(e_pos1,n_pos1,e_neg1,n_neg1,Study1,sm="OR",label.e='Positive',label.c='Negative',title='Analysis of publication bias')
meta.fit
forest(meta.fit)
#Analyse randomised studies
#Data taken from Epstein 2004 (and that paper's account of Epstein 1990)
#Emerson et al ARCH INTERN MED/VOL 170 (NO. 21), NOV 22, 2010
#NB Emerson et al reports two studies
#Enter data
Study2<-c('1990','2004','2010 CORR', '2010 JBJS')
e_pos2<-c(8,10,58,49)
n_pos2<-c(12,16,60,50)
e_neg2<-c(9,5,43,37)
n_neg2<-c(21,17,48,52)
Author<-c('Epstein','Epstein', 'Emerson et al' ,'Emerson et al')
meta.fit<-metabin(e_pos2,n_pos2,e_neg2,n_neg2,Study2,sm="OR",label.e='Positive',label.c='Negative',title='Analysis of publication bias',byvar=Author,bylab='Author')
meta.fit
forest(meta.fit)
Plotted using GenStat program http://62.231.116.94/genstat/
Goldacre (and others) are implicitly assuming that this must be what the truth is since the observed probabilities of acceptance are the same. Since the acceptance curves for negative and positive studies are the same, there is no bias against negative studies.
This is exactly the same biased set up as before. However, at the threshold the probability of acceptance is the same, because authors are submitting if the probability of acceptance is high enough and not using some quality standard. If this is the case you would not necessarily see any difference in probability of acceptance despite editorial bias.
For those of you filling out your bullshit bingo cards I thought I should add elephant in the room