Are Most Positive Findings False? Confirmatory Bias in the Evaluation of Psychological Interventions

James C. Coyne, Ph.D.
jcoyne@mail.med.upenn.edu

Confirmatory Bias
• Consistent Bias in the Availability and Interpretation of Data so that the Intervention Appears More Effective than It Is
• Publication Bias
• Investigator Allegiance
• Investigators’ Bias in Design of Trials, Selection, Analysis, Interpretation, and Subsequent Discussion of Data

RCTs are Not Necessary to Resolve All Questions.

In the Late 1980s the Quality of the Evidence Available in Medical Journals was Subject to Considerable Criticism.
Strong findings in small trials did not replicate in subsequent large studies.
Results of meta-analyses did not predict the outcome of large trials.
Details of trials required to form an independent opinion of a trial were not being provided in journal articles.
Trials funded by industry consistently supported the superiority of the sponsor’s products.

Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias: Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 273(5): 408-412, 1995.
Compared with trials in which authors reported adequately concealed treatment allocation, trials in which concealment was either inadequate or unclear yielded larger estimates of treatment effects (P<.001). Odds ratios were exaggerated by 41% for inadequately concealed trials and by 30% for unclearly concealed trials. Trials in which participants had been excluded after randomization did not yield larger estimates of effects, but that lack of association may be due to incomplete reporting. Trials that were not double-blind also yielded larger estimates of effects (P=.01), with odds ratios being exaggerated by 17%.

Chan, A.W. et al. (2004). Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to published articles. JAMA, 291, 2457-2465.
One hundred two trials with 122 published journal articles and 3736 outcomes were identified.
Overall, 50% of outcomes per trial were incompletely reported.
Statistically significant outcomes had a higher odds of being fully reported compared to nonsignificant outcomes (pooled odds ratio, 2.4; 95% confidence interval [CI], 1.4-4.0).
Eighty-six percent of survey responders (42/49) denied the existence of unreported outcomes despite clear evidence to the contrary.
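Chan et al. report an odds ratio pooled across trials. To show what an odds ratio and its 95% confidence interval summarize, the sketch below computes both for a single 2x2 table, using the usual Wald interval on the log scale. The counts are hypothetical, not Chan et al.'s data. The sketch does not reproduce their pooling across trials, and the function name `odds_ratio_ci` is illustrative.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio and Wald confidence interval for one 2x2 table.

    a = significant & fully reported,    b = significant & incompletely reported
    c = nonsignificant & fully reported, d = nonsignificant & incompletely reported
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lo = math.exp(math.log(odds_ratio) - z * se_log)
    hi = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lo, hi

# Hypothetical counts: 60/100 significant outcomes fully reported vs 40/100 nonsignificant
print(odds_ratio_ci(60, 40, 40, 60))  # about (2.25, 1.28, 3.96)
```

An interval whose lower bound stays above 1, as in Chan et al.'s 1.4-4.0, indicates that significant outcomes were more likely to be fully reported.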

Strategies of Data Analysis That Ensured Positive Findings Came under Sustained Criticism
Assmann, S. F., Pocock, S. J., Enos, L. E., & Kasten, L. E. (2000). Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet, 355, 1064–1069.
Freemantle, N. (2001). Interpreting the results of secondary end points and subgroup analyses in clinical trials: should we lock the crazy aunt in the attic? BMJ, 322, 989.
Yusuf, S., Wittes, J., Probstfield, J., & Tyroler, H. A. (1991). Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. JAMA, 266, 93-98.

Reforms Were Instituted and Enforced (Even if Inconsistently).
Requirement of Declaration of Conflict of Interest.
Adherence to CONSORT required for submitting a paper for publication.
Researchers now required to provide the journals where they intend to publish with a detailed description of their study protocols, including specification of the 1-2 primary endpoints and any subgroup analyses, before they actually conduct their studies.

CONSORT
Consolidated Standards of Reporting Trials
www.consort-statement.org
A list of requirements for uniform reporting of clinical trials, with the overall aim of improving the reporting of Randomized Controlled Trials, facilitating their critical appraisal, and facilitating their inclusion in systematic reviews.

CONSORT Checklist
22-item checklist and an accompanying flow diagram of participants’ progression through a trial, from approach for consent to completion of follow-up assessments.
Initial intent was to provide minimal standards for reporting, not conducting, trials.
It was anticipated that, in addition, CONSORT would guide investigators in designing and implementing scientifically sound trials.

The “Great Debate” (2005)
"Resolved: Psychosocial
Interventions for Cancer
Patients are Ineffective
and Unacceptable to
Patients."

The Literature was Worse Than it First Looked

Shifting Views of Efficacy
Positive (1995): Meyer & Mark ('95); Fawzy et al ('95); Devine & Westlake ('95)
Mixed (1996-2002): Helgeson & Cohen ('96); Sheard & Maguire ('99); Rehse & Pukrop ('03)
Inconclusive (2002-2004): Newell et al ('02); Edwards et al ('04); Gysels et al ('04)

What is Required for a Demonstration of Efficacy?
Stopped Accruing Patients at a Pre-Set Sample Size and Without Making a Decision Based on Peeking at Data.
Specified a Single Endpoint Ahead of Time that Would Determine Outcome of Trial.
Analyses Based on All Patients Who Were Randomized.
Obtained a Treatment x Time Interaction Effect.
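The first criterion above, a sample size fixed in advance, is normally set by a power analysis. This is a minimal sketch. It assumes a continuous outcome, two equal-sized arms, a two-sided test, and the normal approximation. Exact t-based calculations give slightly larger numbers, for example 64 rather than 63 per arm for d = 0.5. The function name `n_per_group` and the default alpha and power are illustrative choices.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2,
    where d is the standardized mean difference expected under the alternative.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)

print(n_per_group(0.5))  # 63 per arm for a "medium" effect
print(n_per_group(0.2))  # 393 per arm for a "small" effect
```

The target number is fixed before accrual begins. Stopping anywhere else, especially after looking at the data, undermines the nominal error rates.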

The Literature was Worse Than It First Looked
• Cannot accept positive appraisals of a particular study or the literature at face value.
• Endemic confirmatory bias.
• Myth that combinations of similarly flawed studies can yield an informative contribution to the literature: blend them together, you get tainted scrapple, not pâté.

Vickers, A.J. Analysis of variance is easily misapplied in the analysis of randomized trials: a critique and discussion of alternative statistical approaches. Psychosom Med, 67(4): p. 652-5, 2005.
We are not concerned in whether scores will change from baseline (it seems likely that they would) or whether overall anxiety scores, including pretreatment score, differ between groups (at baseline, they should be similar because of randomization). What we are interested in, and why we conducted the randomized trial, is whether the change over time is different between groups. This is technically known as the “group by treatment interaction.”
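With only baseline and post-treatment measurements, the interaction described above comes down to one question: does mean change differ between arms? The sketch below computes the pooled-variance t statistic on change scores. This is equivalent to the group-by-time interaction test in a 2 x 2 mixed ANOVA, where F equals t squared. It does not implement the alternative approaches the cited paper discusses. The function name and the example scores are hypothetical.

```python
import math
from statistics import mean, variance

def change_score_t(pre_tx, post_tx, pre_ctl, post_ctl):
    """Pooled-variance t statistic (and df) comparing mean change between arms.

    Tests whether change over time differs by group, rather than testing
    change within each group separately.
    """
    d_tx = [post - pre for pre, post in zip(pre_tx, post_tx)]
    d_ctl = [post - pre for pre, post in zip(pre_ctl, post_ctl)]
    n1, n2 = len(d_tx), len(d_ctl)
    pooled = ((n1 - 1) * variance(d_tx) + (n2 - 1) * variance(d_ctl)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(d_tx) - mean(d_ctl)) / se, n1 + n2 - 2

# Hypothetical anxiety scores (lower is better) for four patients per arm
t, df = change_score_t([20, 22, 24, 26], [14, 17, 18, 21],
                       [21, 23, 25, 27], [19, 22, 23, 26])
print(round(t, 2), df)  # -9.8 6
```

A significant pre/post drop within the intervention arm alone would not answer the trial's question. Both arms here improved, and only the between-arm difference in change is relevant.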

What was Going on?
• Most studies recruited samples of cancer patients without regard to level of distress.
• Low mean levels of distress resulted in an inability to demonstrate that interventions significantly reduced distress.
• Strategies for ending trials and organizing, analyzing, reporting, and interpreting data hid and perpetuated the myth that interventions were being shown to be effective.

Rescuing a Null Trial With a Newly Invented Outcome: Benefit Finding*
• A priori primary endpoint was distress.
• Primary analysis yielded null results, but subsequent reports have presented a secondary analysis in which there was an effect for a subgroup of patients.
• Subsequent reports give main emphasis to benefit finding as an endpoint.
• Intervention not designed to affect benefit finding; no theoretical reason for assuming an effect.
• Benefit finding has unknown clinical significance.
*Antoni et al, Health Psychology 2001

What are the Endemic Problems in the Design, Conduct, and Reporting of Trials?

Are We Done Yet? Check the Data Again and See if We Have a Finding to Report
• A priori power analysis is the occasional exception rather than the rule.
• Operative rule: peek and stop when results are looking good.
• Must beware of modest-sized trials claiming strong effects: likely to be false positives.
• Must beware of studies with odd numbers of patients accumulated without a power analysis.
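A small simulation shows why "peek and stop when results are looking good" produces false positives. It rests on illustrative assumptions:
• There is no true effect.
• Each patient contributes a standardized treatment-minus-control difference drawn from N(0, 1) with known variance.
• The accumulating data are tested after every 10 patients with an unadjusted two-sided z test at .05.

The function name and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist

def peeking_false_positive_rate(n_max=100, look_every=10, alpha=0.05,
                                n_sims=2000, seed=1):
    """Share of null trials declared 'significant' when data are checked repeatedly.

    Each simulated trial stops at the first look where |z| exceeds the
    nominal two-sided critical value; no adjustment is made for repeated looks.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        total = 0.0
        for i in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)  # no true effect: mean difference is 0
            if i % look_every == 0:
                z = total / i ** 0.5  # running mean divided by its SE (sd known = 1)
                if abs(z) > crit:
                    hits += 1
                    break
    return hits / n_sims

print(peeking_false_positive_rate())                # ten unadjusted looks
print(peeking_false_positive_rate(look_every=100))  # one look at the pre-set n
```

With ten unadjusted looks, the false-positive rate comes out well above the nominal 5%; classical results put it near 19%. A single look at the pre-set sample size stays close to 5%.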

Should We Get Excited About Unexpected Strong Findings With a Small Sample?

Perils of Unexpected Results in
Small Trials
• Threat of spurious findings not a matter
of low power.
• Vulnerability to uncontrolled group
differences, even when there has been no
obvious breakdown in randomization
procedures.
• Finding with a low prior probability
likely to represent a false positive.
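The last point follows from Bayes' rule: a finding with a low prior probability is likely to be a false positive. The sketch computes the share of significant results that reflect a real effect. Its inputs are an assumed prior probability, the trial's power, and alpha. The example values are assumptions chosen for illustration, not estimates for any literature, and the function name is hypothetical.

```python
def prob_true_given_significant(prior, power, alpha=0.05):
    """Probability that a significant result reflects a real effect.

    prior: assumed probability the hypothesis is true before the trial.
    power: probability the trial detects the effect if it is real.
    alpha: probability of a significant result when there is no effect.
    """
    true_pos = prior * power
    false_pos = (1 - prior) * alpha
    return true_pos / (true_pos + false_pos)

print(prob_true_given_significant(prior=0.1, power=0.2))  # small trial, unexpected effect
print(prob_true_given_significant(prior=0.5, power=0.8))  # adequately powered, plausible effect
```

Consider a small trial testing an unexpected effect, with a prior of 0.1 and power of 0.2. Only about 31% of its significant results would be true positives. With a prior of 0.5 and power of 0.8, the figure is about 94%.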

Perils of Unexpected Results in
Small Trials
• “…in a RCT, the balance of pretreatment
characteristics is merely one test of the
adequacy of randomization and not proof that
influential imbalances do not exist. Also,
because such tabulations are invariably
marginal summaries only (ie, the totals for each
factor are considered separately), they provide
essentially no insight into the joint distribution
of prognostic factors in the two treatment
groups. It is simple to envision situations in
which the marginal imbalances of prognostic
factors are minimal, but the joint distributions
are different and influential” (Piantadosi, 1990).
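Piantadosi's point about marginal versus joint distributions can be made concrete with a hypothetical two-arm trial. Each patient has two binary prognostic factors, A and B. Both arms show exactly 50% prevalence of each factor, which is all a typical baseline table reports. Yet one arm contains every patient who has both factors. The data and helper names are invented for illustration.

```python
from collections import Counter

# Hypothetical trial: 50 patients per arm; each tuple is (A, B), 1 = factor present.
arm_1 = [(1, 1)] * 25 + [(0, 0)] * 25
arm_2 = [(1, 0)] * 25 + [(0, 1)] * 25

def marginal(arm, index):
    """Proportion of patients with the factor at position `index` (0 = A, 1 = B)."""
    return sum(patient[index] for patient in arm) / len(arm)

def joint(arm):
    """Counts of each (A, B) combination."""
    return Counter(arm)

for name, arm in [("arm 1", arm_1), ("arm 2", arm_2)]:
    print(name, "A:", marginal(arm, 0), "B:", marginal(arm, 1), "joint:", dict(joint(arm)))
# arm 1 A: 0.5 B: 0.5 joint: {(1, 1): 25, (0, 0): 25}
# arm 2 A: 0.5 B: 0.5 joint: {(1, 0): 25, (0, 1): 25}
```

A "Table 1" listing A and B separately would look perfectly balanced. But if having both factors carries extra risk, arm 1 is at a real disadvantage.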

“When moderate benefits or negligibly small benefits are both more plausible than extreme benefits, then a p = .001 effect in a large trial or overview would provide much stronger evidence than the same significance level in a small trial, a small overview, or a small subgroup analysis.”
Collins et al., Lancet (1995)

What is wrong with exploring multiple outcomes?

Austin PC, Mamdani MM, et al. Testing multiple statistical hypotheses resulted in spurious associations: A study of astrological signs and health. J Clin Epidem 59(9): 964-969, 2006.
We sought statistically significant associations between astrological signs and health that would be neither reproducible nor biologically plausible.
We searched 223 of the most common diagnoses for hospitalization until we identified two for which subjects born under one astrological sign had a significantly higher probability of hospitalization.
Residents born under Leo had a higher probability of gastrointestinal hemorrhage (P = 0.0447), while Sagittarians had a higher probability of humerus fracture (P = 0.0123) compared to all other signs combined.
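The astrology example is an instance of multiplicity. Suppose k independent tests of true null hypotheses are each run at alpha = .05. The chance that at least one comes out significant is 1 - (1 - alpha)^k. The sketch computes that probability and the corresponding Bonferroni per-test threshold. Treating the 223 diagnoses as 223 independent tests is a simplifying assumption. Each diagnosis was examined across the astrological signs, so the actual number of comparisons was larger. The function names are hypothetical.

```python
def chance_of_any_false_positive(k, alpha=0.05):
    """Probability that at least one of k independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** k

def bonferroni_threshold(k, alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / k

for k in (1, 20, 223):
    print(k, round(chance_of_any_false_positive(k), 4), bonferroni_threshold(k))
```

Under these assumptions, 20 tests already give about a 64% chance of at least one spurious finding, and 223 tests make one essentially certain. Neither reported P value (0.0447, 0.0123) survives a Bonferroni threshold of 0.05/223.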

What is wrong with unplanned subgroup analyses?

Schulz KF, Grimes DA. Epidemiology 4 - Multiplicity in randomised trials I: endpoints and treatments. Lancet 365(9470): 1591-1595, 2005.
Thousands of potential comparisons can emanate from one trial. Investigators might only report the significant comparisons, an unscientific practice if unwitting, and fraudulent if intentional. Researchers must report all the endpoints analysed and treatments compared.
Some researchers torture their data until they speak. They examine additional endpoints, manipulate group comparisons, do many subgroup analyses, and undertake repeated interim analyses. Difficulties usually manifest at the analysis phase because investigators add unplanned analyses.
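Sometimes several endpoints have legitimately been analyzed, and all of them are reported. One widely used way to control the family-wise error rate in that case is Holm's step-down procedure. This is a sketch of that standard procedure, not a method attributed to Schulz and Grimes, and it is no substitute for prespecifying a primary endpoint. The function name and the p values in the example are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns booleans (reject H0?) in the original order, controlling the
    family-wise error rate at alpha across all m tests.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p values are retained
    return reject

# Four reported endpoints with hypothetical unadjusted p values
print(holm_reject([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```

Two of the four endpoints that look "significant" at an unadjusted .05 do not survive once every endpoint that was examined is taken into account.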

Just What Is Wrong With Post
Hoc Subgroup Analyses?
• High profile papers in the behavioral medicine
literature routinely emphasize positive subgroup
analyses in the face of negative primary analyses
(Classen et al, 2001; Schneiderman et al., 2004).
• In the broader clinical trials literature, this
practice is uniformly seen as inappropriate (Yusuf
et al., 1991).
• Unplanned subgroup analyses frequently yield
spurious results (Assmann et al., 2000; Senn &
Harrell, 1979), and “only in exceptional circumstances
should they affect the conclusions drawn from the
trial” (Brooks et al., 2004, p. 229).
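How often do post hoc subgroups produce a "significant" effect when the treatment does nothing? The sketch simulates null trials under illustrative assumptions:
• Outcomes are N(0, 1) in both arms, with known variance.
• Patients fall at random into one of 10 subgroups.
• Each subgroup gets an unadjusted two-sided z test at .05.

The function name and settings are hypothetical.

```python
import random
from statistics import NormalDist

def any_subgroup_significant(n_per_arm=200, n_subgroups=10, alpha=0.05,
                             n_sims=1000, seed=2):
    """Fraction of null trials in which at least one subgroup looks 'significant'.

    There is no treatment effect overall or in any subgroup, so every
    significant subgroup result is a false positive.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        # Running [sum of outcomes, count] for each (arm, subgroup) cell
        sums = {(arm, g): [0.0, 0] for arm in (0, 1) for g in range(n_subgroups)}
        for arm in (0, 1):
            for _ in range(n_per_arm):
                cell = sums[(arm, rng.randrange(n_subgroups))]
                cell[0] += rng.gauss(0.0, 1.0)
                cell[1] += 1
        for g in range(n_subgroups):
            (s0, n0), (s1, n1) = sums[(0, g)], sums[(1, g)]
            if n0 == 0 or n1 == 0:
                continue
            z = (s1 / n1 - s0 / n0) / (1 / n0 + 1 / n1) ** 0.5  # known sd = 1
            if abs(z) > crit:
                hits += 1
                break
    return hits / n_sims

print(any_subgroup_significant())               # ten post hoc subgroups
print(any_subgroup_significant(n_subgroups=1))  # overall comparison only
```

Because the subgroups are disjoint, the expected rate is about 1 - 0.95^10, roughly 40%. Making only the overall comparison gives about 5%.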


Telling It Like It Ain’t: All the Results That Fit
• Primary endpoint typically needs to be inferred, not stated.
• Ignore negative results for presumed endpoints: emphasize any positive effect, ignore the larger number of null findings.
• Favor secondary and subgroup analyses and endpoints developed post hoc over negative findings for presumed primary analyses.
• Discuss negative findings as if positive in subsequent publications.
• Accommodate existing literature “as is” rather than qualifying interpretation with reference to methodological shortcomings.

The Norm: Lack of Intent to Treat Analyses
• Data from patients who do not complete the trial or all measurements are discarded.
• “As treated” analyses ignore informative missing data.
• Intervention and control patients have different reasons for not providing data, and this introduces bias in the available data.
• “As treated” data do not generalize back to patients entering a trial.

Intent to Treat Analysis
• Highly appropriate: one of the basic criteria by which the adequacy of
the reporting of randomized clinical trials is evaluated, including
with CONSORT.
• Intent to treat analyses most accurately address the question of
how effective the intervention would be if it were offered outside
the clinical trial.
• Intent to treat analyses preserve the baseline equivalence of
groups that was presumably achieved by randomization; and
these analyses help to ensure that bias is not introduced by
selective retention of patients.
• Particularly important when patients are lost for reasons related
to the outcome under study.
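A simulated null trial illustrates the bias from analyzing only completers when dropout is related to outcome. The dropout rule is a hypothetical assumption: intervention patients who are not improving leave half the time, while control dropout is unrelated to outcome. The "all randomized" comparison uses outcomes that the simulation knows for everyone. In a real trial, an intent to treat analysis needs continued follow-up of dropouts or principled handling of missing data. The function name and parameters are hypothetical.

```python
import random
from statistics import mean

def completer_vs_all(n_per_arm=5000, seed=3):
    """Null trial with outcome-related dropout that differs by arm.

    Improvement scores are N(0, 1) in both arms, so the true effect is zero.
    Returns (difference among completers, difference among all randomized).
    """
    rng = random.Random(seed)
    tx = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    ctl = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    # Intervention: patients with improvement < 0 drop out with probability 0.5
    tx_completers = [y for y in tx if y >= 0 or rng.random() >= 0.5]
    # Control: 25% drop out at random, regardless of outcome
    ctl_completers = [y for y in ctl if rng.random() >= 0.25]
    return (mean(tx_completers) - mean(ctl_completers),
            mean(tx) - mean(ctl))

completers_diff, all_diff = completer_vs_all()
print(round(completers_diff, 2), round(all_diff, 2))
```

The completer comparison shows a spurious benefit of roughly a quarter of a standard deviation. The comparison of everyone randomized stays near zero.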

Cook, J. M., Palmer, S., Hoffman, K., & Coyne, J. C. Evaluation of clinical trials appearing in Journal of Consulting and Clinical Psychology: CONSORT and beyond. The Scientific Review of Mental Health Practice (in press).

Reporting of RCTs in JCCP 1992 and 2002: Before CONSORT and Beyond
Deficiencies were noted in features empirically related to confirmatory bias: randomization, blinding, and reporting of intent to treat analyses, with most articles meeting none of these requirements.
No articles specified primary and secondary endpoints.

Reporting of RCTs in JCCP 1992 and 2002: Before CONSORT and Beyond
Significant improvement in reporting from 1992 to 2002, but a substantial gap remained between RCTs published in 2002 and full compliance with CONSORT.
Compliance with CONSORT will require education and enforcement of standards and will yield a literature that is discontinuous with the existing literature in terms of quality of reporting.

Can We Bury the Idea that Psychotherapy Prolongs the Survival of Cancer Patients?

Positive Appraisals of
Literature
• Spiegel and Giese-Davis (2004): “5 of 10
randomized trials demonstrate an effect
of psychosocial intervention on survival
time”
• Sephton and Spiegel (2003): “If nothing
else, these studies challenge us to
systematically examine the interaction of
mind and body, to determine the aspects
of therapeutic intervention that are most
effective and the populations that are
most likely to benefit.”

Three of the “positive trials” can be eliminated because in each case, patients in the intervention got substantially better medical surveillance and care.
Two of the investigator groups for these trials deny that they were even studying psychotherapy!

No Clinical Trial that was Explicitly Designed to Test Whether Psychotherapy Improves Survival of Cancer Patients (Three at the Time of Spiegel’s Claims, Now Five) Has Shown a Positive Effect.

No study that was designed to test whether psychotherapy improved survival, and in which the intervention group did not get better medical care, has demonstrated an effect.
Claims that psychotherapy promotes survival depend on the Spiegel and Fawzy studies, which have serious limitations.

Spiegel D, Bloom JR, Kraemer HC, Gottheil E (1989): Effect of psychosocial treatment on survival of patients with metastatic breast cancer. Lancet 2:888-891.
Cited Over 900 Times

Fawzy, F.I., Canada, A.L., & Fawzy, N.W.
(2003). Malignant melanoma: effects of a brief,
structured psychiatric intervention on survival
and recurrence at 10-year follow-up. Arch Gen
Psychiat 60, 100-103.*
Fawzy FI, Fawzy NW, Hyun CS, Elashoff R,
Guthrie, D, Fahey JL, Morton DL (1993):
Malignant melanoma. Effects of an early
structured psychiatric intervention, coping, and
affective state on recurrence and survival 6
years later. Arch Gen Psychiat, 50: 681-689.
*Cited 448 Times

Taking on the Cochrane Collaboration

Coyne, JC. Cochrane reviews v industry supported meta-analyses: We should read all reviews with caution. BMJ, 333: 916, 2006.
A Cochrane meta-analysis concluded that couples therapy was not better than individual therapy for depression, and that offering couples therapy should be a matter of “patient preference and availability of specific resources.” Yet the studies reviewed were all seriously flawed. None came close to the minimal cell size necessary for inclusion in a meta-analysis, much less for a nonequivalence trial. This premature conclusion serves to discourage the commitment of scarce resources to having marital therapists available or to research providing an adequate comparison between the two forms of therapy.

Jorgensen, A. W., Gotzsche, P. C., Hilden, J. Authors' reply on Cochrane reviews v industry supported meta-analyses. BMJ 333: 1072-1073, 2006.
We agree with Tostad and Coyne that some Cochrane reviews are not of good quality...We urge readers who find problems with Cochrane reviews to submit a comment to be published as part of the review. This is very easy to do. Use "Add/View Feedback" in the index to the left of each review.

How to Ensure a Publishable Positive
Clinical Trial
Have lots of outcome variables, and particularly alternative measures of the same outcomes.
Make sure patients and RAs rating outcomes know the treatment in which you are invested.
Adjust randomization as needed and let RAs know what the next treatment assignment will be.

How to Ensure a Publishable Positive Clinical Trial
If you do not have significant effects, keep accruing patients.
Examine personal characteristics of patients and throw away results for patients who were unlikely to benefit from treatment.
Examine all outcomes and report those that are significant.
Don’t report treatment x time interactions if they are not significant.

How to Ensure a Publishable Positive Clinical Trial
Examine results for all possible subgroups and report only subgroup analyses for which there are significant effects.
Do not report that there were outcome measures or subgroups for which the results were examined but not found to be significant.
In discussion and abstract, emphasize the outcomes and subgroup analyses that were most positive.

Rumors of the Efficacy of Psychological Interventions are Premature and Greatly Exaggerated.

We must not allow a shared commitment to improving the wellbeing of patients to be exploited with exaggerated claims and poorly conceived, poorly conducted, and poorly reported clinical trials.

Gunpowder forever sealed the knight’s fate. No knight was a match for a fired bullet. Soon knights would be useless because of the projectile that could easily knock a knight off his horse, rendering him helpless.

Critical Questions to Ask
• Was the Sample Size Set by Power Analysis?
• Is a Primary Outcome Identified?
• Are Analyses Intent to Treat?
• Are There Subgroup Analyses or Cherrypicking of Multiple Outcomes?
• Is There a Treatment x Time Interaction?
• Do the Abstract and Discussion Section Fairly Reflect the Results that were Obtained?
