2. Introduction
▪ Types of review
▪ Definition
▪ Function of meta-analysis
Conducting Meta-analysis
▪ Writing the research question and protocol
▪ Comprehensive search
▪ Selection of studies
▪ Appraisal (quality assessment) of studies
▪ Data abstraction
▪ Data analysis
3. Effect size
Presenting the findings – Forest plot
Heterogeneity
Dealing with heterogeneity
▪ Fixed and random effects model
▪ Meta-regression
▪ IPD analysis
Strengths and Weaknesses of meta-analysis
Software for meta-analysis
5. Information explosion
More than 100,000 articles are published each year in more than 20,000 journals.
Humanly impossible to read through all the articles published in any field.
Publication bias
Concise summaries of the literature (reviews) are required, after separating the insignificant and unsound from the salient and crucial.
7. “ review articles written by one or more experts based on a convenience
sample of studies with no description of the underlying methodology”
Confuse ‘absence of proof’ of benefit as ‘proof of absence’ of benefit
Do not statistically combine results from multiple studies
Vote-counting
8. “ a review addressing a specific research question using explicit
methodology of collecting, selecting and appraising studies and,
whenever appropriate, synthesizing their results quantitatively”
Has only qualitative or both qualitative and quantitative components
Quantitative component is meta-analysis
9. “Quantitative approach for systematically combining
results of previous research to arrive at conclusions
about the body of research.”
10. 1952: Hans J. Eysenck concluded that there were no favorable effects
of psychotherapy, starting a raging debate which 25 years of evaluation research
and hundreds of studies failed to resolve
1978: To prove Eysenck wrong, Gene V. Glass statistically
aggregated the findings of 375 psychotherapy outcome studies
Glass (and colleague Smith) concluded that psychotherapy did indeed
work.
Glass called the method “meta-analysis”
11. Underpinning ideas can be identified earlier:
K. Pearson (1904)
Averaged correlations for typhoid mortality after inoculation across 5 samples
R. A. Fisher (1944)
Source of the idea of cumulating probability values
W. G. Cochran (1953)
Discussed a method of averaging means across independent studies
Set out much of the statistical foundation for meta-analysis (e.g., Inverse variance
weighting and homogeneity testing)
12. Identify heterogeneity in effects among multiple studies and, where
appropriate, provide summary measure
Increase statistical power and precision to detect an effect
Develop, refine, and test hypotheses
Reduce the subjectivity of study comparisons by using systematic and
explicit comparison procedures
Identify data gaps in the knowledge base and suggest directions for future
research
Calculate sample size for future studies
Analyse whether and how previous studies have modified knowledge on a
certain topic
14. Writing the research question and a protocol
Comprehensive search
Selection of studies
Appraisal (quality assessment) of studies
Data abstraction
Data analysis
15. Research question:
▪ P: the population of interest
▪ I: the intervention or exposure
▪ C: the comparison (in certain situations)
▪ O: the outcome of interest
Protocol: specifying the –
▪ Research question
▪ Search methods
▪ Inclusion and exclusion criteria for studies
▪ Criteria for quality assessment (appraisal) of the studies
▪ Methods of data abstraction and synthesis
16. Cochrane Review of magnesium sulphate and other anticonvulsants for women with pre-eclampsia
17. Hand searching – ‘gold-standard’ for published studies
The percentage (or proportion) of relevant studies found in electronic databases,
compared with hand searching, is termed ‘sensitivity’; the percentage (or proportion) of
the yield that is relevant is called ‘specificity’ (more often termed ‘precision’ in literature searching).
Computerized databases: PubMed/MEDLINE, EMBASE, Cochrane
Review/Trials Register
Personal references, and emails
Web, e.g. Google Scholar (http://scholar.google.com)
Conference programs
Dissertations
Review articles
Government reports, bibliographies
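The two search measures defined on this slide reduce to simple proportions. A minimal sketch, assuming made-up counts; the function name `search_performance` and all numbers are hypothetical and do not come from any real search:

```python
def search_performance(relevant_found, relevant_total, records_retrieved):
    """Return (sensitivity, relevant_yield) for an electronic search.

    relevant_total: relevant studies identified by hand searching (gold standard).
    relevant_found: how many of those the electronic search also retrieved.
    records_retrieved: total records returned by the electronic search.
    """
    sensitivity = relevant_found / relevant_total
    relevant_yield = relevant_found / records_retrieved
    return sensitivity, relevant_yield


# Hypothetical counts, for illustration only.
sensitivity, relevant_yield = search_performance(
    relevant_found=40, relevant_total=50, records_retrieved=400
)
print(f"sensitivity = {sensitivity:.0%}, relevant share of yield = {relevant_yield:.0%}")
```

A highly sensitive search usually returns a large yield of which only a small share is relevant, which is why two reviewers are needed to sift the results.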
18. Explicit Inclusion and exclusion criteria
Study designs: RCTs or CTs with a non-exercise control group
Subjects: Females > 18 years of age
Publication types: Journal articles, dissertations, & masters theses
Languages: English
Outcomes: Bone mineral density assessed at femur, spine, and/or
radius
Time Frame: Studies published & indexed between January 1966 and
December 1998
20. Non-randomized trials:
Treatment allocation related to prognosis or pre-judgment of
appropriateness of treatment
Randomized trials:
Inadequate randomization (e.g., alternating assignment)
Lack of stratification on important factors
Lack of or ineffective blinding
All trials:
Patient drop-outs, patient switching arms
Missing data
Improper statistical analysis
21. Quality scores developed by -
▪ Chalmers et al
▪ Jadad et al
None is the absolute best.
Little is known about their relative merits and their association with
study outcomes.
22. Reporting Bias
is a group of related biases potentially leading to over-representation of
significant or positive studies in systematic reviews
Studies with significant positive findings are -
More likely to be published - Publication bias (overestimation of
treatment effects)
More likely to be published rapidly - Time lag bias
More likely to be published in English - Language bias
More likely to be published more than once - Multiple publication bias
More likely to be cited by others - Citation bias
23. Funnel Plot:
Displays the studies included in a meta-analysis in a plot of effect size
against sample size (or some other measure of the extent to which the
findings could be affected by the play of chance).
Egger’s Regression Test:
Tests whether small studies tend to have larger effect sizes than would
be expected (implying that small studies with small effect sizes have
not been published).
Begg’s rank correlation test
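Egger's test can be sketched as an ordinary linear regression of each study's standardized effect (effect / SE) on its precision (1 / SE); an intercept far from zero suggests funnel-plot asymmetry. The following is a rough sketch under that assumption. The function name and the log odds ratios are hypothetical illustration data, and the resulting t statistic would still have to be compared against a t distribution with k - 2 degrees of freedom.

```python
import math


def egger_intercept(effects, std_errors):
    """Regress effect/SE on 1/SE by least squares.

    Returns (intercept, standard error of the intercept, t statistic).
    """
    y = [e / s for e, s in zip(effects, std_errors)]  # standardized effects
    x = [1.0 / s for s in std_errors]                 # precisions
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = residual_ss / (n - 2)  # residual variance
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, se_intercept, intercept / se_intercept


# Hypothetical log odds ratios: the smaller studies (larger SE) show larger effects.
log_or = [-0.9, -0.7, -0.5, -0.3, -0.2, -0.15]
se = [0.50, 0.40, 0.30, 0.20, 0.12, 0.08]
intercept, se_intercept, t_stat = egger_intercept(log_or, se)
print(f"intercept = {intercept:.3f}, SE = {se_intercept:.3f}, t = {t_stat:.2f}")
```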
25. An Asymmetric Funnel Plot (indicative of publication bias)
[Figure: funnel plot with a marked region of missing studies; horizontal axis: Log Odds Ratio, from -2 to 2]
Possible causes of an asymmetric plot:
• Publication bias
• Clinical heterogeneity
• Methodological heterogeneity
26. Combine the results of larger studies only, which are less likely to be subject to
publication bias.
File-drawer Method / Fail-safe N: How many unpublished studies showing
a null result are required to change a ‘significant’ meta-analysis result to a
‘non-significant’ one?
‘Trim and Fill’ method
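The fail-safe N question can be made concrete with one common formulation, Rosenthal's, which combines per-study Z scores; the slide does not say which variant is intended, so this choice is an assumption. With k studies, adding N null studies (Z = 0) gives a combined Z of sum(Z) / sqrt(k + N), and the sketch below finds the smallest N that brings it down to the one-tailed 5% critical value. The Z scores are hypothetical.

```python
import math


def fail_safe_n(z_scores, z_alpha=1.645):
    """Smallest number of unpublished null studies (Z = 0) needed so that
    the combined Z, sum(Z) / sqrt(k + N), no longer exceeds z_alpha."""
    k = len(z_scores)
    n = (sum(z_scores) ** 2) / (z_alpha ** 2) - k
    return max(0, math.ceil(n))


# Hypothetical Z scores from five published studies.
z = [2.1, 1.8, 2.5, 1.2, 2.9]
print(fail_safe_n(z))
```

A small fail-safe N relative to the number of included studies suggests that the pooled result could easily be overturned by unpublished null findings.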
28. At least two reviewers
Sift and sift again
▪ The first sift – pre-screening - is to decide which studies to retrieve in full.
▪ The second sift – selection - is to look again at these studies and decide which
are to be included in your review
Do not collect outcome data at the same time as eligibility information
▪ wasted time and effort - if study is excluded later on
▪ Results can sway decision
Look out for duplicate publications
30. Create a spreadsheet (Excel, or Open Office Calc)
For each study, create the following columns:
name of the study
name of the author, year published
number of participants who received intervention
number of participants who were in control arm
number who developed outcomes in intervention
number who developed outcomes in control arm
31. 22 studies included in the meta-analysis
Seven columns created
trial: trial identity code
trialnam: name of trial
year: year of the study
pop1: population of the intervention group
deaths1: deaths in the intervention group
pop0: population of the control group
deaths0: deaths in the control group
33. Choice of metric :
▪ Original
▪ Standardized mean difference (difference in means / standard deviation)
Publication bias:
▪ Graphical methods
▪ Quantitative methods
Choice of model/ heterogeneity:
▪ Fixed Effects
▪ Random Effects
34. Data Type                                      Outcome Measures
Continuous                                         Mean
Dichotomous (binary), displayed in 2x2 table       Odds ratio (OR), Risk ratio (RR), Risk difference (RD)
35. For continuous outcomes, the mean difference (effect size) is usually used
to compare treatment and control groups
Effect sizes are standardized by dividing by the pooled estimate of the (common)
within-group standard deviation
For skewed continuous outcomes,
values may be transformed (e.g. logarithmic), or
the median may be used
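The standardized mean difference described on this slide can be sketched as follows: the difference in group means divided by the pooled within-group standard deviation (often called Cohen's d). The function name and the summary statistics are hypothetical and purely illustrative.

```python
import math


def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Difference in means divided by the pooled within-group SD."""
    pooled_variance = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_variance)


# Hypothetical trial: treatment mean 10 (SD 2, n 20) vs control mean 12 (SD 2, n 20).
smd = standardized_mean_difference(10.0, 2.0, 20, 12.0, 2.0, 20)
print(f"SMD = {smd:.2f}")
```

Because the result is expressed in standard deviation units, studies that measured the same outcome on different scales can be combined.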
36.             Failure   Success
Treatment       a         b
Control         c         d
Odds: Treatment: a/b, Control: c/d
$$\text{Odds Ratio} = \frac{a/b}{c/d} = \frac{ad}{bc}$$
OR < 1 implies treatment effectiveness (protective)
OR > 1 indicative of treatment inferiority (risk)
37. For the purposes of combining, analysis may be presented in
terms of log (OR), i.e. as a difference of log (Odds) of treatment
and control.
$$\mathrm{Var}[\log(\mathrm{OR})] = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}$$
If any of the cell-counts is less than 5, use continuity correction
(add 0.5) before calculating OR.
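As a rough sketch of the calculation on this slide, the function below returns log(OR) and its variance, 1/a + 1/b + 1/c + 1/d, for a 2x2 table laid out as a, b (treatment failure, success) and c, d (control failure, success). The "less than 5" trigger follows the slide's rule (many sources apply the correction only to zero cells), adding 0.5 to every cell is an assumption about how the correction is applied, and the counts are hypothetical.

```python
import math


def log_odds_ratio(a, b, c, d, correction=0.5, threshold=5):
    """Return (log OR, variance of log OR) for a 2x2 table.

    Adds `correction` to every cell when any count is below `threshold`
    (the rule stated on the slide).
    """
    if min(a, b, c, d) < threshold:
        a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


# Hypothetical counts; no cell is below 5, so no correction is applied.
log_or, var_log_or = log_odds_ratio(5, 95, 10, 90)
half_width = 1.96 * math.sqrt(var_log_or)
ci_low, ci_high = math.exp(log_or - half_width), math.exp(log_or + half_width)
print(f"OR = {math.exp(log_or):.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

The confidence interval is built on the log scale and then back-transformed, which is why it is not symmetric around the OR.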
38. $$\mathrm{RR} = \frac{a/(a+b)}{c/(c+d)}$$
RR (Relative Risk) is also called the Risk Ratio
It represents the probability of an event (failure) in the treatment
group relative to the probability of the same event in the control
group.
RR is analyzed on the log scale.
$$\mathrm{Var}[\log(\mathrm{RR})] = \frac{1}{a} - \frac{1}{a+b} + \frac{1}{c} - \frac{1}{c+d}$$
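A small sketch of the risk ratio on the log scale, using RR = [a/(a+b)] / [c/(c+d)] and the usual large-sample variance Var[log(RR)] = 1/a - 1/(a+b) + 1/c - 1/(c+d). The function name and the counts are hypothetical.

```python
import math


def log_risk_ratio(a, b, c, d):
    """Return (log RR, variance of log RR) for a 2x2 table
    with a, b = treatment failure, success and c, d = control failure, success."""
    risk_treatment = a / (a + b)
    risk_control = c / (c + d)
    log_rr = math.log(risk_treatment / risk_control)
    variance = 1 / a - 1 / (a + b) + 1 / c - 1 / (c + d)
    return log_rr, variance


# Hypothetical counts: 12/100 failures on treatment, 20/100 on control.
log_rr, var_log_rr = log_risk_ratio(12, 88, 20, 80)
print(f"RR = {math.exp(log_rr):.2f}, SE of log RR = {math.sqrt(var_log_rr):.3f}")
```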
39. $$\mathrm{RD} = \frac{a}{a+b} - \frac{c}{c+d}$$
RD is the difference of two binomial probabilities, while
RR is the ratio.
$$\mathrm{Var}(\mathrm{RD}) = \frac{p_1(1-p_1)}{n} + \frac{p_2(1-p_2)}{m},$$
where n=a+b, m=c+d, p1= a/n, p2=c/m
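The risk difference and its variance, Var(RD) = p1(1-p1)/n + p2(1-p2)/m with n = a+b, m = c+d, p1 = a/n and p2 = c/m, can be sketched in the same way. The function name and the counts are hypothetical; the 1.96 multiplier assumes a normal approximation for the 95% interval.

```python
import math


def risk_difference(a, b, c, d):
    """Return (RD, variance of RD): treatment risk minus control risk."""
    n, m = a + b, c + d
    p1, p2 = a / n, c / m
    rd = p1 - p2
    variance = p1 * (1 - p1) / n + p2 * (1 - p2) / m
    return rd, variance


# Hypothetical counts: 12/100 failures on treatment, 20/100 on control.
rd, var_rd = risk_difference(12, 88, 20, 80)
half_width = 1.96 * math.sqrt(var_rd)
print(f"RD = {rd:.3f}, 95% CI {rd - half_width:.3f} to {rd + half_width:.3f}")
```

Unlike OR and RR, the risk difference is analyzed on its natural scale, so no log transformation is involved.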
40.               Failure   Success   Total
New Treatment     5         95        100
Control           10        90        100
Odds Ratio = (5/95) / (10/90) = 0.47
Risk Ratio = (5/100) / (10/100) = 0.50
(Recall OR ≈ RR when the probability is small. OR is generally more extreme (further from 1) than RR.)
Risk Difference = (5/100) - (10/100) = -0.05
42. The effect size makes meta-analysis possible
“ratio of the frequency of the events in the intervention to that in the
control group.”
Any standardized index can be an “effect size” (e.g., standardized mean
difference, correlation coefficient, odds-ratio) as long as it –
Is comparable across studies (generally requires standardization)
Represents the magnitude and direction of the relationship of interest
Is independent of sample size
Different meta-analyses may use different effect size indices
47. The graphical display of results from individual studies on a common
scale is a “Forest plot”.
Each study is represented by a black square and a horizontal line
(95% CI).
The area of the black square reflects the weight of the study / precision
of the study (roughly the sample size).
A logarithmic scale should be used for plotting the Relative Risk /
Odds Ratio.
Aggregate Effect size – displayed as a ‘diamond’.
49. The impact of fish oil consumption on Cardio-vascular diseases
50. Look at the title of the forest plot, the intervention, the outcome, the effect measure
of the investigation and the scale
The names on the left are the authors of the primary studies included in
the MA
The small squares represent the results of the individual trials
The size of each square represents the weight given to each study in the
meta-analysis
The horizontal lines associated with each square represent the confidence
interval associated with each result
The vertical line represents the line of no effect, i.e. where there is no
difference between the treatment/intervention group and the control group
(e.g. OR or RR = 1)
The pooled analysis is given a diamond shape. The horizontal width of the
diamond is the confidence interval
51. Effect of probiotics on the risk of antibiotic associated diarrhoea
D'Souza, A. L et al. BMJ 2002;324:1361
57. Reviews usually bring together studies that were performed
By different people
In different settings
In different countries
On different people
In different ways
For different lengths of time
To look at different outcomes
........… and these aren’t the only differences.
Assessing combinability
Types of heterogeneity:
•Clinical heterogeneity
•Methodological
heterogeneity
•Statistical heterogeneity
58. Tests for the existence of heterogeneity (these have low power):
▪ Cochran’s Q – statistic based on the chi-square test
▪ I² statistic – scores heterogeneity between 0% and 100%
25% - low heterogeneity
50% - moderate
75% - high
Presence or absence of heterogeneity influences the subsequent method
of analysis:
▪ Fixed-effects model
▪ Random-effects model
Meta-regression: to explore and account for heterogeneity
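The two heterogeneity measures on this slide can be sketched from per-study effects and variances: Q is the weighted sum of squared deviations from the inverse-variance pooled estimate, and I² = max(0, (Q - df) / Q) x 100%. The function name and the three log odds ratios are hypothetical.

```python
def heterogeneity(effects, variances):
    """Return (Q, degrees of freedom, I-squared in percent)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i_squared


# Hypothetical log odds ratios from three studies with equal variances.
q, df, i_squared = heterogeneity([-0.8, -0.1, 0.4], [0.04, 0.04, 0.04])
print(f"Q = {q:.2f} on {df} df, I-squared = {i_squared:.0f}%")
```

Q would be compared against a chi-square distribution with df degrees of freedom; because that test has low power with few studies, I² is usually reported alongside it.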
59. FIXED EFFECTS MODEL
• Use if heterogeneity is absent
• Assumes the size of the treatment effect is the
same (fixed) across all studies and that
variation is due to chance
• Pooling: Mantel-Haenszel OR
• Weight = 1/variance
= 1/SE²
• When heterogeneity exists we get:
• a pooled estimate which may give too
much weight to large studies,
• a confidence interval which is too
narrow,
• a P-value which is too small.
RANDOM EFFECTS
MODEL
• Use if heterogeneity is present
• Assumes that the size of the treatment
effect does vary between studies
• DerSimonian-Laird method (DSL)
for the Odds Ratio
• Weight = 1/variance
= 1/(SE² + inter-trial variance)
• When heterogeneity exists we get:
• possibly a different pooled estimate
with a different interpretation,
• a wider confidence interval,
• a larger P-value
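The two weighting schemes on this slide can be compared in a short sketch. Note the swap: the fixed-effect part below uses generic inverse-variance pooling (weight = 1/SE²) rather than the Mantel-Haenszel method named on the slide, and the random-effects part uses the DerSimonian-Laird moment estimate of the inter-trial variance. Function names and data are hypothetical.

```python
import math


def pool_fixed(effects, variances):
    """Inverse-variance fixed-effect estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))


def dl_tau_squared(effects, variances):
    """DerSimonian-Laird estimate of the inter-trial (between-study) variance."""
    weights = [1.0 / v for v in variances]
    fixed, _ = pool_fixed(effects, variances)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    return max(0.0, (q - df) / c)


def pool_random(effects, variances):
    """Random-effects estimate, its standard error, and tau-squared."""
    tau2 = dl_tau_squared(effects, variances)
    estimate, se = pool_fixed(effects, [v + tau2 for v in variances])
    return estimate, se, tau2


# Hypothetical log odds ratios and their variances (SE squared).
log_ors = [-0.8, -0.1, 0.4]
variances = [0.04, 0.09, 0.02]
print("fixed :", pool_fixed(log_ors, variances))
print("random:", pool_random(log_ors, variances))
```

Adding the inter-trial variance to every study's variance makes the weights more equal, which is why the random-effects interval is wider and small studies gain relative influence.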
60. FIXED EFFECTS MODEL
• When heterogeneity does not exist:
• a pooled estimate which is correct,
• a confidence interval which is correct,
• a P-value which is correct.
RANDOM EFFECTS
MODEL
• When heterogeneity does not exist:
• a pooled estimate which is correct,
• a confidence interval which is too wide,
• a P-value which is too large
No universally accepted method for choosing.
A reasonable approach:
1. Decide whether the assumption of a fixed effects model is plausible. Could the
studies all be estimating the same effect? If not, consider a random effects model.
2. If fixed effects assumption is plausible, are the data compatible?
Graphical methods: forest plot, Galbraith plot.
Analytical methods: heterogeneity test, I² statistic.
If assumption looks compatible with the data, use fixed effects, otherwise consider
random effects.
62. The estimate of study results is the dependent variable and
one or more study-level variables are the independent
variables (predictors)
Allows researchers to explore which types of patient-specific
factors or study design factors contribute to heterogeneity.
Limited ability to identify important factors – struggles to
identify which patient features are related to the size of
treatment effect.
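A minimal sketch of the idea, assuming the simplest case: a fixed-effect meta-regression with a single study-level covariate, fitted by weighted least squares with weights 1/variance. Real analyses typically also allow for residual between-study variance. The function name, the covariate (mean age) and all numbers are hypothetical.

```python
def meta_regression(effects, variances, covariate):
    """Weighted least-squares fit of effect = intercept + slope * covariate,
    with each study weighted by the inverse of its variance."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, covariate)) / total_w
    mean_y = sum(w * y for w, y in zip(weights, effects)) / total_w
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, covariate))
    sxy = sum(
        w * (x - mean_x) * (y - mean_y)
        for w, x, y in zip(weights, covariate, effects)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: log odds ratios, their variances, and mean age per study.
log_ors = [-0.60, -0.45, -0.20, -0.05]
variances = [0.05, 0.03, 0.04, 0.02]
mean_age = [45.0, 52.0, 60.0, 68.0]
intercept, slope = meta_regression(log_ors, variances, mean_age)
print(f"intercept = {intercept:.3f}, slope per year of age = {slope:.4f}")
```

Because the covariate is a study-level average, a slope found this way describes differences between studies, not between individual patients.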
63. Involves the central collection, checking and analysis of updated
Individual Patient Data
Include all properly randomised trials, published and unpublished
Include all patients in an intention-to-treat analysis
Analysis stratified by trial
IPD does not mean that all patients are combined into a single mega
trial; meta-analysis looks at the results within each study, and then
calculates a weighted average.
Obtaining individual patient data from each of the trials is
challenging
64. Collect raw data from related studies, whether or not the
studies collaborated at the design stage, so that exposure measures
and other covariates can be applied uniformly across the
combined studies.
The major advantage of an IPD analysis over an MA is the use of
individual-based rather than group-based data.
66. Comprehensive search strategy: multiple sources of information
Explicit methodology: to ensure reproducibility and transparency
Emphasis on all clinically important outcomes: related to efficacy, safety,
and tolerability of the interventions under consideration
Limiting errors: two reviewers at all major steps; limits bias and improves
precision
67.
Good deal of effort
Qualitative distinctions between studies not captured
“Apples and oranges” criticism
A good meta-analysis of badly designed studies will still result in bad
statistics.
Selection bias
Analysis of between-study differences is correlational
Tend to look at ‘broad questions’ that may not be immediately
applicable to individual patients
Simpson’s paradox (two smaller studies may point in one direction,
and the combination study in the opposite direction)
68. Huge Checklist
[http://faculty.ucmerced.edu/wshadish/]
Free Software:
EpiMeta: from Epi Info
Revman: from Cochrane Collaboration
“meta” package in R for statistical computing
Non-free
meta module in STATA
69. PRISMA Statement (formerly QUOROM) : Preferred Reporting Items
for Systematic Reviews and Meta-Analyses
MOOSE Statement : proposal for reporting meta-analyses of
observational studies in epidemiology
71. Mantel-Haenszel methods have been shown to be more reliable when there are not
many data (small trials and not many of them). This is why they have been selected as
the principal method of meta-analysis in the Cochrane Collaboration. This method
(which can be used for OR, RR and RD) is the most appropriate for many Cochrane
reviews, and many Cochrane review groups use it as standard.
The Peto method performs well with sparse data and is then the best choice, but when
events are common there is usually no reason to prefer it over the other methods. It is
not a good idea to use the Peto method when the treatment effect is very large, as the
result may be misleading. This method is also unsuitable if there are large imbalances
in the size of groups within trials.
A random effects model may be better when there is statistical heterogeneity between
the studies in your review.
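For the Mantel-Haenszel approach mentioned above, the pooled odds ratio can be sketched as OR_MH = Σ(a·d/N) / Σ(b·c/N), where each trial contributes a 2x2 table (a, b = treatment failure, success; c, d = control failure, success) and N is that trial's total. The function name and the trial counts are hypothetical, and this sketch returns only the point estimate, not its confidence interval.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio from a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical small trials: (treatment failures, treatment successes,
#                             control failures, control successes)
trials = [(5, 95, 10, 90), (2, 48, 6, 44), (8, 142, 12, 138)]
print(f"Mantel-Haenszel OR = {mantel_haenszel_or(trials):.2f}")
```

Because the sums are taken before dividing, a trial with a zero cell still contributes without needing a continuity correction, which is one reason the method behaves well with sparse data.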