1. ANNALS, AAPSS, 578, November 2001
THE ANNALS OF THE AMERICAN ACADEMY
META-ANALYTIC METHODS FOR CRIMINOLOGY
Meta-Analytic Methods for Criminology
By DAVID B. WILSON
ABSTRACT: Meta-analysis was designed to synthesize empirical relationships across studies, such as the effects of a specific crime prevention intervention on criminal offending behavior. Meta-analysis focuses on the size and direction of effects across studies, examining the consistency of effects and the relationship between study features and observed effects. The findings from meta-analysis not only reveal robust empirical relationships but also identify existing weaknesses in the knowledge base. Furthermore, meta-analytic results can easily be translated into summary statistics useful for informing public policy regarding effective crime prevention efforts.
David B. Wilson is an assistant professor of the administration of justice at George Mason University. His research interests include program evaluation research methodology, meta-analysis, crime and general problem behavior prevention programs, and juvenile delinquency intervention effectiveness.
NOTE: This work was supported by the Jerry Lee Foundation.
IMAGINE you are given the task of synthesizing what is currently known about the effectiveness of correctional boot camps for reducing future criminal behavior among juvenile and adult offenders. An exhaustive search for all relevant evaluations of boot camp programs compared with more traditional forms of punishment and rehabilitation identifies 29 unique studies. The findings from these studies range from large positive to large negative statistically significant effects. To complicate matters, the studies vary in the evaluation methods used, including the definition of recidivism (for example, rearrest, reconviction, and reinstitutionalization), offender populations, and program characteristics. How will you meaningfully make sense of this array of information?
The statistical methods of meta-analysis were designed specifically to address this situation. Meta-analysis represents a statistical and systematic approach to reviewing research findings across multiple independent studies. As such, meta-analyses are systematic reviews (Petrosino et al. 2001 [this issue]). However, not all criminological intervention research literatures can be successfully meta-analyzed, and thus not all systematic reviews will use the statistical methods of meta-analysis.
The basic idea behind meta-analysis dates back almost 100 years and is simple. Karl Pearson, the developer of the Pearson product-moment correlation coefficient, synthesized the findings from multiple studies of the effectiveness of inoculation for typhoid fever (Pearson 1904). His method involved computing the correlation between inoculation and mortality within each study and then averaging the correlations across studies, producing a composite correlation. By today’s standards, this was a meta-analysis, although the term was not introduced until the 1970s (Glass 1976).
The logical framework of meta-analysis is based on the assumption that the averaging of findings across studies will produce a more valid estimate of the effect of interest than that of any individual study. Typically, the finding from any individual study is imprecise due to sampling error. Thus some studies of a specific phenomenon, such as the effectiveness of correctional boot camps, will overestimate and others will underestimate the size of the true effect. Instability in observed effects due to sampling error is an assumption at the core of statistical inference testing, such as a t test between an intervention and comparison condition. Averaging across studies is analogous to averaging across individuals within a single study or averaging across multiple test items.
For a collection of pure replications, the logic behind meta-analysis is indisputable if one accepts the logic and assumptions of the standard statistical practices of the social and medical sciences. Meta-analysis as it is applied in criminology and the other social sciences extends this logic to collections of studies that are conceptual replications, that is, studies that examine the same relationship of interest but differ from one
another in other respects, such as the research design or elements of the intervention.
Conceptual replications are assumed to be estimating the same fundamental relationship, despite differences in methodology and other substantive features. This variability in study features can be viewed as a strength, however, because a synthesis of conceptual replications can show that a relationship is observed across a range of methodological and substantive variability. Unlike sampling error, however, errors in estimates of the relationship of interest that arise from poor study design will not necessarily cancel out as a result of aggregation. Therefore the meta-analyst must carefully assess the influence of methodological variation on observed effects (Wilson and Lipsey, in press).

WHY META-ANALYSIS?

Meta-analysis is not the only method of synthesizing or reviewing results across studies. Other approaches include the narrative and vote-count review. The narrative review relies on a researcher’s ability to digest the array of findings across studies and arrive at a pronouncement regarding the evidence for or against a hypothesis using some unknown and unknowable (that is, subjective) mental calculus.
The vote-count method imposes discipline on this process by tallying the number of studies with statistically significant findings in favor of the hypothesis and the number contrary to the hypothesis (null findings). This approach is appealing, for it is objective and systematic, yet simple. Furthermore it upholds the long-standing tradition in the social sciences of allowing the statistical significance test to be the arbiter of the validity of a scientific hypothesis.
The intuitive appeal of the vote count obscures its weaknesses. First, the vote count fails to account for the differential precision of the studies being reviewed. Larger studies, all else being equal, provide more precise estimates of the relationship of interest and thus should be given greater weight in a review.
Second, the vote count fails to recognize the fundamental asymmetry of the statistical significance test. A statistically significant finding is a strong conclusion, whereas a statistically nonsignificant (null) finding is a weak conclusion. In the vote-count review, null findings are typically interpreted as evidence that the relationship of interest does not exist (for example, the intervention is not effective). This is an incorrect interpretation. Failure to reject a null hypothesis is not support for the null, merely suspended judgment. Enough null findings in the same direction are evidence that the null is false. This possibility was recognized by Fisher (1944), a strong proponent of significance testing.
Third, the vote count ignores the size of the observed effects. By focusing on statistical significance, and not the size and direction of the effect, a study with a small but statistically significant effect would be viewed as evidence favoring the hypothesis, and a study with a large nonsignificant effect would be viewed as evidence against the
hypothesis. Both studies provide evidence that the relationship is nonzero, although the strength of that evidence is weak in one of the studies. The benefits of a null hypothesis statistical significance test for interpreting a finding from an individual study do not translate into benefits when evaluating a collection of related studies.
Furthermore a counterintuitive feature of the vote-count method is that the likelihood of arriving at an incorrect conclusion increases as the number of studies on a topic increases, if the typical statistical power of the studies in that area is low. This is a common situation in criminology. For example, Lipsey and colleagues (1985) estimated that the typical power of evaluations of juvenile delinquency interventions was less than .50. A vote-count review of that literature is sure to yield misleading conclusions.
Meta-analysis avoids the pitfalls of the vote-count method by focusing on the size and direction of effects across studies, not whether the individual effects were statistically significant. The latter largely depends on the sample size of the study. Furthermore focusing on the size and direction of the effect makes better use of the data available in the primary studies, providing a mechanism for analyzing differences across studies and drawing inferences about the likely size of the true population effect of interest. The statistical methods of meta-analysis allow for an assessment of both the consistency of findings across studies and the relationship of study features with variability in effects.
As a method, meta-analysis includes all of the essential features of a systematic review (see Petrosino et al. 2001), including an exhaustive search for all relevant studies (published or not), explicit inclusion and exclusion criteria, and a coding protocol for extracting data from the studies. The distinctive feature of meta-analysis is the application of statistical techniques to the analysis of the study findings, where study findings are encoded on a common metric. The section below presents an overview of the analytic methods of meta-analysis. Several articles in this issue (MacKenzie, Wilson, and Kider 2001 [this issue]; Lipsey, Chapman, and Landenberger 2001 [this issue]) provide examples of meta-analytic methods. This article concludes with a discussion of the strengths and weaknesses of meta-analysis and guidance on when not to use meta-analysis.

A FRAMEWORK FOR META-ANALYSIS

A defining feature of meta-analysis is the effect size, that is, any index of the effect of interest that is comparable across studies. The effect size might index the effects of a treatment group relative to a comparison group or the relationship between two observed variables, such as gender and mathematical achievement or attachment to parents and delinquent behavior. In the analysis of meta-analytic data, the effect size is the dependent variable.
The need for an effect size places restrictions on what research can be meta-analyzed. The collection of
studies of interest to the reviewer must examine the same basic relationship, even if at a broad level of abstraction. At the broad end of the continuum would be a group of studies examining the effects of school-based prevention programs on delinquent behavior. At the narrow end of the continuum would be a set of replications of a study on the effects of the drug Depo-Provera on the perpetration of sexual offenses. The research designs of a collection of studies would all need to be sufficiently similar such that a comparable effect size could be computed from each. Thus most meta-analyses of intervention studies will stipulate that eligible studies use a comparison group design.
The specific effect size index used in a given meta-analysis will depend on the nature of the research being synthesized. Commonly used effect size indices for intervention research are the standardized mean difference, odds ratio, and correlation coefficient. The standardized mean difference–type effect size is well suited to two-group comparison studies (for example, a treatment versus a comparison condition) with continuous or dichotomous dependent measures. The odds ratio is well suited to these same research domains with the exception that the dependent measures must be dichotomous, such as whether the participants recidivated within 12 months of leaving the program. The correlation coefficient can be applied to the broadest range of research designs, including all designs for which standardized mean difference and odds ratio effect sizes can be computed. Because of this, it has been argued that the correlation coefficient is the ideal effect size (Rosenthal 1991). However, the standardized mean difference and odds ratio effect sizes have distinct statistical advantages over the correlation coefficient for intervention research and are more natural indices of program effects.

Standardized mean difference

The standardized mean difference, d, represents the effect of an intervention as the difference between the intervention and comparison group means on the dependent variable of interest, standardized by the pooled within-groups standard deviation. Thus findings based on different operationalizations of the dependent variable of interest (for example, delinquency) are standardized to a common metric: standard deviation units for the population. An advantage of d is that it can be computed from a wide range of statistical data, including means and standard deviations, t tests, F tests, correlation coefficients, and 2 × 2 contingency tables (see Lipsey and Wilson 2001). Although conceptualized as the difference between two groups on a continuous dependent variable, d can also be computed from dichotomous data.

Odds ratio

The odds ratio, o, represents the effect of an intervention as the odds of a favorable (or unfavorable) outcome for the intervention group relative to the comparison group. It is used when the outcome is measured
dichotomously, such as is common in medicine and criminology. The odds ratio is easy to compute from either the raw frequencies of a 2 × 2 contingency table or the proportions of successes or failures in each condition. As a ratio of two odds, a value of 1 indicates an equal likelihood of a successful outcome, whereas values between 1 and 0 indicate a negative effect and values greater than 1 indicate a positive effect. Unlike the correlation coefficient, the odds ratio is unaffected by differential base rates (the marginal distribution) for the outcome across studies (see Farrington and Loeber 2000), thus eliminating a potential source of effect variability across studies.

Correlation coefficient

The correlation coefficient is a widely used and widely understood statistic within the social sciences. It can be used to represent the relationship between two dichotomous variables, a dichotomous and a continuous variable, and two continuous variables. The correlation coefficient has a distinct disadvantage, however, when one or both of the variables on which it is based are dichotomous (Farrington and Loeber 2000). For example, the correlation coefficient is restricted to less than +1 in absolute value if the percentage of participants in the intervention and comparison conditions is not split fifty-fifty. Thus it is recommended that it only be used for meta-analyses of correlational research and that meta-analyses of intervention studies use either the standardized mean difference, the odds ratio, or a more specialized effect size (for a discussion of other alternatives, see Lipsey and Wilson 2001).

ANALYSIS OF META-ANALYTIC DATA

A typical meta-analysis extracts one or more effect sizes per study and codes a variety of study characteristics to represent the important substantive and methodological differences across studies. Before analysis of the data, statistical transformations and adjustments may need to be applied to the effect size. If multiple effect sizes were extracted per study, then a method of including only a single effect size per study (or sample within a study) per analysis will need to be adopted. The analysis of effect size data typically examines the central tendency of the effect size distribution and the consistency of effects across studies. Additional analyses test for the ability of study features to explain inconsistencies in effects across studies. Meta-analytic methods for performing these analyses are summarized below.

Transformations and adjustments

There are standard adjustments and transformations that are routinely applied to effect sizes, and optional adjustments may be applied depending on the purpose of the meta-analysis. For example, Hedges (1982; Hedges and Olkin 1985) showed that the standardized mean difference effect size is positively biased when based on a small sample; that is, it is too large in absolute value, and the bias increases as sample size decreases. The size of bias is
very modest for all but very small sample sizes, but the adjustment is easy to perform and routinely done when using d as the effect size index (for formulas, see the appendix).
When using the odds ratio, one encounters a complication that is also easily rectified. The odds ratio is asymmetric, with negative relationships represented as values between 0 and 1 and positive relationships represented as values between 1 and infinity. This complicates analysis. Fortunately, the natural logarithm of the odds ratio is symmetric about 0 with a well-defined standard error. The importance of the latter is discussed below. Thus, for purposes of analysis, the odds ratio is transformed into the logged odds ratio. Results can be transformed back into odds ratios for purposes of interpretation using the antilogarithm.
Similarly the correlation coefficient has a distributional shape that is less than ideal for purposes of computing averages. Furthermore the standard error is asymmetric, particularly as the correlation approaches –1 or +1. This is easily solved by applying Fisher’s Zr transformation, which normalizes the correlation and results in a standard error that is remarkably simple. As with the odds ratio, final results can be transformed back into correlation coefficients for interpretative purposes.
Hunter and Schmidt (1990) proposed adjusting effect sizes for measurement unreliability and invalidity, range restriction, and artificial dichotomization. These adjustments, however, depend on information that is rarely reported for outcome measures in crime and justice evaluation studies, such as reliability and validity coefficients. The logic of these adjustments is to estimate what would have been observed under more ideal research conditions. These adjustments, while common in meta-analyses of measurement generalizability studies, are rarely used in meta-analyses of intervention research. If they are used, it is recommended that a sensitivity analysis be performed to assess the effect the adjustments have on the results.

Statistical independence among effect sizes

A complication with effect size data is the often numerous effect sizes of interest available from each study. Effect sizes that are based on the same sample of individuals (or other units of analysis, such as city blocks and so forth) are statistically dependent, that is, correlated with each other. Meta-analytic analysis assumes that each data point (effect size in this case) is statistically independent of all other data points.¹ Thus we can include only one effect size per sample in any given analysis. An independent set of effect sizes can be obtained through several strategies. First, each major outcome construct of interest can, and should, be analyzed separately. For example, effect sizes representing employment success should be analyzed separately from those representing criminal behavior. Second, multiple effect sizes within each outcome construct can be averaged to produce one effect size per study or sample within a study. Alternatively, a meta-analyst may choose a single effect size
based on an explicit criterion. That is, the meta-analyst may prefer rearrest data over reinstitutionalization data if the former are available. Finally, the meta-analyst may randomly select among those effect sizes that are of interest to a given analysis. Note that several analyses can be performed, each with a different set of independent effect sizes.

The inverse variance weight

An additional complication of meta-analytic data is the differential precision in effect sizes across studies. Effect sizes based on large samples, all other things being equal, are more precise than effect sizes based on small samples. A simple solution to this problem would be to weight each effect size by its sample size. Hedges (1982) showed, however, that the optimal weight is based on the variance (squared standard error) of each effect size. This is intuitively appealing as well, for the standard error is a statistical expression of the precision of a parameter, such as an effect size. The smaller the standard error, the more precise is the effect size. Thus, in all meta-analytic analyses, weights are computed from the inverse of the squared standard error of the effect size. This is called the inverse variance weight method. Equations for the inverse variance weight for each of the three effect size indices discussed above are presented in the appendix.

The mean effect size and related statistics

A starting point for the analysis of effect size data is the computation of the overall mean effect size, computed as a weighted mean, weighting by the inverse variance weight. A z test can be performed to assess whether the mean effect size is statistically greater than (or less than) 0, and a confidence interval can be constructed around the mean effect size. Both statistics rely on the standard error of the mean effect size, computed from the sum of the weights. Thus both the precision and number of the individual effect sizes influence the precision of the mean effect size. (For equations, see the appendix.)
The mean effect size is meaningful only if the effects are consistent across studies, that is, statistically homogeneous. If the effects are highly heterogeneous, then a single overall mean effect size does not adequately represent the effects observed by the collection of studies. In meta-analysis, consistency in effects is assessed with the homogeneity statistic Q. A statistically significant Q indicates that the observed variability in effect sizes exceeds statistical expectations regarding the variability that would be observed across pure replications, that is, if the collection of studies were indeed estimating a common population effect size. A statistically nonsignificant Q suggests that the variability in effects across studies is no greater than expected due to sampling error.
A heterogeneous distribution (a significant Q) is often the desired outcome of a homogeneity analysis. Heterogeneity justifies the exploration of the relationship between study features and effects, an important
aspect of meta-analysis. The analytic approaches available to the meta-analyst for examining between-study effects are an analysis of mean effect sizes by a categorical study feature, analogous to a one-way ANOVA, and a meta-analytic regression analysis approach. Both approaches rely on inverse variance weighting, and both can be implemented under the assumptions of a fixed- or random-effects model. The assumptions of these models will be discussed below.

Categorical analysis of effect sizes: The analog to the ANOVA

The analog to the ANOVA-type analysis is used to examine the relationship between a single categorical variable, such as treatment type or research method, and effect size. There may be as few as two categories, in which case the analysis is conceptually similar to a t test, or many categories. A separate mean effect size and associated statistics, such as a z test and confidence interval, are computed for each category of the variable of interest. To test whether the mean effect sizes differ across categories, a Q between groups is calculated (see the appendix). Although this statistic is distributed as a chi-square, it is interpreted in the same fashion as an F from a one-way ANOVA. A significant Q between groups indicates that the variability in the mean effect sizes across categories is greater than expected due to sampling error. Thus the category is related to effect size. Examination of confidence intervals provides evidence of the source of the important difference(s).
As with the overall distribution, the residual distribution of effects within categories may be homogeneous or heterogeneous. This is tested with the Q within statistic (see the appendix). A homogeneous Q within indicates that the categorical variable explained the excess variability detected by the overall homogeneity test. In this case, the categorical variable provides an explanation for the variability in effects across studies. Alternatively, additional sources of variability in effects exist if the Q within is significant.
The computation of the analog to the ANOVA can be tedious. Macros that work with existing statistical software packages exist for performing this analysis (for example, Lipsey and Wilson 2001; Wang and Bushman 1998). BioStat (2000) has created a meta-analysis program that among other features performs the analog to the ANOVA analysis.

Meta-analytic regression analysis

The analog to the ANOVA is limited to a single categorical variable. A more flexible and general analytic strategy for assessing the relationship between study features and effect size is regression analysis. Regression analysis can incorporate multiple independent variables (study features) in a single analysis, including continuous variables and categorical variables (via dummy coding). The differences between ordinary least squares regression and meta-analytic regression are the weighting by the inverse variance and a modification to the standard error of the regression coefficients,
necessitating the use of specialized software (for example, Lipsey and Wilson 2001; Wang and Bushman 1998). As with the analog to the ANOVA, two Q values are calculated as part of meta-analytic regression: a Q for the model and a Q for the residual or error variance. The former is a test of the predictive ability of the study features in explaining between-studies variability in effects. The regression model accounts for significant variability in the effect size distribution if the Q for the model is significant. As with the Q within for the analog to the ANOVA, a significant Q for the error variance indicates that excess variability remains in the effects across studies after accounting for the variability explained by the regression model. That is, the residual distribution in effect sizes is heterogeneous.
Recognizing the correlational nature of the above analyses of the relationship between study features and effect size is critical. Study features are often correlated with one another and, as such, a moderating relationship may be the result of confounded between-studies features. For example, the mean effect size for treatment type A may be higher than the mean effect size for treatment type B. The studies examining treatment type B, however, may have used a less sensitive measure of the outcome construct, thus confounding treatment type with characteristics of the dependent variable. Multivariate analyses can help assess the interrelationships between study features, but these analyses cannot account for unmeasured study characteristics.

Fixed and random effects models

The statistical model presented above assumes that the collection of effect sizes being analyzed is estimating a common population effect size. In statistical terms, this is a fixed-effects model. Stated differently, a fixed-effects model assumes that each effect size differs from the true population effect size solely due to subject-level sampling error. Each observed effect size is viewed as an imperfect estimate of the true, single population effect for the intervention of interest. This provides the theoretical basis for incorporating the standard error of the effect size (an estimate of subject-level sampling error) into the analysis as the inverse variance weight.
This assumption is restrictive and likely to be untenable in many syntheses of criminological intervention research where studies of a common research hypothesis differ on many dimensions, some of which are likely to be related to effect size. Thus each effect size has variability (that is, instability) due to subject-level sampling error and study-level variability. The random-effects model assumes that at least some portion of the study-level variability is unexplained by the study features included in the statistical models of effect size. These study differences may simply be unmeasured, or they may be unmeasurable. In both cases, each effect size is assumed to estimate a true population effect size for that study, and the collection of true population effect sizes represents a random distribution of effects. In
statistical terms, this is a random-effects model.
Methods for estimating random-effects models in meta-analysis are well developed. The basic method involves modifying the definition of the inverse variance weight such that it incorporates both the subject- and study-level estimates of instability. The inverse variance weight is thus based on both the standard error of the effect size and an estimate of the variability in the distribution of population effects. The latter is computed from the observed distribution of effects. Random-effects models are more conservative than fixed-effects models. Confidence intervals will be larger, and regression coefficients that were statistically significant under a fixed-effects model may no longer be significant under a random-effects model. It is recommended that meta-analyses of criminological literatures use a random-effects model of analysis unless a clear justification to do otherwise exists.

Sensitivity analysis

A final analytic issue is the sensitivity of the results to unusual study effects and decisions made by the meta-analyst. First, it is wise to examine the influence of outliers in the distribution of effect sizes and the distribution of inverse variance weights. A modest effect size outlier with a large weight can drive an analysis. Rerunning an important analysis with and without highly influential studies can help verify that the observed result is not solely a function of a single unusual study. Second, the method of selecting one effect size per study for any given analysis may also affect the meta-analytic findings. For example, in the boot camp systematic review by MacKenzie, Wilson, and Kider (2001), the analyses were performed on a single effect size selected from each study based on a set of decision rules. A sensitivity analysis showed that using a composite of all recidivism effect sizes produced the same results, bolstering the authors’ confidence in the findings. Third, if the meta-analysis has included methodologically weak studies, analyses examining the relationship between method features and observed effects are essential.

Illustration: Cognitive-behavioral programs for sex offenders

To illustrate the methods outlined above, I have selected a subset of studies included in a meta-analysis of sex offender programs (Gallagher, Wilson, and MacKenzie no date). Presented below are the programs based on cognitive-behavioral principles. Studies were included if they used a comparison group design and the comparison received either no treatment or non-sex-offender-specific treatment. Studies also had to report a measure of sex offense recidivism at some point following termination of the program.
A total of 13 studies met the eligibility criteria for this meta-analysis. The recidivism data were dichotomous and as such, the odds ratio was selected as the effect size index. The odds ratio and 95 percent confidence interval for these 13 studies are presented in Figure 1. Visual inspection of these odds ratios shows a distinct
FIGURE 1
ODDS RATIO AND 95 PERCENT CONFIDENCE INTERVAL FOR EACH OF THE 13 COGNITIVE-BEHAVIORAL SEX OFFENDER EVALUATION STUDIES
[Forest plot not reproduced. Horizontal axis: odds ratio, with ticks at .02, .1, .50, 1, 5, 25, and 200; values below 1 favor the comparison condition and values above 1 favor the intervention. Rows, in plotted order:]
Borduin, Henggeler, Blaske & Stein (N = 16)
McGrath, Hoke & Vojtisek (N = 103)
Hildebran & Pithers (N = 90)
Marshall, Eccles & Barbaree (N = 38)
Studer, Reddon, Roper & Estrada (N = 220)
Nicholaichuk, Gordon, Andre & Gu (N = 579)
Gordon & Nicholaichuk (N = 206)
Guarino & Kimball (N = 75)
Marques, Day, Nelson, & West (N = 229)
Huot (N = 224)
Gordon & Nicholaichuk (N = 1248)
Song & Lieb (N = 278)
Nicholaichuk (N = 65)
Overall Mean Odds-Ratio
NOTE: Sources of programs are available from the author.
positive trend, with 12 of the 13 studies observing lower recidivism rates (and hence odds ratios greater than 1) for the sex offender treatment condition than the comparison condition. The sole study with a negative effect (an odds ratio between 0 and 1) had a large confidence interval that extended well into the positive range and was from a study of poor methodological quality.

The weighted mean odds ratio for this collection of 13 studies was 2.33, and the 95 percent confidence interval was 1.57 to 3.42. The z test indicates that this odds ratio was statistically significant at conventional levels, z = 4.26, p < .001. This collection of studies supports the conclusion that cognitive-behavioral programs for sex offenders reduce the risk of a sexual reoffense. The homogeneity statistic was significant, indicating that the findings are not consistent across studies and may be related to study features, Q = 21.99, df = 12, p < .05.

This collection of studies differed in many ways, both in the research methods used and the specifics of the sex offender treatment program. Many of these 13 studies evaluated a cognitive-behavioral approach called relapse prevention. Relapse prevention programs may be more (or less) effective than other cognitive-behavioral programs. To explore this, the mean effect size for relapse prevention and other cognitive-behavioral programs was calculated (2.41 and 1.73, respectively). Also calculated were the Q between and Q within. The Q between was 0.87, p > .05, indicating that the observed difference between these two means was not statistically significant. The Q within was statistically significant, Q_WITHIN = 21.12, df = 11, p = .03, indicating that significant variability within groups remained after accounting for treatment type.
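The partition of the homogeneity statistic used here can be sketched in a few lines of Python. This is a minimal illustration of appendix equations 18, 21, and 22, not the author's software: the effect sizes, weights, and group labels are hypothetical, and the function names are invented for the example. The reported values are consistent with the same partition (0.87 + 21.12 = 21.99).

```python
def q_stat(es, w):
    """Homogeneity Q for effect sizes `es` with inverse variance weights `w` (equation 18)."""
    sum_w = sum(w)
    sum_wes = sum(wi * e for e, wi in zip(es, w))
    sum_wes2 = sum(wi * e * e for e, wi in zip(es, w))
    return sum_wes2 - sum_wes ** 2 / sum_w


def q_between(es, w, groups):
    """Q between groups (equation 21), from group mean effect sizes and summed group weights."""
    group_w = []
    group_mean = []
    for label in sorted(set(groups)):
        members = [i for i, g in enumerate(groups) if g == label]
        w_j = sum(w[i] for i in members)
        group_w.append(w_j)
        group_mean.append(sum(w[i] * es[i] for i in members) / w_j)
    # Group means play the role of effect sizes; summed weights play the role of weights.
    return q_stat(group_mean, group_w)


def q_within(es, w, groups):
    """Q within groups: each group's own Q, summed over the groups."""
    total = 0.0
    for label in set(groups):
        members = [i for i, g in enumerate(groups) if g == label]
        total += q_stat([es[i] for i in members], [w[i] for i in members])
    return total


# Hypothetical logged odds ratios, weights, and treatment-type labels (illustration only).
es = [0.9, 1.2, 0.4, 0.1, 1.5, 0.7]
w = [4.0, 2.5, 6.0, 8.0, 1.5, 3.0]
groups = ["rp", "rp", "other", "other", "rp", "other"]

q_total = q_stat(es, w)
q_b = q_between(es, w, groups)
q_w = q_within(es, w, groups)
print(f"Q = {q_total:.3f}, Q between = {q_b:.3f}, Q within = {q_w:.3f}")
```

Each Q would then be referred to a chi-square distribution with the degrees of freedom given in the appendix; that step is omitted here to keep the sketch free of external libraries.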
A regression analysis was performed to test whether the differential lengths of follow-up across studies and the different definitions of recidivism could account for the heterogeneity. The regression coefficient for whether the recidivism was measured at least five years posttreatment was statistically significant and positive, B = 1.58, p = .01, suggesting that studies with longer follow-up periods observed larger differences in the rates of sexual offending between the treated and nontreated groups. The effects of sex offender programs may increase over time, or the length of follow-up may be related to an unmeasured program characteristic that led to greater effectiveness. The regression coefficient for whether the recidivism measure was an indicator of arrest or reconviction was also statistically significant, B = 1.25, p = .04, suggesting that arrest may be a more sensitive measure of the program effects. Significant variability in the effect size distribution was accounted for by this regression model, Q_MODEL = 7.05, df = 2, p = .03. Furthermore, the Q associated with the residual variability in effect sizes was not statistically significant, Q_RESIDUAL = 14.9, df = 10, p = .13, indicating that the residual variability in effects is not greater than would be expected due to sampling error.

INTERPRETATION OF META-ANALYTIC FINDINGS

A researcher who finds a statistically significant effect is presented with the difficult task of deciding whether the effect is meaningful from a practical or clinical perspective. That is, is the effect “significant” in the everyday meaning of that word? Meta-analysts are confronted with the same problem. What is the practical significance of an observed mean effect size? A common approach to addressing this problem is the translation of the effect size into a success rate differential for the intervention and comparison conditions, such as using the binomial effect size display (Rosenthal and Rubin 1983). For example, a standardized mean difference effect size of .40 is equivalent to a success rate differential of 20 percent (that is, 40 percent recidivism in the intervention condition and 60 percent recidivism in the comparison condition). If the audience for the meta-analysis is not familiar with standardized mean difference effect sizes, then the success rate differential provides a useful method of understanding the practical significance of the observed findings.

The odds ratio has a natural interpretation without transformation: the odds ratio is the odds of a successful outcome in the treated condition relative to the comparison condition. Thinking about odds is, however, odd for all but the more mathematically inclined. As with the standardized mean difference, a mean odds ratio can be translated into percentages of successes (or failures). This translation requires “fixing” the failure rate for one of the conditions. For example, if we assume a 50 percent recidivism rate for the comparison condition, then an odds ratio of 1.5 translates into a recidivism rate of 40 percent in the treatment condition.
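Both translations can be checked with a short Python sketch. The function names are hypothetical, and two details are assumptions not spelled out in the text: the conversion r = d / sqrt(d^2 + 4), which presumes equal group sizes, and the binomial effect size display convention of centering the two rates on 50 percent.

```python
import math


def besd_rates(d):
    """Failure rates implied by a standardized mean difference under the binomial
    effect size display. Assumes r = d / sqrt(d^2 + 4) (equal group sizes)."""
    r = d / math.sqrt(d * d + 4.0)
    # The display centers both rates on .50 and separates them by r.
    return 0.5 - r / 2.0, 0.5 + r / 2.0


def treated_failure_rate(odds_ratio, comparison_failure_rate):
    """Failure (recidivism) rate in the treated condition, given an odds ratio defined
    as the odds of success in the treated condition relative to the comparison."""
    comparison_odds = (1.0 - comparison_failure_rate) / comparison_failure_rate
    treated_odds = odds_ratio * comparison_odds
    treated_success = treated_odds / (1.0 + treated_odds)
    return 1.0 - treated_success


intervention, comparison = besd_rates(0.40)
print(f"d = .40: {intervention:.0%} versus {comparison:.0%} recidivism")
print(f"OR = 1.5, 50% base rate: {treated_failure_rate(1.5, 0.50):.0%} recidivism")
```

Changing the assumed comparison rate changes the translated percentage, which is why the base rate has to be fixed and reported alongside the result.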
Presenting the results of a meta-analysis of odds ratios as percentages provides a means of assessing the magnitude of the observed program effects.

ADVANTAGES AND DISADVANTAGES OF META-ANALYSIS

Meta-analysis has several distinct advantages over alternative forms of reviewing empirical research. As a systematic method of review, meta-analysis is replicable by independent researchers. The methods are explicit and open to the scrutiny of other scholars, who may question the inclusion and exclusion criteria and critique the variables used to examine between-studies differences. This can lead to productive debates and competing analyses of the meta-analytic data. In addition, meta-analysis makes efficient use of the information contained in the primary studies. Focusing on the direction and magnitude of the findings across studies using a common statistical benchmark allows for the exploration of relationships between study features and effects that would not otherwise be observable. The statistical methods of meta-analysis help guard against interpreting the dispersion in results as meaningful when it can just as easily be explained as sampling error. Finally, meta-analysis can handle a much larger number of studies than could effectively be summarized with alternative methods. There is no theoretical limit to the number of studies that can be incorporated into a single meta-analysis, yet as a method it can also be applied to a small number of similar studies.

As a practitioner of meta-analysis, I see few justified disadvantages to the use of meta-analysis. This does not mean that meta-analysis is without disadvantages. On the practical side, meta-analysis is far more time-consuming than traditional forms of review and requires a moderate level of statistical sophistication. Meta-analysis also simplifies the findings of the individual studies, often representing each study as a single effect size and a small set of descriptor variables. Complex patterns of effects often found in individual studies, such as the results from individual growth-curve modeling, do not lend themselves to synthesis. To accommodate this, a reviewer may wish to augment a meta-analytic review with narrative descriptions of important studies and interesting study-level findings obscured in the meta-analytic synthesis. Finally, the methods of meta-analysis cannot overcome weaknesses in the primary studies. If the research base that examines the hypothesis of interest is methodologically weak, then the findings from the meta-analysis will also be weak. In these situations, meta-analysis creates a solid foundation for the next generation of studies by clearly identifying the weaknesses of the current knowledge base on a given issue.

WHEN NOT TO DO META-ANALYSIS

Meta-analysis is the preferred method of systematically reviewing a collection of empirical studies
examining a common research hypothesis. However, meta-analysis is not appropriate for the synthesis of all empirical research literatures. First, meta-analysis cannot be used when a common effect size index cannot be computed across the studies of interest. For example, the appropriate effect size for area studies (that is, studies that have a geographic area as the unit of analysis) is currently being discussed among members of the Campbell Collaboration. Second, the research designs across a collection of studies examining the relationship of interest may be too disparate for meaningful synthesis. For example, studies with different units of analysis cannot be readily meta-analyzed unless sufficient data are presented to compute an effect size at a common level of analysis. Studies with fundamentally different research designs, such as one-group longitudinal studies and comparison group studies, also should not be combined in the same meta-analysis. Third, the research question for a meta-analysis may involve a multivariate relationship. Although methods have been developed for meta-analyzing multivariate research studies (for example, Becker 1992; Becker 1996; Premack and Hunter 1988), these methods have rarely been applied and are still not well developed. It is unlikely that the more elaborate research designs will ever easily lend themselves to synthesis. Thus some research questions addressed by primary studies are not easily meta-analyzed. Finally, meta-analysis does not address broad theoretical issues that may be important to a debate regarding the value of various crime prevention efforts. Meta-analysis is designed to synthesize the evidence regarding the strength of a relationship across distinct research studies. This is a very specific task that may be embedded in a larger scholarly endeavor.

CONCLUSIONS

Systematic reviews approach the task of summarizing findings of a collection of research studies as a research task. As a method of systematic reviewing, meta-analysis takes this a step further by quantifying the direction and magnitude of the findings of interest across studies and by using specialized statistical methods to analyze the relationship between findings and study features. Properly executed, meta-analysis provides a firm foundation for future research. That is, empirical relationships that are well established and areas that are underresearched or that have equivocal findings are identified through the meta-analytic process. In addition, meta-analysis provides a defensible strategy for summarizing crime prevention and intervention efforts for informing public policy. Although the methods are technical, the findings can be translated into summary statistics readily understandable by non–social science researchers.
APPENDIX
EQUATIONS FOR THE CALCULATION OF EFFECT SIZES AND META-ANALYTIC SUMMARY STATISTICS

Common effect size indices

(1) $d = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{pooled}}$
Standardized mean difference effect size; $\bar{X}_1$ is the mean of the intervention condition; $\bar{X}_2$ is the mean of the comparison condition; and $s_{pooled}$ is the pooled within-groups standard deviation.

(2) $o = \dfrac{ad}{bc}$
Odds ratio effect size; a and c are the number of successful outcomes in the intervention and comparison conditions, and b and d are the number of failures in the intervention and comparison conditions (based on a 2 × 2 contingency table).

(3) $r = r$
Correlation coefficient effect size; r is the Pearson product-moment correlation coefficient between the two variables of interest.

Common transformations of effect size

(4) $d' = \left[1 - \dfrac{3}{4N - 9}\right] d$
Small sample size bias correction; d is the standardized mean difference effect size and N is the total sample size.

(5) $lor = \log(o)$
Log transformation of the odds ratio.

(6) $z = .5 \log\left[\dfrac{1 + r}{1 - r}\right]$
Fisher’s transformation of the correlation effect size.

(7) $o = e^{lor}$
Logged odds ratio (lor) transformed into an odds ratio (o); e is the constant 2.7183.

(8) $r = \dfrac{e^{2z} - 1}{e^{2z} + 1}$
Transforms the effect size z from equation 6 back into a correlation; e is the constant 2.7183.

Fixed effects model inverse variance weights

(9) $v_d = \dfrac{n_1 + n_2}{n_1 n_2} + \dfrac{d'^2}{2(n_1 + n_2)}$
The variance for the standardized mean difference; $n_1$ and $n_2$ are the sample sizes for the intervention and comparison conditions.

(10) $v_{lor} = \dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}$
The variance for the logged odds ratio; a, b, c, and d are the cell frequencies of a 2 × 2 contingency table.

(11) $v_z = \dfrac{1}{N - 3}$
The variance for the Fisher’s transformed correlation coefficient; N is the total sample size.

(12) $w = \dfrac{1}{v}$
The inverse variance weight; v is the variance from equation 9, 10, or 11.

Mean effect size and related statistics

(13) $\overline{ES} = \dfrac{\sum (ES \cdot w)}{\sum w}$
Weighted mean effect size, where ES is the effect size index (equations 4, 5, or 6) and w is the inverse variance weight (equation 12).

(14) $se_{\overline{ES}} = \sqrt{\dfrac{1}{\sum w}}$
The standard error of the mean effect size.

(15) $z = \dfrac{\overline{ES}}{se_{\overline{ES}}}$
A z test; tests whether $\overline{ES}$ is statistically greater than or less than 0.

(16) $LowerCI = \overline{ES} - 1.96\, se_{\overline{ES}}$
Lower bound of the 95 percent confidence interval.

(17) $UpperCI = \overline{ES} + 1.96\, se_{\overline{ES}}$
Upper bound of the 95 percent confidence interval.

Homogeneity test Q

(18) $Q = \sum (ES^2 \cdot w) - \dfrac{\left(\sum (ES \cdot w)\right)^2}{\sum w}$
Homogeneity test Q; distributed as a chi-square, degrees of freedom equals the number of effect sizes less 1.

Random effects variance component and weight

(19) $v_\theta = \dfrac{Q - (k - 1)}{\sum w - \dfrac{\sum w^2}{\sum w}}$
The random effects variance component, where k is the number of effect sizes; the random effects variance component has a more complex form when used as part of the analog to the ANOVA or regression models.

(20) $w = \dfrac{1}{v + v_\theta}$
The random effects inverse variance weight, where v is defined as in equations 9 through 11.

Analog to the ANOVA

(21) $Q_B = \sum \left(\overline{ES}_j^2 \cdot w_j\right) - \dfrac{\left(\sum \overline{ES}_j \cdot w_j\right)^2}{\sum w_j}$
Q between groups; where j is 1 to the number of categories for the independent variable; distributed as a chi-square with j – 1 degrees of freedom. Here $\overline{ES}_j$ is read as the weighted mean effect size for category j and $w_j$ as the sum of the inverse variance weights in category j.

(22) $Q_W = Q - Q_B$
Q within groups; where Q is the overall homogeneity statistic defined in equation 18 and $Q_B$ is defined in equation 21; distributed as a chi-square with the number of effect sizes minus the number of categories in the independent variable as the degrees of freedom.

Meta-analytic regression analysis

(23) Use specialized software.
For example, SAS, SPSS, or Stata macros by Lipsey and Wilson (2001); SAS macros by Wang and Bushman (1998).
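As a rough illustration of how the appendix equations fit together for odds ratios, the following Python sketch pools logged odds ratios under both a fixed-effects and a random-effects model. It is a sketch under stated assumptions rather than the macros cited above: the 2 × 2 tables are hypothetical, the function names are invented, and truncating a negative variance component at zero is a common convention that the appendix does not state.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Logged odds ratio and its variance from a 2 x 2 table (equations 2, 5, and 10).

    a, b: successes and failures in the intervention condition;
    c, d: successes and failures in the comparison condition.
    """
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def homogeneity_q(es, v):
    """Homogeneity statistic Q with fixed-effects weights (equations 12 and 18)."""
    w = [1 / vi for vi in v]
    sum_wes = sum(wi * e for e, wi in zip(es, w))
    return sum(wi * e * e for e, wi in zip(es, w)) - sum_wes ** 2 / sum(w)


def variance_component(es, v):
    """Random-effects variance component (equation 19), truncated at zero (an assumption)."""
    w = [1 / vi for vi in v]
    k = len(es)
    denom = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (homogeneity_q(es, v) - (k - 1)) / denom)


def pooled(es, v, v_theta=0.0):
    """Weighted mean, standard error, z, and 95 percent CI (equations 12 to 17 and 20)."""
    w = [1 / (vi + v_theta) for vi in v]
    mean = sum(wi * e for e, wi in zip(es, w)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return mean, se, mean / se, (mean - 1.96 * se, mean + 1.96 * se)


# Hypothetical 2 x 2 tables (a, b, c, d); these are not the studies in Figure 1.
tables = [(45, 5, 35, 15), (80, 20, 70, 30), (25, 15, 24, 16), (90, 10, 60, 40)]
es, v = zip(*(log_odds_ratio(*t) for t in tables))

fixed = pooled(es, v)
random_effects = pooled(es, v, variance_component(es, v))
for label, (mean, se, z, (lo, hi)) in (("fixed", fixed), ("random", random_effects)):
    # Equation 7: exponentiate to report the results as odds ratios.
    print(f"{label}: OR = {math.exp(mean):.2f}, "
          f"95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f}, z = {z:.2f}")
```

Because the variance component is never negative, the random-effects standard error is at least as large as the fixed-effects one, which is the sense in which the random-effects model is the more conservative choice.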
Note

1. Methods have been developed for handling dependent effect sizes in a single analysis, but these methods are beyond the scope of this article. (For details, see Gleser and Olkin 1994; Kalaian and Raudenbush 1996.)

References

Becker, Betsy J. 1992. Models of Science Achievement: Forces Affecting Performance in School Science. In Meta-analysis for Explanation: A Casebook, ed. Thomas D. Cook, Harris Cooper, David S. Cordray, Heidi Hartmann, Larry V. Hedges, Richard J. Light, Thomas A. Louis, and Frederick Mosteller. New York: Russell Sage.
Becker, G. 1996. The Meta-Analysis of Factor Analyses: An Illustration Based on the Cumulation of Correlation Matrices. Psychological Methods 1:341-53.
BioStat. 2000. Comprehensive Meta-Analysis (Software Program, Version 1.0.9). Englewood, NJ: BioStat. Available: www.metaanalysis.com.
Farrington, David P. and Rolf Loeber. 2000. Some Benefits of Dichotomization in Psychiatric and Criminological Research. Criminal Behaviour and Mental Health 10:100-122.
Fisher, Ronald A. 1944. Statistical Methods for Research Workers. 9th ed. London: Oliver and Boyd.
Gallagher, Catherine A., David B. Wilson, and Doris Layton MacKenzie. N.d. A Meta-Analysis of the Effectiveness of Sexual Offender Treatment Programs. Unpublished manuscript, University of Maryland at College Park.
Glass, Gene V. 1976. Primary, Secondary and Meta-Analysis of Research. Educational Researcher 5:3-8.
Gleser, Leon J. and Ingram Olkin. 1994. Stochastically Dependent Effect Sizes. In The Handbook of Research Synthesis, ed. Harris Cooper and Larry V. Hedges. New York: Russell Sage.
Hedges, Larry V. 1982. Estimating Effect Size from a Series of Independent Experiments. Psychological Bulletin 92:490-99.
Hedges, Larry V. and Ingram Olkin. 1985. Statistical Methods for Meta-Analysis. Orlando, FL: Academic Press.
Hunter, John E. and Frank L. Schmidt. 1990. Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Newbury Park, CA: Sage.
Kalaian, H. A. and Stephen W. Raudenbush. 1996. A Multivariate Mixed Linear Model For Meta-Analysis. Psychological Methods 1:227-35.
Lipsey, Mark W., Gabrielle L. Chapman, and Nana A. Landenberger. 2001. Cognitive-Behavioral Programs for Offenders. Annals of the American Academy of Political and Social Science 578:144-157.
Lipsey, Mark W., Scott Crosse, J. Dunkle, J. Pollard, and G. Stobart. 1985. Evaluation: The State of the Art and the Sorry State of the Science. New Directions for Program Evaluation 27:7-28.
Lipsey, Mark W. and David B. Wilson. 2001. Practical Meta-Analysis. Thousand Oaks, CA: Sage.
MacKenzie, Doris Layton, David B. Wilson, and Suzanne B. Kider. 2001. Effects of Correctional Boot Camps on Offending. Annals of the American Academy of Political and Social Science 578:126-143.
Pearson, Karl. 1904. Report on Certain Enteric Fever Inoculation Statistics. British Medical Journal 3:1243-46. Quoted in Morton Hunt, How Science Takes Stock: The Story of Meta-Analysis (New York: Russell Sage, 1997).
Petrosino, Anthony, Robert F. Boruch, Haluk Soydan, Lorna Duggan, and Julio Sanchez-Meca. 2001. Meeting the Challenges of Evidence-Based Policy: The Campbell Collaboration. Annals of the American Academy of Political and Social Science 578:14-34.
Premack, Steven L. and John E. Hunter. 1988. Individual Unionization Decisions. Psychological Bulletin 103:223-34.
Rosenthal, Robert. 1991. Meta-Analytic Procedures for Social Research. Applied Social Research Methods Series. Vol. 6. Newbury Park, CA: Sage.
Rosenthal, Robert and Donald B. Rubin. 1983. A Simple, General Purpose Display of Magnitude of Experimental Effect. Journal of Educational Psychology 74:166-69.
Wang, Morgan C. and Brad J. Bushman. 1998. Integrating Results Through Meta-Analytic Review Using SAS Software. Cary, NC: SAS Institute.
Wilson, David B. and Mark W. Lipsey. In press. The Role of Method in Treatment Effect Estimates: Evidence from Psychological, Behavioral, and Educational Treatment Intervention Meta-Analyses. Psychological Methods.