Public/Private Ventures Brief

Evaluating Mentoring Programs

Jean Baldwin Grossman

September 2009
Public/Private Ventures is a national leader in creating and strengthening programs that improve lives in low-income communities. We do this in three ways:

innovation
We work with leaders in the field to identify promising existing programs or develop new ones.

research
We rigorously evaluate these programs to determine what is effective and what is not.

action
We reproduce model programs in new locations, provide technical assistance where needed and inform policymakers and practitioners about what works.

P/PV is a 501(c)(3) nonprofit, nonpartisan organization with offices in Philadelphia, New York City and Oakland. For more information, please visit www.ppv.org.

Board of Directors

Matthew T. McGuire, Chair
    Principal, Origami Capital Partners, LLC
Yvonne Chan
    Principal, Vaughn Learning Center
The Honorable Renée Cardwell Hughes
    Judge, Court of Common Pleas, The First Judicial District, Philadelphia, PA
Christine L. James-Brown
    President and CEO, Child Welfare League of America
Robert J. LaLonde
    Professor, The University of Chicago
John A. Mayer, Jr.
    Retired, Chief Financial Officer, J. P. Morgan & Co.
Anne Hodges Morgan
    Consultant to Foundations
Siobhan Nicolau
    President, Hispanic Policy Development Project
Marion Pines
    Senior Fellow, Institute for Policy Studies, Johns Hopkins University
Clayton S. Rose
    Senior Lecturer, Harvard Business School
Cay Stratton
    Special Adviser, UK Commission for Employment and Skills
Sudhir Venkatesh
    William B. Ransford Professor of Sociology, Columbia University
William Julius Wilson
    Lewis P. and Linda L. Geyser University Professor, Harvard University

Research Advisory Committee

Jacquelynne S. Eccles, Chair
    University of Michigan
Robert Granger
    William T. Grant Foundation
Robinson Hollister
    Swarthmore College
Reed Larson
    University of Illinois
Jean E. Rhodes
    University of Massachusetts, Boston
Thomas Weisner
    UCLA




Acknowledgments




This brief is a revised version of a chapter written for the Handbook of Youth Mentoring edited by
David DuBois and Michael Karcher (2005). David and Michael provided many useful comments
on the earlier version. Laura Johnson and Chelsea Farley of Public/Private Ventures helped
revise the chapter to make it more accessible to non-mentoring specialists and provided great
editorial advice.

Additional reference: DuBois, D. L. and M. J. Karcher, eds. 2005. Handbook of Youth Mentoring.
Thousand Oaks, CA: Sage Publications, Inc.




Introduction




Questions about mentoring abound. Mentoring programs around the country are being asked by their funders and boards, “Does this mentoring program work?” Policymakers ask, “Does this particular type of mentoring—be it school-based or group or email—work?” These are questions about program impacts. Researchers and operators also want to know about the program’s processes: What about mentoring makes it work? How long should a match last to be effective? How frequently should matches meet? Does the level of training, support or supervision of the match matter? Does parental involvement or communication matter? What types of interactions between youth and mentors lead to positive changes in the child? Then there are questions about the populations served and what practices are most effective: Are particular types of youth more affected by mentoring than others? Are mentors with specific characteristics, such as being older or more educated, more effective than other mentors or more effective with particular subgroups of youth? Finally, researchers in particular are interested in the theoretical underpinning of mentoring. For example, to what degree does mentoring work by changing children’s beliefs about themselves (such as boosting self-esteem or self-efficacy), by shaping their values (such as their views about education and the future) or by improving their social and/or cognitive skills?

This article presents discussions of many issues that arise in answering both implementation (or process) questions and impact questions. Process questions are important to address even if a researcher is interested only in impacts, because one should not ask, “Does it work?” unless “it” actually occurred. The first section covers how one chooses appropriate process and impact measures. The next section discusses several impact design issues, including the inadequacies of simple pre/post designs, the importance of a good comparison group and several ways to construct comparison groups. The last section discusses common mistakes made when analyzing evaluation data and presents ways to avoid them. For a more complete discussion of evaluation in general, readers are referred to Rossi et al. (1999); Shadish et al. (2002); and Weiss (1998). Due to space constraints, issues entailed in answering mediational questions are not addressed here.




Measurement Issues




A useful guide in deciding what to measure is a program’s logic model or theory of change: the set of hypothesized links between the program’s action, participants’ response and the desired outcomes. As Weiss states, with such a theory in hand, “The evaluation can trace the unfolding of the assumptions” (1998, 58). Rhodes et al. (2005) presents one possible theory of change for mentoring: Process measures describe the program’s actions; outcome measures describe what effects the program has.

Process Measures

The first question when examining a program is: What exactly is the program as experienced by participants? The effect the program will have on participants depends on the realities of the program, not on its official description. All too frequently in mentoring programs, relatively few strong relationships form and matched pairs stop meeting. Process questions can be answered, however, at several levels. Most basically, one wants to know: Did the program recruit appropriate youth and adults? Did adults and youth meet as planned? Did all the components of the program happen? Were mentors trained and supervised as expected?

To address these questions, one examines the characteristics and experiences of the participants, mentors and the match, and compares them with the program’s expectations. For example, a mentoring program targeting youth involved in criminal or violent activity tracked the number of arrests of new participants to determine whether they were serving their desired target populations (Branch 2002). A high school mentoring program for struggling students tracked the GPAs of enrolled youth (Grossman, Johnson 1999). Two match characteristics commonly examined are the average completed length of the relationship and the average frequency of interaction. Like all good process measures, they relate to the program’s theory. To be affected, a participant must experience a sufficient dosage of the intervention. Some mentoring programs have more detailed ideas, such as wanting participants to experience specific program elements (academic support, for example, or peer interaction). If these are critical components of the program theory, they also make good candidates for process measures.
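
As a concrete illustration, the short Python sketch below computes these two dosage measures from program records. The table and column names (a matches table with match_id, start_date and end_date, and a meeting log keyed by match_id) are illustrative assumptions, not any particular program’s data system.

    import pandas as pd

    def dosage_measures(matches, meetings):
        """Average completed match length (in months) and average meetings per
        month, computed from illustrative match and meeting-log tables."""
        m = matches.copy()
        m["length_months"] = (m["end_date"] - m["start_date"]).dt.days / 30.4
        meetings_per_match = meetings.groupby("match_id").size().rename("n_meetings")
        m = m.join(meetings_per_match, on="match_id")
        m["meetings_per_month"] = m["n_meetings"] / m["length_months"]
        return m["length_months"].mean(), m["meetings_per_month"].mean()

The resulting figures can then be compared with the program’s own expectations, for example a 12-month commitment met with weekly meetings.
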
A second level of process question concerns the quality of the components: How good are the relationships? Are the training and supervision useful? These are more difficult dimensions to measure. Client satisfaction measures, such as how much youth like their mentors or how useful the mentors feel the training is, are one gauge of quality. However, clients’ assessment of quality may not be accurate; as many teachers say, the most enjoyable class may not be the class that promotes the most learning. Testing mentors before and after training is an alternative quality measure. Assessing the quality of mentoring relationships is a relatively unexplored area. Grossman and Johnson (1999) and Rhodes et al. (2005) propose some measures.

From a program operator’s or funder’s perspective, how much process information is “enough” depends on striking a balance between knowing exactly what is happening in the program versus recognizing the service the staff could have provided in lieu of collecting data. Researchers should assess enough implementation data to be sure the program is actually delivering the services it purports to offer at a level and quality consistent with having a detectable impact before spending the time and money to collect data on outcomes. Even if no impact is expected, it is essential to know exactly what did or did not happen to the participants to understand one’s findings. Thus, researchers may want to collect more process data than typically would be collected by operators to improve both the quality of their generalizations and their ability to link impacts to variation in participants’ experiences of core elements of the program.

Lesson: Tracking process measures is important to program managers but essential for evaluators. Before embarking on an evaluation of impacts, be sure the program is delivering its services at a quality and intensity that would lead one to expect impacts.

Outcome Measures

An early task for an impact evaluator is to refine the “Does it work?” question into a set of testable evaluation questions. These questions need to specify a set of outcome variables that will be examined during the evaluation. There are two criteria for a good outcome measure (Rossi et al. 1999). First, the outcome can be realistically expected to change during the study period given the intensity of the intervention. Second, the outcome is measurable and the chosen measure sensitive enough to detect the likely change.

Evaluation questions are not program goals. Many programs rightly have lofty inspirational goals, such as enabling all participants to excel academically or to become self-sufficient, responsible citizens. However, a good evaluation outcome must be concrete, measurable and likely to change enough during the study period to be detected. Thus, for example, achieving a goal like “helping youth academically excel” could be gauged by examining students’ grades or test scores.

In addition, when choosing the specific set of outcomes that will indicate a goal such as “academically excelling,” one must consider which of the possible variables are likely to change given the program dosage participants will probably receive during the evaluation period. For example, researchers often have found that reading and math achievement test scores change less quickly than do reading or math grades, which, in turn, change less quickly than school effort. Thus, if one is evaluating the school-year (i.e., nine months) impact of a school-based mentoring program, one is likely to want to examine effort and grades rather than test scores, or at least in addition to test scores. Considerable care and thought need to go into deciding what outcomes data should be collected and when. Examining impacts on outcomes that are unlikely to change during the evaluation period can give the false impression that the program is a failure, when in fact the impacts on the chosen variables may not yet have emerged.

A good technique for selecting variables is to choose a range of proximal to more distal expected impacts based on the program’s theory of change, which also represents a set of impacts ranging from modestly to impressively effective (Weiss 1998). Unfortunately, one cannot know a priori how long matches will last or how often the individuals will meet. Thus, it is wise to include some outcomes that are likely to change even with rather limited exposure to the intervention, and some outcomes that would change with greater exposure, thus setting multiple “bars.” The most basic effectiveness goal is an outcome that everyone agrees should be achievable. From there, one can identify more ambitious outcomes.

Public/Private Ventures’ evaluation of Big Brothers Big Sisters (BBBS) provides a good example of this process (Grossman and Tierney 1998). Researchers conducted a thorough review of BBBS’s manual of standards and practices to understand the program’s logic model and then, by working closely with staff from the national office and local agencies, generated multiple outcome bars. The national manual lists four “common” goals for a Little Brother or Little Sister: providing social, cultural and recreational enrichment; improving peer relationships; improving self-concept; and improving motivation, attitude and achievement related to schoolwork. Conversations with BBBS staff also suggested that having a Big Brother or Big Sister could reduce the incidence of antisocial behaviors such as drug and alcohol use and could improve a Little Brother’s or Little Sister’s relationship with his or her parent(s). Using previous research, the hypothesized impacts were ordered from proximal to distal as follows: increased opportunities for social and cultural enrichment, improved self-concept, better relationships with family and friends, improved academic outcomes and reduced antisocial behavior.

At a minimum, the mentoring experience was expected to enrich the cultural and social life of youth, even though many more impacts were anticipated. Because motivational psychology research shows that attitudes often change before behaviors, the next set of
outcomes reflected attitudinal changes toward themselves and others. The “harder” academic and antisocial outcomes then were specified. Within these outcomes, researchers also hypothesized a range of impacts, from attitudinal variables, such as the child’s perceived sense of academic efficacy and value placed on education, to some intermediate behavioral changes, such as school attendance and being sent to the principal’s office, to changes in grades, drug and alcohol use, and fighting.

Once outcomes are identified, the next question is how to measure them. Two of the most important criteria for choosing a measure are whether the measure captures the exact facet of the outcome that the program is expected to affect and whether it is sensitive enough to pick up small changes. For example, an academically focused mentoring program that claims to increase the self-esteem of youth may help youth feel more academically competent but not improve their general feelings of self-worth. Thus, one would want to use a scale targeting academic self-worth or competence rather than a global self-worth scale—or select a scale that can measure both. The second consideration is the measure’s degree of sensitivity. Some measures are extremely good at sorting a population or identifying a subgroup in need of help but poor in detecting the small changes that typically result from programs. For example, in this author’s experience, the Rosenberg self-esteem scale (1979) is useful in distinguishing adolescents with high and low self-esteem but often is not sensitive enough to detect the small changes in self-esteem induced by most youth programs. On the other hand, measures of academic or social competency beliefs (Eccles et al. 1984) can detect relatively small changes.

Lesson: Choose outcomes that are integrally linked to the program’s theory of change, that establish multiple “effectiveness bars,” that are gauged with sensitive measures and that can be achieved within the evaluation’s time frame and in the context of the program’s implementation.

Choosing Informants

Another issue to be resolved for either process or outcome measures is from whom to collect information. For mentoring programs, the candidates are usually the youth, the mentor, a parent, teachers and school records.

Information from each source has advantages and disadvantages. For example, for some variables, such as attitudes or beliefs, the youth may be the only individual who can provide valid information. Youth, for example, arguably are uniquely qualified to report on constructs such as their self-esteem (outcome measures) or considerations such as how much they like their mentors or whether they think their mentors support and care for them (process measures). Theoretically, what may be important is not what support the mentor actually gives but how supportive the youth perceives the mentor to be (DuBois et al. 2002).

On the other hand, youth-reported data may be biased. First, youth may be more likely to give socially desirable answers—recounting higher grades or less antisocial behavior. If this bias is different for mentored versus nonmentored youth, impact estimates based on these variables could be biased. Second, the feelings of youth toward their mentors may taint their reporting. For example, if the youth does not like the mentor’s style, he or she may selectively report or overreport certain negative experiences, such as the mentor missing meetings, and underreport others of a more positive nature, such as the amount of time the mentor spends providing help with schoolwork. Similarly, the youth may overstate a mentor’s performance to make the mentor look good. Last, the younger the child is, the less reliable or subtle the self-report. For this reason, when participants are quite young (8 or 9 years old), it is advisable to collect information from their parents and/or teachers.

The mentor often can be a good source of information about what the mentoring experience is like, such as what the mentor and mentee do and talk about (process measures), and as a reporter on the child’s behaviors at posttest (outcome measures). The main problem with mentor reporting is that mentors have an incentive to report positively on their relationships with youth and to see effects even if there are none, justifying why they are spending time with the child. Although there may be a positive bias, this does not preclude mentors’ being accurate in reporting relative impacts. This is because most mentors do not report that their mentees have improved equally in all areas. The pattern of difference in these reports, especially if consistent
with those obtained from other sources, such
as school records, may provide useful infor-
mation about the true impacts.

Parents also can be useful as reporters. They
may notice that the child is trying harder in
school, for example, even though the child
might not notice the change. However, like
the mentor, parents may project changes that
they wish were happening or be unaware of
certain behaviors (e.g., substance use).

Finally, teachers may be good reporters on
the behaviors of their students during the
school day. Teachers who are familiar with
age-appropriate behavior, for example, may
spot a problem when a parent or mentor does
not. However, teachers are extraordinarily
busy, and it can be difficult for them to find
the time to fill out evaluation forms on the
participants. In addition, teachers too are not
immune to seeing what they want to see, and
as with mentors and parents, the caveat about
relative impacts applies here.

Information also can be collected from
records. Data about the occurrence of spe-
cific events—fights, cut classes, principal
visits—are less susceptible to bias, unless the
sources of these data (e.g., school adminis-
trators making discipline decisions) differ-
entially judge or report events for mentored
youth versus other youth.

Lesson: Each respondent has a unique point
of view, but all are susceptible to reporting
what they wish had happened. Thus, if time
and money allow, it is advantageous to exam-
ine multiple perspectives on an outcome and
triangulate on the impacts. What is important
is to see a consistent pattern of impacts (not
uniform consistency among the respondents).
The more consistency there is, the more
certain one can be that a particular impact
occurred. For example, if the youth, parent
and teacher data all indicate school improve-
ment and test scores also increase, this would
be particularly strong evidence of academic
gains. Conversely, if only one of these mea-
sures exhibits change (e.g., parent reports), it
could be just a spurious finding.




Design Issues




Answering the questions “Does mentoring work?” and “For whom?” may seem relatively straightforward—achievable simply by observing the changes in mentees’ outcomes. But these ostensibly simple questions are harder to answer than one might assume.

The Fallacy of Pre/Post Comparisons

The changes we observe in the attitudes, behaviors or skills of youth while they are being mentored are not equivalent to program impacts. How can that be? The answer has to do with what statisticians call internal validity. Consider, again, the previously described BBBS evaluation. If one looks only at changes in outcomes for treatment youth (Grossman, Johnson 1999), one finds that 18 months after they applied to the program, 7 percent had reported starting to use drugs. On the face of it, it appears that the program was ineffective; however, during the same period, 11 percent of the controls had reported starting to use drugs. Thus, rather than being ineffective, this statistically significant difference indicates that BBBS was able to stem some of the naturally occurring increases in drug use.

The critical distinction here is the difference between outcomes and impacts. In evaluation, an outcome is the value of any variable measured after the intervention, such as grades. An impact is the difference between the outcome observed and what it would have been in the absence of the program (Rossi et al. 1999); in other words, it is the change in the outcome that was caused by the program. Simple changes in outcomes may in part reflect the program’s impact but also might reflect other factors, such as changes due to maturation.

Lesson: A program’s impact can be gauged accurately (i.e., be internally valid) only if one knows what would have happened to the participants had they not been in the program. This hypothetical state is called the “counterfactual.” Because one cannot observe what the mentees would have done in the absence of the program, one must identify another group of youth, namely a comparison group, whose behavior will represent what the participants’ behavior would have been without the program. Choosing a group whose behavior accurately depicts this hypothetical no-treatment (or counterfactual) state is the crux of getting the right answer to the effectiveness question, because a program’s impacts are ascertained by comparing the behavior of the treatment or participant group with that of the selected comparison group.

Matched Comparison or Control Group Construction

There are two principal types of comparison groups: control groups generated through random assignment and matched comparison groups selected judgmentally by the researcher.

Experimental Control Groups

Random assignment is the best way to create two groups that would change comparably over time. In this type of evaluation, eligible individuals are assigned randomly, either to the control group and not allowed into the program, or to the treatment group, whose members are offered the program. (Note: “Treatments” and “controls” refer to randomly selected groups of individuals. Not all treatments may choose to participate. The term “participants” is used to refer to individuals who actually receive the program.)

The principal advantage of random assignment is that given large enough groups, on average, the two groups are statistically equivalent with respect to all characteristics, observed and unobserved, at the time the two groups are formed. If nothing were done to either group, their behaviors, on average, would continue to be statistically equivalent at any point in the future. Thus, if after the intervention the average behavior of the two groups differs, the difference can be confidently and causally linked to the program,
which was the only systematic difference between the two groups. See Orr (1999) for a discussion of how large each group should be.

Although random assignment affords the most scientifically reliable way of creating two comparable groups, there are many issues that should be considered before using it. Two of the most difficult are “Can random assignment be inserted into the program’s normal process without qualitatively changing the program?” and “Is it ethical to deny certain youth a mentor?” However, it is worth noting that all programs ration their services, primarily by not advertising to more people than they can serve. Random assignment gives all needy children an equal probability of being served, rather than denying children who need a mentor by not telling them about the program. The reader is referred to Dennis (1994) for a detailed discussion of the ethical issues involved in random assignment.

With respect to the first issue, consider first how the insertion of random assignment into the intake process affects the program. One of the misconceptions about random assignment among mentoring staff is that it means randomly pairing youth with adults. This is not the case. Random pairing would fundamentally change the program, and any evaluation of this altered program would not provide information on the effect of the actual program. A valid use of random assignment would entail randomly dividing eligible applicants between the treatment and control groups, then processing the treatment group youth just as they normally would be handled and matched. Under this design, random assignment affects only which youth files come across the staff’s desk for matching, not what happens to youth once they are there. Another valid test would involve identifying two youth for every volunteer, then randomly assigning one child to the treatment group and one to the control group. For the BBBS evaluation, we used the former method because it was significantly less burdensome and emotionally more acceptable for the staff. However, the chosen design meant that not all treatment youth actually received a mentor. As will be discussed later, only about three quarters of the youth who were randomized into the treatment group and offered the program actually received mentors. (See Orr 1999 for a rich discussion of all aspects of random assignment.)
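
To make the two valid designs concrete, here is a minimal Python sketch of (a) randomly dividing eligible applicants between the treatment and control groups and (b) the two-youth-per-volunteer variant. The identifiers, the 50/50 split and the fixed seed are assumptions for illustration only.

    import random

    def simple_random_assignment(applicant_ids, seed=0):
        """Randomly split eligible applicants into treatment and control groups."""
        rng = random.Random(seed)
        ids = list(applicant_ids)
        rng.shuffle(ids)
        half = len(ids) // 2
        return ids[:half], ids[half:]  # treatment, control

    def paired_assignment(pairs, seed=0):
        """For each volunteer, take two suitable youth identified by staff and
        randomize one to the treatment group and one to the control group."""
        rng = random.Random(seed)
        treatment, control = [], []
        for youth_a, youth_b in pairs:
            if rng.random() < 0.5:
                treatment.append(youth_a)
                control.append(youth_b)
            else:
                treatment.append(youth_b)
                control.append(youth_a)
        return treatment, control

In either sketch, program staff would then process the treatment group exactly as they normally would; the randomization only determines whose files reach their desks.
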
Matched (or Quasi-Experimental) Comparison Groups

Random assignment is not always possible. For example, programs may be too small or staff may refuse to participate in such an evaluation. When this is the case, researchers must identify a group of nonparticipant youth whose outcomes credibly represent what would have happened to the participants in the absence of the program. The weakness of the methodology is that the outcomes of the two groups can differ not only because one group got a mentor and the other did not but also because of other differences between the groups. To generate internally valid estimates of the program’s impacts, one must control for the “other differences” either through statistical procedures such as regression analysis and/or through careful matching.

The researcher selects a comparison group of youth who are as similar as possible to the participant group across all the important characteristics that may influence outcomes in the counterfactual state (the hypothetical no-treatment state). Some key characteristics are relatively easy to identify and match for (e.g., age, race, gender or family structure). However, to improve the credibility of a matched comparison group, one needs to think deeply about other potential differences that could affect the outcome differential, such as whether one group of youth comes from families that care enough and are competent enough to search out services for their youth, or how comfortable the youth are with adults. These critical yet hard-to-measure variables are factors that are likely to systematically differ between participant and comparison group youth and to substantially affect one or more of the outcomes being examined. The more readers of an evaluation can think of such variables that have not been accounted for, the less they will believe the resulting program impact estimates.

Consider, for example, an email mentoring program. Not only would one want the comparison group to match the participant group on demographic characteristics—age (say, 12, 13, 14 or 15 years old), gender (male, female), race (white, Hispanic, black) and income (poor, nonpoor)—but one might also want to match the two groups on their preprogram use of the computer, such as the average number of hours per week spent
using email or playing computer games. To match on this variable, however, one would have to collect computer use data on many nonparticipant youth to find those most comparable to the participants.

When one has more than a few matching variables, the number of cells becomes too numerous. In the above example, we would have 4 age × 2 gender × 3 race × 2 income, or 48 cells, even before splitting by computer use. A method that is used with increasing frequency to address this issue is propensity score matching (PSM). A propensity score is the probability of being a participant given a set of known factors. In simple random assignment evaluations, the propensity score of every sample member is 50 percent, regardless of his or her characteristics. In the real world, without random assignment, the probability of being a participant depends on the individual’s characteristics, such as his or her comfort with computers in the example above. Thus, participants and nonparticipants naturally differ with regard to many characteristics. PSM can help researchers select which nonparticipants best match the participant group with respect to a weighted average of all these characteristics (where the weights reflect how important the factors are in making the individual a participant).

To calculate these weights, the researcher estimates, across both the participant and nonparticipant samples, a logistic model of the probability of being a participant (Pi) as a function of the matching variables and all other factors that are hypothesized to be related to participation (Rosenbaum, Rubin 1983; Rubin 1997). For example, if one were evaluating a school-based mentoring program, the equation might include age, gender, race, household status (HH) and reduced-price-lunch status (RL), as well as past academic (GPA) and behavior (BEH) assessments, as is shown in Equation 1 below. Obtaining teacher ratings of the youth’s interpersonal skills (SOC) also would help match on the youth’s ability to form a relationship.

(1) Pi = f(age, gender, race, HH, RL, GPA, BEH, SOC)

The next step of PSM is to compute for each potential member of the sample the probability of participation based on the matching characteristics in the regression. Predicted probabilities are calculated for both participants and all potential nonparticipants. Each participant then is matched with one or more nonparticipant youth based on these predicted propensity scores. For example, for each participant, the nonparticipant with the closest predicted participation probability can be selected into the comparison group. (See Shadish et al. 2002, 161–165, for further discussion of PSM, and Dynarski et al. 2003 for an application in a school-based setting.)
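
The mechanics just described can be sketched in a few lines of Python. This is only an illustration under simple assumptions (a logistic regression standing in for Equation 1, and one-to-one nearest-neighbor matching without replacement); the DataFrame and column names are hypothetical.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    def propensity_score_match(df, participant_col, covariates):
        """Estimate a participation logit (as in Equation 1) and select, for each
        participant, the nonparticipant with the closest predicted propensity score."""
        X = pd.get_dummies(df[covariates], drop_first=True)  # code categorical factors
        y = df[participant_col]
        logit = LogisticRegression(max_iter=1000).fit(X, y)
        df = df.assign(pscore=logit.predict_proba(X)[:, 1])

        participants = df[df[participant_col] == 1]
        pool = df[df[participant_col] == 0].copy()
        matched_rows = []
        for _, row in participants.iterrows():
            idx = (pool["pscore"] - row["pscore"]).abs().idxmin()  # nearest neighbor
            matched_rows.append(idx)
            pool = pool.drop(idx)  # match without replacement
        return df.loc[matched_rows]  # the matched comparison group

    # Hypothetical call mirroring Equation 1:
    # comparison = propensity_score_match(
    #     youth, "participant",
    #     ["age", "gender", "race", "HH", "RL", "GPA", "BEH", "SOC"])
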
An implication of this technique is that one needs data for the propensity score logit from both the participant group and a large pool of nonparticipant youth who will be considered for inclusion in the comparison group. The larger the considered nonparticipant pool is, the more likely it is that one can find a close propensity score match for each participant. This data requirement often pushes researchers to select matching factors that are readily available through records rather than incur the expense of collecting new data.

One weakness of this method is that although the propensity to participate will be quite similar for the participant and comparison groups, the percentage with a particular characteristic (such as male) may not be, because PSM matches on a linear combination of characteristics, not each characteristic one by one. To overcome this weakness, most studies match propensity scores within a few demographically defined cells (such as race/gender).

PSM also balances the two groups only on the factors that went into the propensity score regression. For example, the PSM in Dynarski et al. (2003) was based on data gathered from 21,000 students to generate a comparison group for their approximately 2,500 participants. However, when data were collected later on parents, it turned out that comparison group students were from higher-income families. No matter how carefully a comparison group is constructed, one can never know for sure how similar this group is to the participant group on unmeasured characteristics, such as their ability to respond to adult guidance.


Lesson: How much a reader trusts the internal
validity of an evaluation depends on how much
he or she trusts that the comparison group
truly is similar to the participant group on all
important dimensions. This level of trust or
confidence is quantifiable in random assign-
ment designs (e.g., one is 95 percent confident
that the two groups are statistically equivalent),
whereas with a quasi-experimental design, this
level of trust is uncertain and unquantifiable.




Analysis




This section covers how impact estimates are derived, from the simplest techniques to more statistically sophisticated ones. Several commonly committed errors and techniques used to overcome these problems are presented.

The Basics of Impact Estimates

Impact estimates for both experimental and quasi-experimental evaluation are basically determined by contrasting the outcomes of the participant or treatment group with those of the control or comparison group. If one has data from a random assignment design, the simplest unbiased impact estimate is the difference in mean follow-up (or posttest) outcomes for the treatment and control groups, as in Equation 2,

(2) b = Mean(Yfu,T) − Mean(Yfu,C)

where b is the estimated impact of the program, Yfu,T is the value of outcome Y at posttest or follow-up for the treatment group youth, and Yfu,C is the value of outcome Y at posttest or follow-up for the control group youth. One can increase the precision of the impact estimate by calculating the change-score or difference-in-difference estimator as in Equation 3,

(3) b = Mean(Yfu,T − Ybl,T) − Mean(Yfu,C − Ybl,C)

where Ybl,T is the value of outcome Y at baseline for the treatment group youth, and Ybl,C is the value of outcome Y at baseline for the control group youth.

Even more precision can be gained if the researcher controls for other covariate factors that affect the outcome through the use of regression, as in Equation 4,

(4) Yfu = a + bT + cYbl + dX + u

where b is the estimated impact of the program, T is a dummy variable equal to 1 for treatments and 0 for controls, X is a vector of baseline covariates that affect Y, and u represents unmeasured factors. Another way to think of b is that it is basically the difference in the mean Ys, adjusting for differences in Xs.

When data are from a quasi-experimental evaluation, it is always best to estimate impacts using regression or analysis of covariance; not only does one get more precise estimates, but one can control for any differences that do arise between the participant and the comparison groups. Regression simulates what outcomes youth who were exactly like participants on all the included characteristics (the Xs) would have had if they had not received a mentor, assuming that all factors that jointly affect participation and outcomes are included in the regression. Regressions are also useful in randomized experiments for estimating impacts more precisely.
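
Assuming a youth-level data set with columns such as treat (1 or 0), y_baseline, y_followup and a few baseline covariates, the three estimators in Equations 2 through 4 might be computed as in the following Python sketch; the column names and covariates are invented for illustration.

    import statsmodels.formula.api as smf

    def impact_estimates(df):
        """Equations 2-4: difference in means, difference-in-differences and a
        regression-adjusted impact estimate (the coefficient on treat)."""
        t = df[df["treat"] == 1]
        c = df[df["treat"] == 0]

        # Equation 2: difference in mean follow-up outcomes
        diff_means = t["y_followup"].mean() - c["y_followup"].mean()

        # Equation 3: change-score (difference-in-difference) estimator
        diff_in_diff = ((t["y_followup"] - t["y_baseline"]).mean()
                        - (c["y_followup"] - c["y_baseline"]).mean())

        # Equation 4: regression adjustment with baseline covariates
        model = smf.ols("y_followup ~ treat + y_baseline + age + female",
                        data=df).fit()
        return diff_means, diff_in_diff, model.params["treat"]
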
Suspicious Comparisons

The coefficient b from Equation 3 is an unbiased estimate of the program’s impact (i.e., the estimate differs from the true impact only by a random error with mean of zero) as long as the two groups are identical on all characteristics (both included and excluded variables). The key to obtaining an unbiased estimate of the impact is to ensure that one compares groups of youth that are as similar as possible on all the important observable and unobservable characteristics that influence outcomes. Although many researchers understand the need for comparability and indeed think a lot about it when constructing a matched comparison group, this profound insight is often forgotten in the analysis phase, when the final comparisons are made. Most notably, if one omits youth from either group—the randomly selected treatment (or self-selected participant) group or the randomly selected control (or matched comparison) group—the resulting impact estimate is potentially biased. Following is a list of commonly seen yet flawed comparisons related to this concern.

Suspect Comparison 1: Comparing groups of youth based on their match status, such as comparing those who received a mentor or youth whose matches lasted at least one month with the control or comparison group. Suppose, as occurred in the Public/Private Ventures evaluation of BBBS’s community-based mentoring program, only 75 percent of the treatment group actually received mentors (Grossman, Tierney 1998). Can one compare the outcomes of the 75 percent who were mentees with the controls to get an unbiased estimate of the program’s impact? No. All the impact estimates must be based on comparisons between the entire treatment group and the entire control group to maintain the complete comparability of the two groups. (This estimate often is referred to as the impact of the “intent to treat.”)

There are undoubtedly factors that are systematically different between youth who form mentoring relationships and those who do not. The latter youth may be more difficult temperamentally, or their families may have decided they really did not want mentors and withdrew from the program. If researchers remove these unmatched youth from the treatment group but do nothing with the control group, they could be comparing the “better” treatment youth with the “average” control group child, biasing the impact estimates. Randomization ensures that the treatment and control groups are equivalent (i.e., there are just as many “better” youth in the control group as the treatment group). After the intervention, matched youth are readily identified. Researchers, however, cannot identify the control group youth who would have been matched successfully had they been given the opportunity. Thus, if one discarded the unmatched treatment youth, implicitly one is comparing successfully matched youth to a mixed group—those for whom a match would have been found (had they been offered participation) and those for whom matches would not be found (who are perhaps harder to serve). An impact estimate based on such a comparison has the potential to bias the estimate in favor of the program’s effectiveness. (The selection bias embedded in matching is the reason researchers might choose to compare the outcomes of a matched comparison group with the outcomes of mentoring program applicants, rather than participants.)

On the other hand, the estimate based on all treatments and all controls, called the “intent-to-treat effect,” is unaffected by this bias.

Because the intent-to-treat estimate is based on the outcomes of all of the treatment youth, whether or not they received the program, it may underestimate the “impact on the treated” (i.e., the effect of actually receiving the treatment). A common way to calculate the “impact on the treated” is to divide the intent-to-treat estimate by the proportion of youth actually receiving the program (Bloom 1984). The intent-to-treat estimate is a weighted average of the impact on the treated youth (ap) and the impact on the untreated youth (anp), as shown in Equation 5,

(5) Mean(T) − Mean(C) = a = p × ap + (1 − p) × anp

where p = proportion treated.

If the effect of group assignment on the untreated youth (anp) is zero (i.e., untreated treatment individuals are neither hurt nor helped), then the impact on the treated, ap, equals a/p. Let’s again take the example of the BBBS evaluation. Recall that 18 months after random assignment, 7 percent of the treatment group youth (the treated and untreated) had started using drugs, compared with 11 percent of the control group youth, a 4-percentage-point reduction. Using the knowledge that only 75 percent of the youth actually received mentors, the “impact on the treated” of starting to use drugs would increase from a 4-percentage-point reduction to a 5.3-percentage-point reduction (= 4/.75).
identify the control group youth who would
have been matched successfully had they been         Similar bias occurs if one removes con-
given the opportunity. Thus, if one discarded        trol group members from the comparison.
the unmatched treatment youth, implicitly            Reconsider the school-based mentoring
one is comparing successfully matched youth          example described above, where treatment
to a mixed group—those for whom a match              youth are offered mentors and control youth
would have been found (had they been offered         are denied mentors for one year. Suppose
participation) and those for whom matches            that although most youth participate for only
would not be found (who are perhaps harder           a year, some continue their matches into a
to serve). An impact estimate based on such a        second school year. To gauge the impact of
comparison has the potential to bias the esti-       this longer intervention, the evaluator might
mate in favor of the program’s effectiveness.        (incorrectly) consider comparing youth who
(The selection bias embedded in matching is          had mentors for two years with control youth
the reason researchers might choose to com-          who were not matched after their one-year
pare the outcomes of a matched comparison            denial period. This comparison has several
group with the outcomes of mentoring pro-            problems. Youth who were able to sustain
gram applicants, rather than participants.)          their relationships into a second year, for
                                                     example, would likely be better able to relate

14
to adults and perhaps more malleable to a mentoring intervention than the “average” originally matched comparison group member. An unbiased way to examine these program impacts would be to compare groups that were assigned randomly at the beginning of the evaluation: one group being offered the possibility of a two-year match and the other being denied the program for two years. To investigate both one- and two-year versions of the program, applicants would need to be randomized into one of three groups: one group offered the possibility of a two-year match, one group offered the possibility of a one-year match and one group denied the program for the full two years.

Lesson: The only absolutely unbiased estimate from a random assignment evaluation of a mentoring program is based on the comparison of all treatments and all controls, not just the matched treatments or those matched for particular lengths of time.

Suspect Comparison 2: Comparing effects based on relationship characteristics, such as short matches with longer matches or closer relationships with less close relationships. Grossman and Rhodes (2002) examined the effects of different lengths of matches using the BBBS evaluation data. In the first part of the paper, the researchers reported the straightforward comparisons between outcomes of those matched less than 6 months, 6 to 12 months and more than 12 months with the control group’s outcomes. Although interesting, these simple comparisons ignore the potential differences among youth who are able to sustain their mentoring relationships for different periods of time. If the different match lengths were induced randomly across pairs or the reasons for a breakup were unrelated to the outcomes being examined, then there would be no problem with the simple set of comparisons. However, if, for example, youth who cannot form relationships that last more than five months are less able to get the adult attention and resources they need and consequently would do worse than longer-matched youth even without the intervention, then the first set of comparisons would produce biased impact estimates. Indeed, when the researchers statistically controlled for this potential bias (using two-stage least squares regression, as discussed below), they saw evidence of the strong association of short matches with negative outcomes disappear, while the indications of positive effects of longer matches remained.

A similar problem occurs when comparing youth with close relationships with those with weaker relationships. For the straightforward comparison to be valid, one is implicitly assuming that youth who ended up with close relationships with their mentors would have, in the absence of the program, fared equally well or poorly as youth who did not end up with close relationships. If those with closer relationships would have, without the program, been better able to secure adult attention than the other youth and done better because of it, for example, then a comparison of the close-relationship youth with either youth in less-close relationships or with the control/matched comparison group could be flawed.

Lesson: Any examination of groups defined by a program variable—such as having a mentor, the length of the relationship, having a cross-race match—is potentially plagued by selection bias regardless of the evaluation design employed. Valid subgroup estimates can be calculated only for subgroups defined on preprogram characteristics, such as gender or race or preprogram achievement levels or grades. In these cases, we can precisely identify and make comparisons to a comparable subgroup within the control group (against which the treatment subgroup may be compared).

Suspect Comparison 3: Comparing the outcomes of mentored youth with a control or matched comparison group when the sample attrition at the follow-up assessment is substantial or, worse yet, when there is differential attrition between the two groups. Once again, unless those who were assessed at posttest were just like the youth for whom one does not have posttest data, the impact estimates may be biased. Suppose youth from the most mobile, unstable households are the ones who could not be located. Comparing the “found” treatment and controls only provides information about the impact of the program on youth from stable homes, not all youth. This is an issue of generalizability (i.e., external validity; see Shadish et al. 2002).

Differential attrition between the treatment and the control (or participant and comparison) groups is important because
it also poses a threat to internal validity. Frequently, researchers are able to reassess a much higher fraction of program participants—many of whom may still be meeting with their mentors—than of the control or comparison group youth (whom no one has necessarily tracked on a regular basis). For example, if the control or comparison group youth demonstrate increased behavioral or academic problems over the sample period, parents may move their children to attend other schools and thus make data collection more difficult. Alternatively, some treatment families may have decided not to move out of the area because the children had good mentors. Under any of these scenarios, comparing the reassessed comparison group youth with reassessed mentees could be a comparison of unlike individuals.

Technically, any amount of attrition—even if it is equal across the two groups—puts the accuracy of the impact estimates into question. The treatment group youth who cannot be located may be fundamentally different from control group youth who cannot be located. For example, the control attriters might be the youth whose parents enroll them in new schools because they are not doing well, while the treatment attriters might be the youth whose parents moved. However, as long as one can show that the baseline characteristics of the two groups are similar, most readers will accept the hypothesis that the two groups of follow-up responders are still similar. Similarly, if the baseline characteristics of the attriters are the same as those of the responders, then we can be more confident that the attrition was simply random and that the impact on the responders is indicative of the impact on all youth.

Lesson: Comparisons of treatment (or participant) groups and control (or comparison) groups are completely valid only if the youth not included in the comparison are simply a random sample of those included. This assumption is easier to believe if the nonincluded individuals represent a small proportion of the total sample, the baseline characteristics of nonresponders are similar to those of responders and the proportions excluded are the same for the treatment and control groups.
the accuracy of the impact estimates into           mentees and a comparison group of youth
question. The treatment group youth who             matched on age, gender and school. Now
cannot be located may be fundamentally              suppose, however, the youth who actually get
different from control group youth who can-         mentors differ from the comparison youth
not be located. For example, the control            in that they are more likely to be firstborn. If
attriters might be the youth whose parents          firstborn youth do better on outcome Y (even
enroll them in new schools because they are         controlling for the baseline level of Y) and
not doing well, while the treatment attriters       one fails to control for this difference, the
might be the youth whose parents moved.             estimated impact coefficient (b) will be biased
However, as long as one can show that the           upward, picking up not only the effect of
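These baseline checks are straightforward to run. The sketch below is one minimal way to do so, assuming a pandas DataFrame named df with one row per youth, a 0/1 treatment flag, a 0/1 responded follow-up flag and a few baseline columns; all of the column names are illustrative, not taken from any particular study.

import pandas as pd
from scipy.stats import ttest_ind

baseline_vars = ["age", "baseline_grades"]

# 1. Do follow-up responders and attriters look alike at baseline?
for var in baseline_vars:
    resp = df.loc[df["responded"] == 1, var].dropna()
    attr = df.loc[df["responded"] == 0, var].dropna()
    t, p = ttest_ind(resp, attr, equal_var=False)
    print(f"{var}: responders {resp.mean():.2f} vs attriters {attr.mean():.2f} (p={p:.3f})")

# 2. Is the response rate similar in the treatment and control groups?
print(df.groupby("treatment")["responded"].mean())

# 3. Among those reassessed, are the two groups still balanced at baseline?
followed = df[df["responded"] == 1]
print(followed.groupby("treatment")[baseline_vars].mean())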
Statistical Corrections for Biases

What if one wants to examine program impacts under these compromised situations—such as dealing with differential attrition or examining the impact of mentoring on youth whose matches have lasted more than a year? There are a variety of statistical methods to handle these biases. As long as the assumptions underlying these methods hold, then the resulting adjusted impact estimates should be unbiased.

Let's start by restating the basic hypothesized model:

(6) Yfu = a + bM + cYbl + dX + u

The value of outcome Yfu is determined by its value at baseline (Ybl), whether the child got mentoring (M), a vector of baseline covariates that affect Y (X) and unmeasured factors (u). Suppose one has information on a group of mentees and a comparison group of youth matched on age, gender and school. Now suppose, however, the youth who actually get mentors differ from the comparison youth in that they are more likely to be firstborn. If firstborn youth do better on outcome Y (even controlling for the baseline level of Y) and one fails to control for this difference, the estimated impact coefficient (b) will be biased upward, picking up not only the effect of mentoring on Y but also the "firstborn-ness" of the mentees. The problem here is that M and u are correlated.
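A small simulation can make this bias concrete. The sketch below, with numbers invented purely for illustration, generates data in which an unmeasured "firstborn" factor raises both the chance of being mentored and the follow-up outcome; estimating Equation 6 without that factor overstates b.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
firstborn = rng.binomial(1, 0.4, n)          # unmeasured in the naive model
y_bl = rng.normal(0, 1, n)                   # baseline outcome
# Firstborn youth are more likely to end up with mentors...
mentored = rng.binomial(1, 0.3 + 0.3 * firstborn)
# ...and also do better at follow-up, independent of mentoring.
true_b = 0.20
y_fu = 0.5 * y_bl + true_b * mentored + 0.4 * firstborn + rng.normal(0, 1, n)

# Naive Equation 6: regress Yfu on M and Ybl only.
naive = sm.OLS(y_fu, sm.add_constant(np.column_stack([mentored, y_bl]))).fit()
print("true b:", true_b, "naive estimate:", round(naive.params[1], 3))  # biased upward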
If one hypothesizes that the only way the participating youth differ from the average nonparticipating youth is on measurable characteristics (Z)—for example, they are more likely to be firstborn or to be Hispanic—then including these characteristics in the impact regression model, Equation 7, will fully remove the correlation between M and u, because M conditional on (i.e., controlling for) Z is not correlated with u. Thus, Equation 7 will produce an unbiased estimate of the impact (b):

(7) Yfu = a + bM + cYbl + dX + fZ + u

Including such extra covariates is a common technique. However, if, as is usually the case, one suspects (or even could plausibly argue) that the mentored group is different in other ways that are correlated with outcomes and are unmeasured, such as being more socially competent or from better-parented families, then the estimated coefficient still will be potentially biased.
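Continuing the simulated example above, Equation 7 amounts to simply adding the measured characteristic to the regression; once "firstborn" is included as Z, the estimate of b moves back toward its true value.

adjusted = sm.OLS(
    y_fu,
    sm.add_constant(np.column_stack([mentored, y_bl, firstborn]))
).fit()
print("adjusted estimate of b:", round(adjusted.params[1], 3))  # close to the true 0.20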
Instrumental Variables or Two-Staged Least Squares

Using instrumental variables (IV), also called two-staged least squares regression (TSLS), is a statistical way to obtain unbiased (or consistent) impact estimates in this more complicated situation (see Stock and Watson 2003, Chapter 10).

Consider the factors influencing M (whether the child is a mentee):

(8) M = k + mZ + nX + v

where Z represents variables related to M that are unrelated to Y, X represents variables related to M that are related to Y and v is the random error.

Substituting Equation 8 into Equation 6 results in:

(9) Yfu = a + b(k + mZ + nX + v) + cYbl + dX + u

The problem is that v (the unmeasured elements related to participating in a mentoring program, such as having motivated parents) is correlated with u. This correlation will cause the regression to estimate a biased value for b. However, using instrumental variables, we are able to purge out v (the elements of M that are correlated with u) to get an unbiased estimate of the impact. Intuitively, this technique constructs a variable that is not M but is highly correlated with M and is not correlated with u (an "instrument").

The first and most difficult step in using this approach is to identify variables that 1) are related to why a child is in the group being examined, such as being a mentee or a long-matched child, and 2) are not related to the outcome Y. These are very hard to think of, must be measured for both treatment and control youth, and need to be considered before data collection starts. Examples might include the youth's interests, such as sports or outdoor activities, or how difficult it is for the mentor to drive to the child's home. These variables would be related to the match "working" (i.e., having longer duration) but not related theoretically to the child's grades or behaviors.

One then estimates the following first-stage regression:

(10) M = k + mZ + nX + cYbl + w

where w is a random error. All of the covariates that will be included in the final impact Equation 7, X and Ybl, are included in the first-stage regression along with the instruments Z. A predicted value of M (M' = k + mZ + nX + cYbl) is then computed for each sample member. The properties of regression ensure that M' will be uncorrelated with the part of Yfu not accounted for by Ybl or X (i.e., u). M' then is used in Equation 7 rather than M. The second stage of TSLS estimates Equation 7 and the corrected standard errors (see Stock and Watson 2003 for details). This technique works only if one has good predictive instruments. As a rule of thumb, the F-test for the Stage 1 regression should have a value of at least 10 if the instrument is to be considered valid.
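As a rough illustration of the mechanics, the sketch below runs the two stages as two ordinary least squares passes, assuming arrays y_fu, y_bl, x (measured covariates), mentored (whether the youth had a mentor) and an instrument z, such as an indicator of interest in outdoor activities; all of these names are hypothetical. A dedicated IV routine would normally be used for inference, since the second-stage standard errors printed here are not the corrected ones the text refers to.

import numpy as np
import statsmodels.api as sm

# Stage 1: regress M on the instrument(s) and all covariates from Equation 7.
stage1_X = sm.add_constant(np.column_stack([z, x, y_bl]))
stage1 = sm.OLS(mentored, stage1_X).fit()
print("Stage 1 F-statistic:", round(stage1.fvalue, 1))  # rule of thumb from the text: at least 10
m_hat = stage1.predict(stage1_X)

# Stage 2: replace M with its predicted value M' in Equation 7.
stage2_X = sm.add_constant(np.column_stack([m_hat, x, y_bl]))
stage2 = sm.OLS(y_fu, stage2_X).fit()
print("Two-staged least squares estimate of b:", round(stage2.params[1], 3))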
Baseline Predictions

Suspect Comparison 2 illustrates how any examination of groups defined by a program variable, such as having a long relationship or a cross-race match, is potentially plagued by the type of selection bias we have been discussing. Schochet et al. (2001) employed a remarkably clever nonstatistical technique for estimating the unbiased impact of a program in such a case. The researchers knew they wanted to compare the impacts of participants who would choose different versions of a program. However, because one could not know who among the control group would have chosen each program version, it appeared that one could not make a valid comparison. To get around this problem, they asked the intake workers who interviewed all applicants before random assignment (both treatments and controls) to predict which version of the program each youth would end up in if all were offered the program. The researchers then estimated the impact of Version A (and similarly B) by comparing the outcomes of treatment and control group members deemed to be "A-likely" by the intake workers. Note that
they were not comparing the treatment
youth who actually did Version A to the
A-likely control youth, but rather compar-
ing the A-likely treatments to the A-likely
controls. Because the intake workers were
quite accurate in their predictions, this
technique is convincing. For mentoring pro-
grams, staff could similarly predict which
youth would likely end up receiving mentors
or which would probably experience long-
term matches based on the information they
gathered during the intake process and their
knowledge of the program. This baseline
(preprogram) characteristic then could be
used to identify a valid comparison.
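In code, the resulting comparison is a simple subgroup contrast on the prediction made at intake, not on what actually happened after random assignment. The sketch below assumes a DataFrame df with a 0/1 treatment flag, a hypothetical 0/1 column predicted_long_match recorded by intake staff before assignment, and a follow-up outcome y_fu.

likely = df[df["predicted_long_match"] == 1]

# Compare ALL predicted-long-match treatments with ALL predicted-long-match
# controls, mirroring the A-likely versus A-likely comparison described above.
impact = (likely.loc[likely["treatment"] == 1, "y_fu"].mean()
          - likely.loc[likely["treatment"] == 0, "y_fu"].mean())
print("Estimated impact for youth predicted to sustain long matches:", round(impact, 3))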




Future Directions

Synthesis

Good evaluations gauge a program's impacts on a range of more to less ambitious outcomes that could realistically change over the period of observation given the likely program dosage; they assess outcomes using measures that are sensitive enough to detect the expected or policy-relevant change; and they use multiple measures and perspectives to assess an impact.

The crux of obtaining internally valid impact estimates is knowing what would have happened to the members of the treatment group had they not received mentors. Simple pre/post designs assume the participant would not have changed—that the postprogram behavior would have been exactly what the preprogram behavior was without the program. This is a particularly poor assumption for youth. Experimental and quasi-experimental evaluations are more valid because they use the behavior of the comparison group to represent what would have happened (the counterfactual state).

The internal validity of an evaluation depends critically on the comparability of the treatment (or participant) and control (or comparison) groups. If one can make a plausible case that the two groups differ on a factor that also affects the outcomes, the estimated impact may be biased by this factor. Because random assignment (with sufficiently large samples) creates two groups that are statistically equivalent in all observable and unobservable characteristics, evaluations with this design are, in principle, superior to matched comparison group designs; matched comparison groups can, at best, assure comparability only on the important observable characteristics.

Evaluators using matched comparison groups must always worry about potential selection-bias problems; in practice, researchers conducting random assignment evaluations often run into selection-bias problems too by making comparisons that undermine the balanced nature of treatment and control groups. Numerous statistical techniques, such as the use of instrumental variables, have been developed to help researchers estimate unbiased program impacts. However, their use requires forethought at the data collection stage to ensure that one has the data needed to make the required statistical adjustments.

Recommendations for Research

Given the aforementioned issues, researchers evaluating mentoring programs should consider the following suggestions:

1. Design for disaster. Assume things will go wrong. Random assignment will be undermined. There will be differential attrition. The comparison group will not be perfectly matched. To guard against these problems, researchers should think deeply about how the two groups might differ if any of these problems were to arise, then collect data at baseline that could be used for matching or making statistical adjustments. It is also useful to give forethought to which program subgroups will be examined and to collect variables that could help predict these program statuses, such as the length of a match.

2. Gather implementation or process information. This information is necessary to understand one's impact results—why the program had no effect or what type of program had the effects that were estimated. These data and data on program quality also can enable one to explore what about the program led to the change.

3. Use random assignment or match on motivational factors. Random assignment should be a researcher's first choice, but if quasi-experimental methods must be used, researchers should try to match participant and comparison youth on some of the less
obvious factors. The more one can convince readers that the groups are equivalent on all the relevant variables, including some of the hard-to-measure factors, such as motivation or comfort with adults, the more credible the impact estimates will be.
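One way such matching is sometimes operationalized, offered here only as a hedged sketch rather than the method the text prescribes, is nearest-neighbor matching on standardized baseline covariates that include a proxy for motivation (for example, a parent-rated scale collected at intake). The DataFrames participants and pool and every column name below are assumptions made for illustration.

import pandas as pd

match_vars = ["age", "baseline_grades", "parent_rated_motivation"]

# Standardize so no single variable dominates the distance.
combined = pd.concat([participants[match_vars], pool[match_vars]])
means, stds = combined.mean(), combined.std()
p = (participants[match_vars] - means) / stds
c = (pool[match_vars] - means) / stds

matches = []
available = c.copy()
for _, row in p.iterrows():
    dist = ((available - row) ** 2).sum(axis=1)  # squared distance to each remaining youth
    j = dist.idxmin()                            # closest comparison youth
    matches.append(j)
    available = available.drop(j)                # match without replacement

matched_comparison = pool.loc[matches]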
Recommendations for Practice

Given the complexities of computing valid impact estimates, what should a program do to measure effectiveness?

1. Monitor key process variables or benchmarks. Walker and Grossman (1999) argued that not every program should conduct a rigorous impact study: It is a poor use of resources, given the cost of research and the relative skills of staff. However, programs should use data to improve their programming (see United Way of America's Measuring Program Outcomes 1996 or the W. K. Kellogg Foundation Evaluation Handbook 2000). Grossman and Johnson (1999) recommended that mentoring programs track three key dimensions: youth and volunteer characteristics, match length, and quality benchmarks. More specifically, programs could track basic information about youth and volunteers: what types and numbers apply, and what types and numbers are matched. They could also track information about how long matches last—for example, the proportion making it to various benchmarks. Last, they could measure and track benchmarks, such as the quality of the relationship (Rhodes et al. 2005). This approach allows programs to measure factors that (a) can be tracked easily and (b) can provide insight about their possible impacts without collecting data on the counterfactual state. Pre/post changes can be a benchmark (but not an impact estimate), and one must be careful that the types of youth served and the general environment are stable. If the pre/post changes for cohorts of youth improve over time, for example, but the program now is serving less needy youth, the change in this benchmark tells little about the effectiveness of the program (the counterfactual states for the early and later cohorts differ). A simple tabulation of such benchmarks is sketched after this list.

2. Collaborate with local researchers to conduct impact studies periodically. When program staff feel it is time to conduct a more rigorous impact study, they should consider collaborating with local researchers. Given the time, skills and complexity entailed in conducting impact research, trained researchers can complete the task much more efficiently. An outside evaluation also may be believed more readily. Researchers, furthermore, can become a resource for improving the program's ongoing monitoring system.
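The benchmark tracking described in item 1 can be as simple as a few summary statistics computed from the program's match records. The sketch below assumes a DataFrame named matches with one row per match and hypothetical columns match_length_months and relationship_quality; the column names are illustrative, not a prescribed schema.

benchmarks = {
    "matches made": len(matches),
    "share lasting 6+ months": (matches["match_length_months"] >= 6).mean(),
    "share lasting 12+ months": (matches["match_length_months"] >= 12).mean(),
    "average relationship quality": matches["relationship_quality"].mean(),
}
for name, value in benchmarks.items():
    print(name, ":", value)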
References

Bloom, H. S. 1984. "Accounting for No-Shows in Experimental Evaluation Designs." Evaluation Review, 8, 225–246.

Branch, A. Y. 2002. Faith and Action: Implementation of the National Faith-Based Initiative for High-Risk Youth. Philadelphia: Branch Associates and Public/Private Ventures.

Dennis, M. L. 1994. "Ethical and Practical Randomized Field Experiments." In J. S. Wholey, H. P. Hatry and K. E. Newcomer, eds., Handbook of Practical Program Evaluation. San Francisco: Jossey-Bass, 155–197.

DuBois, D. L., B. E. Holloway, J. C. Valentine and H. Cooper. 2002. "Effectiveness of Mentoring Programs for Youth: A Meta-Analytic Review." American Journal of Community Psychology, 30, 157–197.

DuBois, D. L., H. A. Neville, G. R. Parra and A. O. Pugh-Lilly. 2002. "Testing a New Model of Mentoring." In G. G. Noam, ed. in chief, and J. E. Rhodes, ed., A Critical View of Youth Mentoring (New Directions for Youth Development: Theory, Research, and Practice, No. 93, 21–57). San Francisco: Jossey-Bass.

DuBois, D. L. and M. J. Karcher, eds. 2005. Handbook of Youth Mentoring. Thousand Oaks, CA: Sage Publications, Inc.

Dynarski, M., C. Pistorino, M. Moore, T. Silva, J. Mullens, J. Deke et al. 2003. When Schools Stay Open Late: The National Evaluation of the 21st Century Community Learning Centers Program. Washington, DC: US Department of Education.

Eccles, J. S., C. Midgley and T. F. Adler. 1984. "Grade-Related Changes in School Environment: Effects on Achievement Motivation." In J. G. Nicholls, ed., The Development of Achievement Motivation. Greenwich, CT: JAI Press, 285–331.

Grossman, J. B. and A. Johnson. 1999. "Judging the Effectiveness of Mentoring Programs." In J. B. Grossman, ed., Contemporary Issues in Mentoring. Philadelphia: Public/Private Ventures, 24–47.

Grossman, J. B. and J. E. Rhodes. 2002. "The Test of Time: Predictors and Effects of Duration in Youth Mentoring Programs." American Journal of Community Psychology, 30, 199–206.

Grossman, J. B. and J. P. Tierney. 1998. "Does Mentoring Work? An Impact Study of the Big Brothers Big Sisters Program." Evaluation Review, 22, 403–426.

Orr, L. L. 1999. Social Experiments: Evaluating Public Programs with Experimental Methods. Thousand Oaks, CA: Sage.

Rhodes, J., R. Reddy, J. Roffman and J. Grossman. 2005. "Promoting Successful Youth Mentoring Relationships: A Preliminary Screening Questionnaire." Journal of Primary Prevention, 147–167.

Rosenbaum, P. R. and D. B. Rubin. 1983. "The Central Role of the Propensity Score in Observational Studies for Causal Effects." Biometrika, 70, 41–55.

Rosenberg, M. 1979. "Rosenberg Self-Esteem Scale." In K. Corcoran and J. Fischer (2000). Measures for Clinical Practice: A Sourcebook (3rd ed.). New York: Free Press, 610–611.

Rossi, P. H., H. E. Freeman and M. W. Lipsey. 1999. Evaluation: A Systematic Approach (6th edition). Thousand Oaks, CA: Sage.

Rubin, D. B. 1997. "Estimating Causal Effects from Large Data Sets Using Propensity Scores." Annals of Internal Medicine, 127, 757–763.

Schochet, P., J. Burghardt and S. Glazerman. 2001. National Job Corps Study: The Impacts of Job Corps on Participants' Employment and Related Outcomes. Princeton, NJ: Mathematica Policy Research.
Shadish, W. R., T. D. Cook and D. T. Campbell. 2002. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin.

Stock, J. H. and M. W. Watson. 2003. Introduction to Econometrics. Boston: Addison-Wesley.

Tierney, J. P., J. B. Grossman and N. L. Resch. 1995. Making a Difference: An Impact Study of Big Brothers/Big Sisters. Philadelphia: Public/Private Ventures.

United Way of America. 1996. Measuring Program Outcomes. Arlington, VA: United Way of America.

Walker, G. and J. B. Grossman. 1999. "Philanthropy and Outcomes: Dilemmas in the Quest for Accountability." In C. T. Clotfelter and T. Ehrlich, eds., Philanthropy and the Nonprofit Sector in a Changing America. Bloomington: Indiana University Press, 449–460.

Weiss, C. H. 1998. Evaluation. Upper Saddle River, NJ: Prentice Hall.

W. K. Kellogg Foundation. 2000. W.K. Kellogg Foundation Evaluation Handbook. Battle Creek, MI: W. K. Kellogg Foundation.
Public/Private Ventures
2000 Market Street, Suite 600
Philadelphia, PA 19103
Tel: (215) 557-4400
Fax: (215) 557-4469

New York Office
The Chanin Building
122 East 42nd Street, 42nd Floor
New York, NY 10168
Tel: (212) 822-2400
Fax: (212) 949-0439

California Office
Lake Merritt Plaza, Suite 1550
1999 Harrison Street
Oakland, CA 94612
Tel: (510) 273-4600
Fax: (510) 273-4619

www.ppv.org

Contenu connexe

En vedette

Mentoring #teachmeet #tmintl
Mentoring #teachmeet #tmintl Mentoring #teachmeet #tmintl
Mentoring #teachmeet #tmintl Tamas Lorincz
 
Faith based team mentoring training cd
Faith based team mentoring training cdFaith based team mentoring training cd
Faith based team mentoring training cdDenis Rigdon
 
2011 Michigan Mentoring Month Webinar
2011 Michigan Mentoring Month Webinar2011 Michigan Mentoring Month Webinar
2011 Michigan Mentoring Month WebinarMentor Michigan
 
Dcu Undergrad presentation feb 11th 2013
Dcu Undergrad presentation feb 11th 2013Dcu Undergrad presentation feb 11th 2013
Dcu Undergrad presentation feb 11th 2013Liam Walsh
 
Mentoring groups rings--nrnw version teamspace v
Mentoring groups rings--nrnw version teamspace vMentoring groups rings--nrnw version teamspace v
Mentoring groups rings--nrnw version teamspace vMaurice Young
 
296 cd mentoring
296 cd mentoring296 cd mentoring
296 cd mentoringattila19
 
Dr. Kritsonis, NATIONAL FORUM JOURNALS, www.nationalforum.com
Dr. Kritsonis, NATIONAL FORUM JOURNALS, www.nationalforum.comDr. Kritsonis, NATIONAL FORUM JOURNALS, www.nationalforum.com
Dr. Kritsonis, NATIONAL FORUM JOURNALS, www.nationalforum.comWilliam Kritsonis
 
Financial Literacy Among Teens
Financial Literacy Among TeensFinancial Literacy Among Teens
Financial Literacy Among Teensgarydrubin
 
Informal Library Youth Programs: Global STEMx Education Conference
Informal Library Youth Programs: Global STEMx Education ConferenceInformal Library Youth Programs: Global STEMx Education Conference
Informal Library Youth Programs: Global STEMx Education ConferenceJennifer Hopwood
 
Mentoring scm uof_s_2012
Mentoring scm uof_s_2012Mentoring scm uof_s_2012
Mentoring scm uof_s_2012Steven Myers
 

En vedette (19)

pepe751
pepe751pepe751
pepe751
 
Mentoring 387
Mentoring 387Mentoring 387
Mentoring 387
 
Mentoring #teachmeet #tmintl
Mentoring #teachmeet #tmintl Mentoring #teachmeet #tmintl
Mentoring #teachmeet #tmintl
 
Free Arts Content Strategy
Free Arts Content StrategyFree Arts Content Strategy
Free Arts Content Strategy
 
Faith based team mentoring training cd
Faith based team mentoring training cdFaith based team mentoring training cd
Faith based team mentoring training cd
 
PINs workshop: Taking your Mentoring to the Next Level
PINs workshop: Taking your Mentoring to the Next Level PINs workshop: Taking your Mentoring to the Next Level
PINs workshop: Taking your Mentoring to the Next Level
 
2011 Michigan Mentoring Month Webinar
2011 Michigan Mentoring Month Webinar2011 Michigan Mentoring Month Webinar
2011 Michigan Mentoring Month Webinar
 
Mentoring
MentoringMentoring
Mentoring
 
Handout #10 - QIAMay4
Handout #10 - QIAMay4Handout #10 - QIAMay4
Handout #10 - QIAMay4
 
Social Media + National Mentoring Month = Opportunity
Social Media + National Mentoring Month = OpportunitySocial Media + National Mentoring Month = Opportunity
Social Media + National Mentoring Month = Opportunity
 
Dcu Undergrad presentation feb 11th 2013
Dcu Undergrad presentation feb 11th 2013Dcu Undergrad presentation feb 11th 2013
Dcu Undergrad presentation feb 11th 2013
 
Mentoring groups rings--nrnw version teamspace v
Mentoring groups rings--nrnw version teamspace vMentoring groups rings--nrnw version teamspace v
Mentoring groups rings--nrnw version teamspace v
 
Mentoring for Success
Mentoring for SuccessMentoring for Success
Mentoring for Success
 
296 cd mentoring
296 cd mentoring296 cd mentoring
296 cd mentoring
 
Dr. Kritsonis, NATIONAL FORUM JOURNALS, www.nationalforum.com
Dr. Kritsonis, NATIONAL FORUM JOURNALS, www.nationalforum.comDr. Kritsonis, NATIONAL FORUM JOURNALS, www.nationalforum.com
Dr. Kritsonis, NATIONAL FORUM JOURNALS, www.nationalforum.com
 
Financial Literacy Among Teens
Financial Literacy Among TeensFinancial Literacy Among Teens
Financial Literacy Among Teens
 
Mentoring - Mentee Assignment Notice
Mentoring - Mentee Assignment NoticeMentoring - Mentee Assignment Notice
Mentoring - Mentee Assignment Notice
 
Informal Library Youth Programs: Global STEMx Education Conference
Informal Library Youth Programs: Global STEMx Education ConferenceInformal Library Youth Programs: Global STEMx Education Conference
Informal Library Youth Programs: Global STEMx Education Conference
 
Mentoring scm uof_s_2012
Mentoring scm uof_s_2012Mentoring scm uof_s_2012
Mentoring scm uof_s_2012
 

Similaire à Handout #9 - QIAMay4

child_adol_dev_teacher_ed.pdf
child_adol_dev_teacher_ed.pdfchild_adol_dev_teacher_ed.pdf
child_adol_dev_teacher_ed.pdfEdna869183
 
Collaboration and financial sustainability in christian higher education
Collaboration and financial sustainability in christian higher educationCollaboration and financial sustainability in christian higher education
Collaboration and financial sustainability in christian higher educationvisionSynergy
 
Thomas J. Kampwirth Kristin M. PowersCollaborative Consul.docx
Thomas J. Kampwirth  Kristin M. PowersCollaborative Consul.docxThomas J. Kampwirth  Kristin M. PowersCollaborative Consul.docx
Thomas J. Kampwirth Kristin M. PowersCollaborative Consul.docxjuliennehar
 
Mentoring Disconnected Youth: How Mentors Can Help Reconnect Youth to School ...
Mentoring Disconnected Youth: How Mentors Can Help Reconnect Youth to School ...Mentoring Disconnected Youth: How Mentors Can Help Reconnect Youth to School ...
Mentoring Disconnected Youth: How Mentors Can Help Reconnect Youth to School ...Collaborative Mentoring Webinar Series 2012
 
My Poster PDF
My Poster PDFMy Poster PDF
My Poster PDFJie Yan
 
Fye 2009 Positive Diversity Program
Fye 2009 Positive Diversity ProgramFye 2009 Positive Diversity Program
Fye 2009 Positive Diversity Programscottboone
 
Cropping the Big Picture: What the New Meta-Analysis Means for Your Mentoring...
Cropping the Big Picture: What the New Meta-Analysis Means for Your Mentoring...Cropping the Big Picture: What the New Meta-Analysis Means for Your Mentoring...
Cropping the Big Picture: What the New Meta-Analysis Means for Your Mentoring...Mentoring Partnership of Minnesota
 
Cropping the Big Picture: Determining What the New Meta-Analysis Means for yo...
Cropping the Big Picture: Determining What the New Meta-Analysis Means for yo...Cropping the Big Picture: Determining What the New Meta-Analysis Means for yo...
Cropping the Big Picture: Determining What the New Meta-Analysis Means for yo...Collaborative Mentoring Webinar Series 2012
 
Alliant Leadership Conference Schedule
Alliant Leadership Conference ScheduleAlliant Leadership Conference Schedule
Alliant Leadership Conference ScheduleChristine Shine
 
Define problem identi
Define problem    identiDefine problem    identi
Define problem identiANIL247048
 
Edu 580 culminating project
Edu 580 culminating projectEdu 580 culminating project
Edu 580 culminating projectAnn1621
 

Similaire à Handout #9 - QIAMay4 (20)

JSARD-Winter-2016
JSARD-Winter-2016JSARD-Winter-2016
JSARD-Winter-2016
 
child_adol_dev_teacher_ed.pdf
child_adol_dev_teacher_ed.pdfchild_adol_dev_teacher_ed.pdf
child_adol_dev_teacher_ed.pdf
 
Back to School:  Training Mentors for Effective Relationships within Schools
Back to School:  Training Mentors for Effective Relationships within SchoolsBack to School:  Training Mentors for Effective Relationships within Schools
Back to School:  Training Mentors for Effective Relationships within Schools
 
Collaboration and financial sustainability in christian higher education
Collaboration and financial sustainability in christian higher educationCollaboration and financial sustainability in christian higher education
Collaboration and financial sustainability in christian higher education
 
Appleby college research
Appleby college researchAppleby college research
Appleby college research
 
IFPRI-NAIP- Leadership Program - J K Jena
IFPRI-NAIP- Leadership Program - J K JenaIFPRI-NAIP- Leadership Program - J K Jena
IFPRI-NAIP- Leadership Program - J K Jena
 
Appleby College Research
Appleby College ResearchAppleby College Research
Appleby College Research
 
Thomas J. Kampwirth Kristin M. PowersCollaborative Consul.docx
Thomas J. Kampwirth  Kristin M. PowersCollaborative Consul.docxThomas J. Kampwirth  Kristin M. PowersCollaborative Consul.docx
Thomas J. Kampwirth Kristin M. PowersCollaborative Consul.docx
 
Mentoring Disconnected Youth: How Mentors Can Help Reconnect Youth to School ...
Mentoring Disconnected Youth: How Mentors Can Help Reconnect Youth to School ...Mentoring Disconnected Youth: How Mentors Can Help Reconnect Youth to School ...
Mentoring Disconnected Youth: How Mentors Can Help Reconnect Youth to School ...
 
My Poster PDF
My Poster PDFMy Poster PDF
My Poster PDF
 
Pushing the Boundaries of Mentoring: SIYM 2012 Preview
Pushing the Boundaries of Mentoring: SIYM 2012 PreviewPushing the Boundaries of Mentoring: SIYM 2012 Preview
Pushing the Boundaries of Mentoring: SIYM 2012 Preview
 
Fye 2009 Positive Diversity Program
Fye 2009 Positive Diversity ProgramFye 2009 Positive Diversity Program
Fye 2009 Positive Diversity Program
 
Cropping the Big Picture: What the New Meta-Analysis Means for Your Mentoring...
Cropping the Big Picture: What the New Meta-Analysis Means for Your Mentoring...Cropping the Big Picture: What the New Meta-Analysis Means for Your Mentoring...
Cropping the Big Picture: What the New Meta-Analysis Means for Your Mentoring...
 
Cropping the Big Picture: Determining What the New Meta-Analysis Means for yo...
Cropping the Big Picture: Determining What the New Meta-Analysis Means for yo...Cropping the Big Picture: Determining What the New Meta-Analysis Means for yo...
Cropping the Big Picture: Determining What the New Meta-Analysis Means for yo...
 
Shane kinney resume
Shane kinney   resumeShane kinney   resume
Shane kinney resume
 
Alliant Leadership Conference Schedule
Alliant Leadership Conference ScheduleAlliant Leadership Conference Schedule
Alliant Leadership Conference Schedule
 
From Portland To You
From Portland To YouFrom Portland To You
From Portland To You
 
DEFINE PROBLEM IDENTI
DEFINE PROBLEM    IDENTIDEFINE PROBLEM    IDENTI
DEFINE PROBLEM IDENTI
 
Define problem identi
Define problem    identiDefine problem    identi
Define problem identi
 
Edu 580 culminating project
Edu 580 culminating projectEdu 580 culminating project
Edu 580 culminating project
 

Plus de Mentoring Partnership of Minnesota

Growing the Evidence Based for Mentoring: Research & Impact of PPV
Growing the Evidence Based for Mentoring: Research & Impact of PPVGrowing the Evidence Based for Mentoring: Research & Impact of PPV
Growing the Evidence Based for Mentoring: Research & Impact of PPVMentoring Partnership of Minnesota
 
Elements of Effective Practice - Design, Management & Evaluation
Elements of Effective Practice - Design, Management & EvaluationElements of Effective Practice - Design, Management & Evaluation
Elements of Effective Practice - Design, Management & EvaluationMentoring Partnership of Minnesota
 
August QIA Resource: Cultural Competence Checklist Policies Procedures
August QIA Resource: Cultural Competence Checklist Policies ProceduresAugust QIA Resource: Cultural Competence Checklist Policies Procedures
August QIA Resource: Cultural Competence Checklist Policies ProceduresMentoring Partnership of Minnesota
 

Plus de Mentoring Partnership of Minnesota (20)

Maximize Your Impact - Thomson Reuters
Maximize Your Impact - Thomson ReutersMaximize Your Impact - Thomson Reuters
Maximize Your Impact - Thomson Reuters
 
Mentoring for Youth Involved in Juvenile Justice
Mentoring for Youth Involved in Juvenile Justice Mentoring for Youth Involved in Juvenile Justice
Mentoring for Youth Involved in Juvenile Justice
 
Improving Mentoring Services for Youth in Hennepin County
Improving Mentoring Services for Youth in Hennepin CountyImproving Mentoring Services for Youth in Hennepin County
Improving Mentoring Services for Youth in Hennepin County
 
Growing the Evidence Based for Mentoring: Research & Impact of PPV
Growing the Evidence Based for Mentoring: Research & Impact of PPVGrowing the Evidence Based for Mentoring: Research & Impact of PPV
Growing the Evidence Based for Mentoring: Research & Impact of PPV
 
Training Quality Mentors - handouts
Training Quality Mentors - handoutsTraining Quality Mentors - handouts
Training Quality Mentors - handouts
 
Training Quality Mentors - Slides
Training Quality Mentors - SlidesTraining Quality Mentors - Slides
Training Quality Mentors - Slides
 
EEP Trainer's Manual: Operations
EEP Trainer's Manual: OperationsEEP Trainer's Manual: Operations
EEP Trainer's Manual: Operations
 
EEP Trainer's Manual Handouts: Evaluation Section
EEP Trainer's Manual Handouts: Evaluation SectionEEP Trainer's Manual Handouts: Evaluation Section
EEP Trainer's Manual Handouts: Evaluation Section
 
EEP Trainer's Manual - Management Section
EEP Trainer's Manual - Management SectionEEP Trainer's Manual - Management Section
EEP Trainer's Manual - Management Section
 
EEP Trainer's Manual - Design & Planning Section
EEP Trainer's Manual - Design & Planning SectionEEP Trainer's Manual - Design & Planning Section
EEP Trainer's Manual - Design & Planning Section
 
Elements of Effective Practice - Program Operations
Elements of Effective Practice - Program OperationsElements of Effective Practice - Program Operations
Elements of Effective Practice - Program Operations
 
Elements of Effective Practice - Design, Management & Evaluation
Elements of Effective Practice - Design, Management & EvaluationElements of Effective Practice - Design, Management & Evaluation
Elements of Effective Practice - Design, Management & Evaluation
 
Evidence-Based Practice & Mentoring
Evidence-Based Practice & MentoringEvidence-Based Practice & Mentoring
Evidence-Based Practice & Mentoring
 
Navigating the Criminal Background Check System
Navigating the Criminal Background Check SystemNavigating the Criminal Background Check System
Navigating the Criminal Background Check System
 
Why Youth Mentoring Relationships End
Why Youth Mentoring Relationships EndWhy Youth Mentoring Relationships End
Why Youth Mentoring Relationships End
 
August QIA Resource: Mentoring Journal-Multiple Identities
August QIA Resource: Mentoring Journal-Multiple IdentitiesAugust QIA Resource: Mentoring Journal-Multiple Identities
August QIA Resource: Mentoring Journal-Multiple Identities
 
August QIA Resource: EnCountering Stereotypes
August QIA Resource: EnCountering StereotypesAugust QIA Resource: EnCountering Stereotypes
August QIA Resource: EnCountering Stereotypes
 
August QIA Resource: Cultural iceberg
August QIA Resource: Cultural icebergAugust QIA Resource: Cultural iceberg
August QIA Resource: Cultural iceberg
 
August QIA Resource: Cultural Competence Checklist Policies Procedures
August QIA Resource: Cultural Competence Checklist Policies ProceduresAugust QIA Resource: Cultural Competence Checklist Policies Procedures
August QIA Resource: Cultural Competence Checklist Policies Procedures
 
August QIA Resource: Checklist
August QIA Resource: Checklist August QIA Resource: Checklist
August QIA Resource: Checklist
 

Dernier

Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseCeline George
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWQuiz Club NITW
 
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxMan or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxDhatriParmar
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleCeline George
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsPooky Knightsmith
 

Dernier (20)

Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxMan or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP Module
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young minds
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 

Handout #9 - QIAMay4

  • 1. Public/Private Ventures Brief Evaluating Mentoring Programs Jean Baldwin Grossman September 2009
  • 2. Public/Private Ventures Board of Directors Research Advisory is a national leader in creating and strength- Committee ening programs that Matthew T. McGuire, Chair Jacquelynne S. Eccles, Chair improve lives in low-income communities. We Principal University of Michigan do this in three ways: Origami Capital Partners, LLC Robert Granger Yvonne Chan William T. Grant Foundation innovation Principal Robinson Hollister We work with leaders in the field to identify Vaughn Learning Center Swarthmore College promising existing programs or develop new The Honorable Renée Reed Larson ones. Cardwell Hughes University of Illinois Judge, Court of Common Pleas Jean E. Rhodes research The First Judicial District, University of Massachusetts, We rigorously evaluate these programs to Philadelphia, PA Boston determine what is effective and what is not. Christine L. James-Brown Thomas Weisner President and CEO UCLA action Child Welfare We reproduce model programs in new League of America locations, provide technical assistance Robert J. LaLonde Professor where needed and inform policymakers and The University of Chicago practitioners about what works. John A. Mayer, Jr. P/PV is a 501(c)(3) nonprofit, nonpartisan Retired, Chief Financial Officer J. P. Morgan & Co. organization with offices in Philadelphia, New Anne Hodges Morgan York City and Oakland. For more information, Consultant to Foundations please visit www.ppv.org. Siobhan Nicolau President Hispanic Policy Development Project Marion Pines Senior Fellow Institute for Policy Studies Johns Hopkins University Clayton S. Rose Senior Lecturer Harvard Business School Cay Stratton Special Adviser UK Commission for Employment and Skills Sudhir Venkatesh William B. Ransford Professor of Sociology Columbia University William Julius Wilson Lewis P. and Linda L. Geyser University Professor Harvard University 2
  • 3. Acknowledgments This brief is a revised version of a chapter written for the Handbook of Youth Mentoring edited by David DuBois and Michael Karcher (2005). David and Michael provided many useful comments on the earlier version. Laura Johnson and Chelsea Farley of Public/Private Ventures helped revise the chapter to make it more accessible to non-mentoring specialists and provided great editorial advice. Additional reference: DuBois, D. L. and M. J. Karcher, eds. 2005. Handbook of Youth Mentoring. Thousand Oaks, CA: Sage Publications, Inc. 3
  • 4. Introduction Questions about mentoring abound. This article presents discussions of many issues Mentoring programs around the country that arise in answering both implementation are being asked by their funders and boards, or process questions and impact questions. “Does this mentoring program work?” Process questions are important to address Policymakers ask, “Does this particular type even if a researcher is interested only in of mentoring—be it school-based or group or impacts, because one should not ask, “Does it email—work?” These are questions about pro- work?” unless “it” actually occurred. The first gram impacts. Researchers and operators also section covers how one chooses appropriate want to know about the program’s processes: process and impact measures. The next sec- What about mentoring makes it work? How tion discusses several impact design issues, long should a match last to be effective? How including the inadequacies of simple pre/ frequently should matches meet? Does the post designs, the importance of a good com- level of training, support or supervision of parison group and several ways to construct the match matter? Does parental involvement comparison groups. The last section discusses or communication matter? What types of common mistakes made when analyzing interactions between youth and mentors lead evaluation data and presents ways to avoid to positive changes in the child? Then there them. For a more complete discussion of are questions about the populations served evaluation in general, readers are referred to and what practices are most effective: Are par- Rossi et al. (1999); Shadish et al. (2002); and ticular types of youth more affected by men- Weiss (1998). Due to space constraints, issues toring than others? Are mentors with specific entailed in answering mediational questions characteristics, such as being older or more are not addressed here. educated, more effective than other mentors or more effective with particular subgroups of youth? Finally, researchers in particular are interested in the theoretical underpinning of mentoring. For example, to what degree does mentoring work by changing children’s beliefs about themselves (such as boosting self-esteem or self-efficacy), by shaping their values (such as their views about education and the future) or by improving their social and/or cognitive skills? 4
  • 5. Measurement Issues A useful guide in deciding what to measure is mentoring programs have more detailed a program’s logic model or theory of change: ideas, such as wanting participants to experi- the set of hypothesized links between the pro- ence specific program elements (academic gram’s action, participants’ response and the support, for example, or peer interaction). If desired outcomes. As Weiss states, with such these are critical components of the program a theory in hand, “The evaluation can trace theory, they also make good candidates for the unfolding of the assumptions” (1998, 58). process measures. Rhodes et al. (2005) presents one possible theory of change for mentoring: Process mea- A second level of process question concerns sures describe the program’s actions; outcome the quality of the components: How good measures describe what effects the program has. are the relationships? Are the training and supervision useful? These are more difficult dimensions to measure. Client satisfaction Process Measures measures, such as how much youth like their The first question when examining a pro- mentors or how useful the mentors feel the gram is: What exactly is the program as training is, are one gauge of quality. However, experienced by participants? The effect the clients’ assessment of quality may not be program will have on participants depends accurate; as many teachers say, the most enjoy- on the realities of the program, not on its able class may not be the class that promotes official description. All too frequently in the most learning. Testing mentors before mentoring programs, relatively few strong and after training is an alternative quality relationships form and matched pairs stop measure. Assessing the quality of mentoring meeting. Process questions can be answered, relationships is a relatively unexplored area. however, at several levels. Most basically, Grossman and Johnson (1999) and Rhodes et one wants to know: Did the program recruit al. (2005) propose some measures. appropriate youth and adults? Did adults and youth meet as planned? Did all the compo- From a program operator’s or funder’s per- nents of the program happen? Were mentors spective, how much process information trained and supervised as expected? is “enough” depends on striking a balance between knowing exactly what is happening To address these questions, one examines in the program versus recognizing the service the characteristics and experiences of the the staff could have provided in lieu of collect- participants, mentors and the match, and ing data. Researchers should assess enough compares them with the program’s expecta- implementation data to be sure the program tions. For example, a mentoring program is actually delivering the services it purports to targeting youth involved in criminal or vio- offer at a level and quality consistent with hav- lent activity tracked the number of arrests of ing a detectable impact before spending the new participants to determine whether they time and money to collect data on outcomes. were serving their desired target populations Even if no impact is expected, it is essential to (Branch 2002). A high school mentoring know exactly what did or did not happen to program for struggling students tracked the the participants to understand one’s findings. GPAs of enrolled youth (Grossman, Johnson Thus, researchers may want to collect more 1999). 
Two match characteristics commonly process data than typically would be collected examined are the average completed length by operators to improve both the quality of of the relationship and the average frequency their generalizations and their ability to link of interaction. Like all good process mea- impacts to variation in participants’ experi- sures, they relate to the program’s theory. To ences of core elements of the program. be affected, a participant must experience a sufficient dosage of the intervention. Some 5
Lesson: Tracking process measures is important to program managers but essential for evaluators. Before embarking on an evaluation of impacts, be sure the program is delivering its services at a quality and intensity that would lead one to expect impacts.

Outcome Measures

An early task for an impact evaluator is to refine the "Does it work?" question into a set of testable evaluation questions. These questions need to specify a set of outcome variables that will be examined during the evaluation. There are two criteria for a good outcome measure (Rossi et al. 1999). First, the outcome can be realistically expected to change during the study period given the intensity of the intervention. Second, the outcome is measurable and the chosen measure sensitive enough to detect the likely change.

Evaluation questions are not program goals. Many programs rightly have lofty inspirational goals, such as enabling all participants to excel academically or to become self-sufficient, responsible citizens. However, a good evaluation outcome must be concrete, measurable and likely to change enough during the study period to be detected. Thus, for example, achieving a goal like "helping youth academically excel" could be gauged by examining students' grades or test scores.

In addition, when choosing the specific set of outcomes that will indicate a goal such as "academically excelling," one must consider which of the possible variables are likely to change given the program dosage participants will probably receive during the evaluation period. For example, researchers often have found that reading and math achievement test scores change less quickly than do reading or math grades, which, in turn, change less quickly than school effort. Thus, if one is evaluating the school-year (i.e., nine months) impact of a school-based mentoring program, one is likely to want to examine effort and grades rather than test scores, or at least in addition to test scores. Considerable care and thought need to go into deciding what outcomes data should be collected and when. Examining impacts on outcomes that are unlikely to change during the evaluation period can give the false impression that the program is a failure, when in fact the impacts on the chosen variables may not yet have emerged.

A good technique for selecting variables is to choose a range of proximal to more distal expected impacts based on the program's theory of change, which also represents a set of impacts ranging from modestly to impressively effective (Weiss 1998). Unfortunately, one cannot know a priori how long matches will last or how often the individuals will meet. Thus, it is wise to include some outcomes that are likely to change even with rather limited exposure to the intervention, and some outcomes that would change with greater exposure, thus setting multiple "bars." The most basic effectiveness goal is an outcome that everyone agrees should be achievable. From there, one can identify more ambitious outcomes.

Public/Private Ventures' evaluation of Big Brothers Big Sisters (BBBS) provides a good example of this process (Grossman and Tierney 1998). Researchers conducted a thorough review of BBBS's manual of standards and practices to understand the program's logic model and then, by working closely with staff from the national office and local agencies, generated multiple outcome bars. The national manual lists four "common" goals for a Little Brother or Little Sister: providing social, cultural and recreational enrichment; improving peer relationships; improving self-concept; and improving motivation, attitude and achievement related to schoolwork. Conversations with BBBS staff also suggested that having a Big Brother or Big Sister could reduce the incidence of antisocial behaviors such as drug and alcohol use and could improve a Little Brother's or Little Sister's relationship with his or her parent(s). Using previous research, the hypothesized impacts were ordered from proximal to distal as follows: increased opportunities for social and cultural enrichment, improved self-concept, better relationships with family and friends, improved academic outcomes and reduced antisocial behavior.

At a minimum, the mentoring experience was expected to enrich the cultural and social life of youth, even though many more impacts were anticipated.
Because motivational psychology research shows that attitudes often change before behaviors, the next set of outcomes reflected attitudinal changes toward themselves and others. The "harder" academic and antisocial outcomes then were specified. Within these outcomes, researchers also hypothesized a range of impacts, from attitudinal variables, such as the child's perceived sense of academic efficacy and value placed on education, to some intermediate behavioral changes, such as school attendance and being sent to the principal's office, to changes in grades, drug and alcohol use, and fighting.

Once outcomes are identified, the next question is how to measure them. Two of the most important criteria for choosing a measure are whether the measure captures the exact facet of the outcome that the program is expected to affect and whether it is sensitive enough to pick up small changes. For example, an academically focused mentoring program that claims to increase the self-esteem of youth may help youth feel more academically competent but not improve their general feelings of self-worth. Thus, one would want to use a scale targeting academic self-worth or competence rather than a global self-worth scale—or select a scale that can measure both. The second consideration is the measure's degree of sensitivity. Some measures are extremely good at sorting a population or identifying a subgroup in need of help but poor in detecting the small changes that typically result from programs. For example, in this author's experience, the Rosenberg self-esteem scale (1979) is useful in distinguishing adolescents with high and low self-esteem but often is not sensitive enough to detect the small changes in self-esteem induced by most youth programs. On the other hand, measures of academic or social competency beliefs (Eccles et al. 1984) can detect relatively small changes.

Lesson: Choose outcomes that are integrally linked to the program's theory of change, that establish multiple "effectiveness bars," that are gauged with sensitive measures and that can be achieved within the evaluation's time frame and in the context of the program's implementation.

Choosing Informants

Another issue to be resolved for either process or outcome measures is from whom to collect information. For mentoring programs, the candidates are usually the youth, the mentor, a parent, teachers and school records. Information from each source has advantages and disadvantages. For example, for some variables, such as attitudes or beliefs, the youth may be the only individual who can provide valid information. Youth, for example, arguably are uniquely qualified to report on constructs such as their self-esteem (outcome measures) or considerations such as how much they like their mentors or whether they think their mentors support and care for them (process measures). Theoretically, what may be important is not what support the mentor actually gives but how supportive the youth perceives the mentor to be (DuBois et al. 2002).

On the other hand, youth-reported data may be biased. First, youth may be more likely to give socially desirable answers—recounting higher grades or less antisocial behavior. If this bias is different for mentored versus nonmentored youth, impact estimates based on these variables could be biased. Second, the feelings of youth toward their mentors may taint their reporting. For example, if the youth does not like the mentor's style, he or she may selectively report or overreport certain negative experiences, such as the mentor missing meetings, and underreport others of a more positive nature, such as the amount of time the mentor spends providing help with schoolwork. Similarly, the youth may overstate a mentor's performance to make the mentor look good. Last, the younger the child is, the less reliable or subtle the self-report. For this reason, when participants are quite young (8 or 9 years old), it is advisable to collect information from their parents and/or teachers.

The mentor often can be a good source of information about what the mentoring experience is like, such as what the mentor and mentee do and talk about (process measures), and as a reporter on the child's behaviors at posttest (outcome measures). The main problem with mentor reporting is that mentors have an incentive to report positively on their relationships with youth and to see effects even if there are none, justifying why they are spending time with the child. Although there may be a positive bias, this does not preclude mentors' being accurate in reporting relative impacts. This is because most mentors do not report that their mentees have improved equally in all areas.
The pattern of difference in these reports, especially if consistent with those obtained from other sources, such as school records, may provide useful information about the true impacts.

Parents also can be useful as reporters. They may notice that the child is trying harder in school, for example, even though the child might not notice the change. However, like the mentor, parents may project changes that they wish were happening or be unaware of certain behaviors (e.g., substance use).

Finally, teachers may be good reporters on the behaviors of their students during the school day. Teachers who are familiar with age-appropriate behavior, for example, may spot a problem when a parent or mentor does not. However, teachers are extraordinarily busy, and it can be difficult for them to find the time to fill out evaluation forms on the participants. In addition, teachers too are not immune to seeing what they want to see, and as with mentors and parents, the caveat about relative impacts applies here.

Information also can be collected from records. Data about the occurrence of specific events—fights, cut classes, principal visits—are less susceptible to bias, unless the sources of these data (e.g., school administrators making discipline decisions) differentially judge or report events for mentored youth versus other youth.

Lesson: Each respondent has a unique point of view, but all are susceptible to reporting what they wish had happened. Thus, if time and money allow, it is advantageous to examine multiple perspectives on an outcome and triangulate on the impacts. What is important is to see a consistent pattern of impacts (not uniform consistency among the respondents). The more consistency there is, the more certain one can be that a particular impact occurred. For example, if the youth, parent and teacher data all indicate school improvement and test scores also increase, this would be particularly strong evidence of academic gains. Conversely, if only one of these measures exhibits change (e.g., parent reports), it could be just a spurious finding.
Design Issues

Answering the questions "Does mentoring work?" and "For whom?" may seem relatively straightforward—achievable simply by observing the changes in mentees' outcomes. But these ostensibly simple questions are harder to answer than one might assume.

The Fallacy of Pre/Post Comparisons

The changes we observe in the attitudes, behaviors or skills of youth while they are being mentored are not equivalent to program impacts. How can that be? The answer has to do with what statisticians call internal validity. Consider, again, the previously described BBBS evaluation. If one looks only at changes in outcomes for treatment youth (Grossman, Johnson 1999), one finds that 18 months after they applied to the program, 7 percent had reported starting to use drugs. On the face of it, it appears that the program was ineffective; however, during the same period, 11 percent of the controls had reported starting to use drugs. Thus, rather than being ineffective, this statistically significant difference indicates that BBBS was able to stem some of the naturally occurring increases in drug use.

The critical distinction here is the difference between outcomes and impacts. In evaluation, an outcome is the value of any variable measured after the intervention, such as grades. An impact is the difference between the outcome observed and what it would have been in the absence of the program (Rossi et al. 1999); in other words, it is the change in the outcome that was caused by the program. Simple changes in outcomes may in part reflect the program's impact but also might reflect other factors, such as changes due to maturation.

Lesson: A program's impact can be gauged accurately (i.e., be internally valid) only if one knows what would have happened to the participants had they not been in the program. This hypothetical state is called the "counterfactual." Because one cannot observe what the mentees would have done in the absence of the program, one must identify another group of youth, namely a comparison group, whose behavior will represent what the participants' behavior would have been without the program. Choosing a group whose behavior accurately depicts this hypothetical no-treatment (or counterfactual) state is the crux of getting the right answer to the effectiveness question, because a program's impacts are ascertained by comparing the behavior of the treatment or participant group with that of the selected comparison group.

Matched Comparison or Control Group Construction

There are two principal types of comparison groups: control groups generated through random assignment and matched comparison groups selected judgmentally by the researcher.

Experimental Control Groups

Random assignment is the best way to create two groups that would change comparably over time. In this type of evaluation, eligible individuals are assigned randomly, either to the control group and not allowed into the program, or to the treatment group, whose members are offered the program. (Note: "Treatments" and "controls" refer to randomly selected groups of individuals. Not all treatments may choose to participate. The term "participants" is used to refer to individuals who actually receive the program.)

The principal advantage of random assignment is that given large enough groups, on average, the two groups are statistically equivalent with respect to all characteristics, observed and unobserved, at the time the two groups are formed. If nothing were done to either group, their behaviors, on average, would continue to be statistically equivalent at any point in the future.
Thus, if after the intervention the average behavior of the two groups differs, the difference can be confidently and causally linked to the program, which was the only systematic difference between the two groups. See Orr (1999) for a discussion of how large each group should be.

Although random assignment affords the most scientifically reliable way of creating two comparable groups, there are many issues that should be considered before using it. Two of the most difficult are "Can random assignment be inserted into the program's normal process without qualitatively changing the program?" and "Is it ethical to deny certain youth a mentor?" However, it is worth noting that all programs ration their services, primarily by not advertising to more people than they can serve. Random assignment gives all needy children an equal probability of being served, rather than denying children who need a mentor by not telling them about the program. The reader is referred to Dennis (1994) for a detailed discussion of the ethical issues involved in random assignment.

With respect to the first issue, consider first how the insertion of random assignment into the intake process affects the program. One of the misconceptions about random assignment among mentoring staff is that it means randomly pairing youth with adults. This is not the case. Random pairing would fundamentally change the program, and any evaluation of this altered program would not provide information on the effect of the actual program. A valid use of random assignment would entail randomly dividing eligible applicants between the treatment and control groups, then processing the treatment group youth just as they normally would be handled and matched. Under this design, random assignment affects only which youth files come across the staff's desk for matching, not what happens to youth once they are there. Another valid test would involve identifying two youth for every volunteer, then randomly assigning one child to the treatment group and one to the control group. For the BBBS evaluation, we used the former method because it was significantly less burdensome and emotionally more acceptable for the staff. However, the chosen design meant that not all treatment youth actually received a mentor. As will be discussed later, only about three quarters of the youth who were randomized into the treatment group and offered the program actually received mentors. (See Orr 1999 for a rich discussion of all aspects of random assignment.)
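The mechanics of such a lottery are simple to implement. The sketch below is illustrative only (the file layout and field names are hypothetical), but it follows the design described above: eligible applicants are randomized after intake, and treatment-group files are then processed exactly as usual.

```python
import numpy as np
import pandas as pd

def randomize_applicants(applicants: pd.DataFrame, treat_share: float = 0.5,
                         seed: int = 12345) -> pd.DataFrame:
    """Randomly divide eligible applicants into treatment and control groups.

    Assignment happens before matching, so staff handle treatment-group files
    exactly as they normally would; controls simply are not referred for
    matching during the study's embargo period.
    """
    rng = np.random.default_rng(seed)
    out = applicants.copy()
    shuffled = rng.permutation(out.index.to_numpy())
    n_treat = int(round(treat_share * len(shuffled)))
    out["assignment"] = "control"
    out.loc[shuffled[:n_treat], "assignment"] = "treatment"
    return out

# Hypothetical usage: one row per eligible applicant in an intake file.
# assigned = randomize_applicants(pd.read_csv("applicants.csv"))
# print(assigned["assignment"].value_counts())
```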
Matched (or Quasi-Experimental) Comparison Groups

Random assignment is not always possible. For example, programs may be too small or staff may refuse to participate in such an evaluation. When this is the case, researchers must identify a group of nonparticipant youth whose outcomes credibly represent what would have happened to the participants in the absence of the program. The weakness of the methodology is that the outcomes of the two groups can differ not only because one group got a mentor and the other did not but also because of other differences between the groups. To generate internally valid estimates of the program's impacts, one must control for the "other differences" either through statistical procedures such as regression analysis and/or through careful matching.

The researcher selects a comparison group of youth who are as similar as possible to the participant group across all the important characteristics that may influence outcomes in the counterfactual state (the hypothetical no-treatment state). Some key characteristics are relatively easy to identify and match for (e.g., age, race, gender or family structure). However, to improve the credibility of a matched comparison group, one needs to think deeply about other potential differences that could affect the outcome differential, such as whether one group of youth comes from families that care enough and are competent enough to search out services for their youth, or how comfortable the youth are with adults. These critical yet hard-to-measure variables are factors that are likely to systematically differ between participant and comparison group youth and to substantially affect one or more of the outcomes being examined. The more readers of an evaluation can think of such variables that have not been accounted for, the less they will believe the resulting program impact estimates.

Consider, for example, an email mentoring program. Not only would one want the comparison group to match the participant group on demographic characteristics—age (say, 12, 13, 14 or 15 years old), gender (male, female), race (white, Hispanic, black) and income (poor, nonpoor)—but one might also want to match the two groups on their preprogram use of the computer, such as the average number of hours per week spent using email or playing computer games.
To match on this variable, however, one would have to collect computer use data on many nonparticipant youth to find those most comparable to the participants.

When one has more than a few matching variables, the number of cells becomes too numerous. In the above example, we would have 4 age × 2 gender × 3 race × 2 income, or 48 cells, even before splitting by computer use. A method that is used with increasing frequency to address this issue is propensity score matching (PSM). A propensity score is the probability of being a participant given a set of known factors. In simple random assignment evaluations, the propensity score of every sample member is 50 percent, regardless of his or her characteristics. In the real world, without random assignment, the probability of being a participant depends on the individual's characteristics, such as his or her comfort with computers in the example above. Thus, participants and nonparticipants naturally differ with regard to many characteristics. PSM can help researchers select which nonparticipants best match the participant group with respect to a weighted average of all these characteristics (where the weights reflect how important the factors are in making the individual a participant).

To calculate these weights, the researcher estimates, across both the participant and nonparticipant samples, a logistic model of the probability of being a participant (P_i) as a function of the matching variables and all other factors that are hypothesized to be related to participation (Rosenbaum, Rubin 1983; Rubin 1997). For example, if one were evaluating a school-based mentoring program, the equation might include age, gender, race, household status (HH) and reduced-price-lunch status (RL), as well as past academic (GPA) and behavior (BEH) assessments, as is shown in Equation 1 below. Obtaining teacher ratings of the youth's interpersonal skills (SOC) also would help match on the youth's ability to form a relationship.

(1) P_i = f(age, gender, race, HH, RL, GPA, BEH, SOC)

The next step of PSM is to compute for each potential member of the sample the probability of participation based on the matching characteristics in the regression.
Predicted probabilities are calculated for both participants and all potential nonparticipants. Each participant then is matched with one or more nonparticipant youth based on these predicted propensity scores. For example, for each participant, the nonparticipant with the closest predicted participation probability can be selected into the comparison group. (See Shadish et al. 2002, 161–165, for further discussion of PSM, and Dynarski et al. 2003 for an application in a school-based setting.)

An implication of this technique is that one needs data for the propensity score logit from both the participant group and a large pool of nonparticipant youth who will be considered for inclusion in the comparison group. The larger the considered nonparticipant pool is, the more likely it is that one can find a close propensity score match for each participant. This data requirement often pushes researchers to select matching factors that are readily available through records rather than incur the expense of collecting new data.

One weakness of this method is that although the propensity to participate will be quite similar for the participant and comparison groups, the percentage with a particular characteristic (such as male) may not be, because PSM matches on a linear combination of characteristics, not each characteristic one by one. To overcome this weakness, most studies match propensity scores within a few demographically defined cells (such as race/gender).

PSM also balances the two groups only on the factors that went into the propensity score regression. For example, the PSM in Dynarski et al. (2003) was based on data gathered from 21,000 students to generate a comparison group for their approximately 2,500 participants. However, when data were collected later on parents, it turned out that comparison group students were from higher-income families. No matter how carefully a comparison group is constructed, one can never know for sure how similar this group is to the participant group on unmeasured characteristics, such as their ability to respond to adult guidance.
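To make these steps concrete, the minimal sketch below estimates a propensity score with a logistic regression and then, within cells, selects the nonparticipant with the closest score for each participant. The column names are hypothetical, the matching is one-to-one without replacement and without a caliper, and a real application would also check covariate balance afterward.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Assumed layout: one row per youth, a participant flag (1/0), and the
# matching variables from Equation 1 (given invented column names here).
X_COLS = ["age", "female", "race_black", "race_hispanic", "two_parent_hh",
          "reduced_lunch", "gpa_baseline", "behavior_baseline", "social_skills"]

def propensity_match(df: pd.DataFrame, cell_col: str = "female") -> pd.DataFrame:
    """Return a matched comparison group: one nonparticipant per participant."""
    model = LogisticRegression(max_iter=1000)
    model.fit(df[X_COLS], df["participant"])
    df = df.assign(pscore=model.predict_proba(df[X_COLS])[:, 1])

    matched_rows = []
    for _, cell in df.groupby(cell_col):            # match within demographic cells
        participants = cell[cell["participant"] == 1]
        pool = cell[cell["participant"] == 0].copy()
        for _, person in participants.iterrows():
            if pool.empty:
                break
            closest = (pool["pscore"] - person["pscore"]).abs().idxmin()
            matched_rows.append(pool.loc[closest])
            pool = pool.drop(index=closest)         # match without replacement
    return pd.DataFrame(matched_rows)
```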
Lesson: How much a reader trusts the internal validity of an evaluation depends on how much he or she trusts that the comparison group truly is similar to the participant group on all important dimensions. This level of trust or confidence is quantifiable in random assignment designs (e.g., one is 95 percent confident that the two groups are statistically equivalent), whereas with a quasi-experimental design, this level of trust is uncertain and unquantifiable.
Analysis

This section covers how impact estimates are derived, from the simplest techniques to more statistically sophisticated ones. Several commonly committed errors and techniques used to overcome these problems are presented.

The Basics of Impact Estimates

Impact estimates for both experimental and quasi-experimental evaluation are basically determined by contrasting the outcomes of the participant or treatment group with those of the control or comparison group. If one has data from a random assignment design, the simplest unbiased impact estimate is the difference in mean follow-up (or posttest) outcomes for the treatment and control groups, as in Equation 2,

(2) b = Mean(Y_fu,T) − Mean(Y_fu,C)

where b is the estimated impact of the program, Y_fu,T is the value of outcome Y at posttest or follow-up for the treatment group youth, and Y_fu,C is the value of outcome Y at posttest or follow-up for the control group youth. One can increase the precision of the impact estimate by calculating the change-score or difference-in-difference estimator as in Equation 3,

(3) b = Mean(Y_fu,T − Y_bl,T) − Mean(Y_fu,C − Y_bl,C)

where Y_bl,T is the value of outcome Y at baseline for the treatment group youth, and Y_bl,C is the value of outcome Y at baseline for the control group youth.

Even more precision can be gained if the researcher controls for other covariate factors that affect the outcome through the use of regression, as in Equation 4,

(4) Y_fu = a + bT + cY_bl + dX + u

where b is the estimated impact of the program, T is a dummy variable equal to 1 for treatments and 0 for controls, X is a vector of baseline covariates that affect Y, and u represents unmeasured factors. Another way to think of b is that it is basically the difference in the mean Ys, adjusting for differences in Xs.

When data are from a quasi-experimental evaluation, it is always best to estimate impacts using regression or analysis of covariance; not only does one get more precise estimates, but one can control for any differences that do arise between the participant and the comparison groups. Regression simulates what outcomes youth who were exactly like participants on all the included characteristics (the Xs) would have had if they had not received a mentor, assuming that all factors that jointly affect participation and outcomes are included in the regression. Regressions are also useful in randomized experiments for estimating impacts more precisely.
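As a concrete illustration of Equations 2 through 4, the sketch below computes all three estimators on a small, fabricated youth-level file; the variable names and data are invented, and a real analysis would include the full set of baseline covariates and attend to weights and clustering.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Fabricated illustrative data: one row per youth.
rng = np.random.default_rng(0)
n = 800
df = pd.DataFrame({
    "treat": rng.integers(0, 2, n),
    "age": rng.integers(10, 16, n),
    "female": rng.integers(0, 2, n),
})
df["y_bl"] = 2.5 + 0.05 * df["age"] + rng.normal(0, 1, n)           # baseline outcome
df["y_fu"] = df["y_bl"] + 0.15 * df["treat"] + rng.normal(0, 1, n)  # follow-up outcome

# Equation 2: difference in mean follow-up outcomes.
b_diff = df.loc[df.treat == 1, "y_fu"].mean() - df.loc[df.treat == 0, "y_fu"].mean()

# Equation 3: change-score / difference-in-difference estimator.
df["change"] = df["y_fu"] - df["y_bl"]
b_did = df.loc[df.treat == 1, "change"].mean() - df.loc[df.treat == 0, "change"].mean()

# Equation 4: regression adjustment; the coefficient on treat estimates b.
fit = smf.ols("y_fu ~ treat + y_bl + age + female", data=df).fit()
print(b_diff, b_did, fit.params["treat"])
```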
Suspicious Comparisons

The coefficient b from Equation 3 is an unbiased estimate of the program's impact (i.e., the estimate differs from the true impact only by a random error with mean of zero) as long as the two groups are identical on all characteristics (both included and excluded variables). The key to obtaining an unbiased estimate of the impact is to ensure that one compares groups of youth that are as similar as possible on all the important observable and unobservable characteristics that influence outcomes. Although many researchers understand the need for comparability and indeed think a lot about it when constructing a matched comparison group, this profound insight is often forgotten in the analysis phase, when the final comparisons are made. Most notably, if one omits youth from either group—the randomly selected treatment (or self-selected participant) group or the randomly selected control (or matched comparison) group—the resulting impact estimate is potentially biased. Following is a list of commonly seen yet flawed comparisons related to this concern.
Suspect Comparison 1: Comparing groups of youth based on their match status, such as comparing those who received a mentor or youth whose matches lasted at least one month with the control or comparison group. Suppose, as occurred in the Public/Private Ventures evaluation of BBBS's community-based mentoring program, only 75 percent of the treatment group actually received mentors (Grossman, Tierney 1998). Can one compare the outcomes of the 75 percent who were mentees with the controls to get an unbiased estimate of the program's impact? No. All the impact estimates must be based on comparisons between the entire treatment group and the entire control group to maintain the complete comparability of the two groups. (This estimate often is referred to as the impact of the "intent to treat.")

There are undoubtedly factors that are systematically different between youth who form mentoring relationships and those who do not. The latter youth may be more difficult temperamentally, or their families may have decided they really did not want mentors and withdrew from the program. If researchers remove these unmatched youth from the treatment group but do nothing with the control group, they could be comparing the "better" treatment youth with the "average" control group child, biasing the impact estimates. Randomization ensures that the treatment and control groups are equivalent (i.e., there are just as many "better" youth in the control group as the treatment group). After the intervention, matched youth are readily identified. Researchers, however, cannot identify the control group youth who would have been matched successfully had they been given the opportunity. Thus, if one discarded the unmatched treatment youth, implicitly one is comparing successfully matched youth to a mixed group—those for whom a match would have been found (had they been offered participation) and those for whom matches would not be found (who are perhaps harder to serve). An impact estimate based on such a comparison has the potential to bias the estimate in favor of the program's effectiveness. (The selection bias embedded in matching is the reason researchers might choose to compare the outcomes of a matched comparison group with the outcomes of mentoring program applicants, rather than participants.)

On the other hand, the estimate based on all treatments and all controls, called the "intent-to-treat effect," is unaffected by this bias.

Because the intent-to-treat estimate is based on the outcomes of all of the treatment youth, whether or not they received the program, it may underestimate the "impact on the treated" (i.e., the effect of actually receiving the treatment). A common way to calculate the "impact on the treated" is to divide the intent-to-treat estimate by the proportion of youth actually receiving the program (Bloom 1984). The intent-to-treat estimate is a weighted average of the impact on the treated youth (a_p) and the impact on the untreated youth (a_np), as shown in Equation 5,

(5) Mean(T) − Mean(C) = a = p × a_p + (1 − p) × a_np

where p = the proportion treated.

If the effect of group assignment on the untreated youth (a_np) is zero (i.e., untreated treatment individuals are neither hurt nor helped), then a_p is equal to a/p. Let's again take the example of the BBBS evaluation. Recall that 18 months after random assignment, 7 percent of the treatment group youth (the treated and untreated) had started using drugs, compared with 11 percent of the control group youth, a 4-percentage-point reduction. Using the knowledge that only 75 percent of the youth actually received mentors, the "impact on the treated" of starting to use drugs would increase from a 4-percentage-point reduction to a 5.3-percentage-point reduction (= 4/.75).
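A minimal arithmetic sketch of the Bloom (1984) adjustment just described, using the numbers from the BBBS example; in practice the proportion treated would come from program records for the randomized treatment group.

```python
def impact_on_treated(intent_to_treat: float, proportion_treated: float) -> float:
    """Bloom (1984) no-show adjustment: divide the intent-to-treat estimate by
    the share of the treatment group that actually received the program,
    assuming the program had no effect on untreated treatment-group members."""
    return intent_to_treat / proportion_treated

# BBBS example from the text: a 4-percentage-point reduction in drug-use
# initiation, with 75 percent of the treatment group actually receiving mentors.
print(impact_on_treated(-4.0, 0.75))   # about -5.3 percentage points
```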
the unmatched treatment youth, implicitly Reconsider the school-based mentoring one is comparing successfully matched youth example described above, where treatment to a mixed group—those for whom a match youth are offered mentors and control youth would have been found (had they been offered are denied mentors for one year. Suppose participation) and those for whom matches that although most youth participate for only would not be found (who are perhaps harder a year, some continue their matches into a to serve). An impact estimate based on such a second school year. To gauge the impact of comparison has the potential to bias the esti- this longer intervention, the evaluator might mate in favor of the program’s effectiveness. (incorrectly) consider comparing youth who (The selection bias embedded in matching is had mentors for two years with control youth the reason researchers might choose to com- who were not matched after their one-year pare the outcomes of a matched comparison denial period. This comparison has several group with the outcomes of mentoring pro- problems. Youth who were able to sustain gram applicants, rather than participants.) their relationships into a second year, for example, would likely be better able to relate 14
Youth who were able to sustain their relationships into a second year, for example, would likely be better able to relate to adults and perhaps more malleable to a mentoring intervention than the "average" originally matched comparison group member. An unbiased way to examine these program impacts would be to compare groups that were assigned randomly at the beginning of the evaluation: one group being offered the possibility of a two-year match and the other being denied the program for two years. To investigate both one- and two-year versions of the program, applicants would need to be randomized into one of three groups: one group offered the possibility of a two-year match, one group offered the possibility of a one-year match and one group denied the program for the full two years.

Lesson: The only absolutely unbiased estimate from a random assignment evaluation of a mentoring program is based on the comparison of all treatments and all controls, not just the matched treatments or those matched for particular lengths of time.

Suspect Comparison 2: Comparing effects based on relationship characteristics, such as short matches with longer matches or closer relationships with less close relationships. Grossman and Rhodes (2002) examined the effects of different lengths of matches using the BBBS evaluation data. In the first part of the paper, the researchers reported the straightforward comparisons between outcomes of those matched less than 6 months, 6 to 12 months and more than 12 months with the control group's outcomes. Although interesting, these simple comparisons ignore the potential differences among youth who are able to sustain their mentoring relationships for different periods of time. If the different match lengths were induced randomly across pairs or the reasons for a breakup were unrelated to the outcomes being examined, then there would be no problem with the simple set of comparisons. However, if, for example, youth who cannot form relationships that last more than five months are less able to get the adult attention and resources they need and consequently would do worse than longer-matched youth even without the intervention, then the first set of comparisons would produce biased impact estimates. Indeed, when the researchers statistically controlled for this potential bias (using two-staged least squares regression, as discussed below), they saw evidence of the strong association of short matches with negative outcomes disappear, while the indications of positive effects of longer matches remained.

A similar problem occurs when comparing youth with close relationships with those with weaker relationships. For the straightforward comparison to be valid, one is implicitly assuming that youth who ended up with close relationships with their mentors would have, in the absence of the program, fared equally well or poorly as youth who did not end up with close relationships. If those with closer relationships would have, without the program, been better able to secure adult attention than the other youth and done better because of it, for example, then a comparison of the close-relationship youth with either youth in less-close relationships or with the control/matched comparison group could be flawed.

Lesson: Any examination of groups defined by a program variable—such as having a mentor, the length of the relationship, having a cross-race match—is potentially plagued by selection bias regardless of the evaluation design employed. Valid subgroup estimates can be calculated only for subgroups defined on preprogram characteristics, such as gender or race or preprogram achievement levels or grades. In these cases, we can precisely identify a comparable subgroup within the control group against which the treatment subgroup may be compared.

Suspect Comparison 3: Comparing the outcomes of mentored youth with a control or matched comparison group when the sample attrition at the follow-up assessment is substantial or, worse yet, when there is differential attrition between the two groups. Once again, unless those who were assessed at posttest were just like the youth for whom one does not have posttest data, the impact estimates may be biased. Suppose youth from the most mobile, unstable households are the ones who could not be located. Comparing the "found" treatment and controls only provides information about the impact of the program on youth from stable homes, not all youth. This is an issue of generalizability (i.e., external validity; see Shadish et al. 2002).
Differential attrition between the treatment and the control (or participant and comparison) groups is important because it also poses a threat to internal validity. Frequently, researchers are able to reassess a much higher fraction of program participants—many of whom may still be meeting with their mentors—than of the control or comparison group youth (whom no one has necessarily tracked on a regular basis). For example, if the control or comparison group youth demonstrate increased behavioral or academic problems over the sample period, parents may move their children to attend other schools and thus make data collection more difficult. Alternatively, some treatment families may have decided not to move out of the area because the children had good mentors. Under any of these scenarios, comparing the reassessed comparison group youth with reassessed mentees could be a comparison of unlike individuals.

Technically, any amount of attrition—even if it is equal across the two groups—puts the accuracy of the impact estimates into question. The treatment group youth who cannot be located may be fundamentally different from control group youth who cannot be located. For example, the control attriters might be the youth whose parents enroll them in new schools because they are not doing well, while the treatment attriters might be the youth whose parents moved. However, as long as one can show that the baseline characteristics of the two groups are similar, most readers will accept the hypothesis that the two groups of follow-up responders are still similar. Similarly, if the baseline characteristics of the attriters are the same as those of the responders, then we can be more confident that the attrition was simply random and that the impact on the responders is indicative of the impact on all youth.

Lesson: Comparisons of treatment (or participant) groups and control (or comparison) groups are completely valid only if the youth not included in the comparison are simply a random sample of those included. This assumption is easier to believe if the nonincluded individuals represent a small proportion of the total sample, the baseline characteristics of nonresponders are similar to those of responders and the proportions excluded are the same for the treatment and control groups.
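One practical way to make that case is to tabulate baseline characteristics for follow-up responders in the two groups (and for responders versus attriters). The sketch below, with invented column names, computes simple standardized mean differences; it is a diagnostic for readers, not a correction for attrition bias.

```python
import numpy as np
import pandas as pd

def standardized_differences(df: pd.DataFrame, group_col: str,
                             baseline_cols: list) -> pd.Series:
    """Standardized mean difference on each baseline variable between two groups,
    e.g., treatment vs. control follow-up responders, or responders vs. attriters."""
    g1 = df[df[group_col] == 1]
    g0 = df[df[group_col] == 0]
    diffs = {}
    for col in baseline_cols:
        pooled_sd = np.sqrt((g1[col].var() + g0[col].var()) / 2)
        diffs[col] = (g1[col].mean() - g0[col].mean()) / pooled_sd
    return pd.Series(diffs)

# Hypothetical usage: restrict to youth assessed at follow-up, compare the arms.
# responders = df[df["has_followup"] == 1]
# print(standardized_differences(responders, "treat", ["age", "gpa_baseline"]))
```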
more likely to be firstborn or to be Hispanic— then including these characteristics in the Lesson: Comparisons of treatment (or parti- impact regression model, Equation 7, will cipant) groups and control (or compari- fully remove the correlation between M and son) groups are completely valid only if the u, because M conditional on (i.e., controlling youth not included in the comparison are for) Z is not correlated with u. Thus, Equation simply a random sample of those included. 7 will produce an unbiased estimate of the This assumption is easier to believe if the impact (b): nonincluded individuals represent a small proportion of the total sample, the baseline (7) Yfu = a + bM + cYbl + dX +fZ + u characteristics of nonresponders are similar to those of responders and the proportions Including such extra covariates is a common excluded are the same for the treatment and technique. However if, as is usually the case, control groups. one suspects (or even could plausibly argue) that the mentored group is different in other ways that are correlated with outcomes and 16
However, if, as is usually the case, one suspects (or even could plausibly argue) that the mentored group is different in other ways that are correlated with outcomes and are unmeasured, such as being more socially competent or from better-parented families, then the estimated coefficient still will be potentially biased.

Instrumental Variables or Two-Staged Least Squares

Using instrumental variables (IV), also called two-staged least squares regression (TSLS), is a statistical way to obtain unbiased (or consistent) impact estimates in this more complicated situation (see Stock and Watson 2003, Chapter 10).

Consider the factors influencing M (whether the child is a mentee):

(8) M = k + mZ + nX + v

where Z represents variables related to M that are unrelated to Y, X represents variables related to M that are related to Y and v is the random error.

Substituting Equation 8 into Equation 6 results in:

(9) Y_fu = a + b(k + mZ + nX + v) + cY_bl + dX + u

The problem is that v (the unmeasured elements related to participating in a mentoring program, such as having motivated parents) is correlated with u. This correlation will cause the regression to estimate a biased value for b. However, using instrumental variables, we are able to purge out v (the elements of M that are correlated with u) to get an unbiased estimate of the impact. Intuitively, this technique constructs a variable that is not M but is highly correlated with M and is not correlated with u (an "instrument").

The first and most difficult step in using this approach is to identify variables that 1) are related to why a child is in the group being examined, such as being a mentee or a long-matched child, and 2) are not related to the outcome Y. These are very hard to think of, must be measured for both treatment and control youth, and need to be considered before data collection starts. Examples might include the youth's interests, such as sports or outdoor activities, or how difficult it is for the mentor to drive to the child's home. These variables would be related to the match "working" (i.e., having longer duration) but not related theoretically to the child's grades or behaviors. Then one estimates the following regression of M:

(10) M = k + mZ + nX + cY_bl + w

where w is a random error. All of the covariates that will be included in the final impact equation (Equation 7), X and Y_bl, are included in the first-stage regression along with the instruments Z. A predicted value of M (M' = k + mZ + nX + cY_bl) is then computed for each sample member. The properties of regression ensure that M' will be uncorrelated with the part of Y_fu not accounted for by Y_bl or X (i.e., u). M' then is used in Equation 7 rather than M. The second stage of TSLS estimates Equation 7 and the corrected standard errors (see Stock and Watson 2003 for details). This technique works only if one has good predictive instruments. As a rule of thumb, the F-test for the Stage 1 regression should have a value of at least 10 if the instrument is to be considered valid.
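To make the two-stage procedure concrete, the sketch below runs the stages by hand with one hypothetical instrument (how far the mentor must travel) and checks the first-stage F statistic. It is only a sketch under those assumptions: the naive second-stage standard errors are not corrected for the estimated first stage, so a dedicated IV routine should be used for inference, as the text notes.

```python
import pandas as pd
import statsmodels.formula.api as smf

def two_stage_impact(df: pd.DataFrame) -> float:
    """Two-staged least squares by hand, following Equations 8-10.

    df is assumed to hold y_fu, y_bl, baseline covariates (age, female), the
    program variable m (e.g., matched for more than 12 months) and a single
    hypothetical instrument, travel_distance, related to whether the match
    "works" but not to the outcome itself.
    """
    # Stage 1 (Equation 10): regress M on the instrument plus every covariate
    # that will appear in the impact equation.
    stage1 = smf.ols("m ~ travel_distance + y_bl + age + female", data=df).fit()
    print(stage1.f_test("travel_distance = 0"))   # rule of thumb: F of at least 10
    df = df.assign(m_hat=stage1.fittedvalues)

    # Stage 2: use the predicted M' in the impact equation; the coefficient on
    # m_hat is the impact estimate. Its standard error here is NOT corrected;
    # use a dedicated IV routine for inference.
    stage2 = smf.ols("y_fu ~ m_hat + y_bl + age + female", data=df).fit()
    return stage2.params["m_hat"]
```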
Baseline Predictions

Suspect Comparison 2 illustrates how any examination of groups defined by a program variable, such as having a long relationship or a cross-race match, is potentially plagued by the type of selection bias we have been discussing. Schochet et al. (2001) employed a remarkably clever nonstatistical technique for estimating the unbiased impact of a program in such a case. The researchers knew they wanted to compare the impacts of participants who would choose different versions of a program. However, because one could not know who among the control group would have chosen each program version, it appeared that one could not make a valid comparison. To get around this problem, they asked the intake workers who interviewed all applicants before random assignment (both treatments and controls) to predict which version of the program each youth would end up in if all were offered the program. The researchers then estimated the impact of Version A (and similarly B) by comparing the outcomes of treatment and control group members deemed to be "A-likely" by the intake workers.
Note that they were not comparing the treatment youth who actually did Version A to the A-likely control youth, but rather comparing the A-likely treatments to the A-likely controls. Because the intake workers were quite accurate in their predictions, this technique is convincing. For mentoring programs, staff could similarly predict which youth would likely end up receiving mentors or which would probably experience long-term matches based on the information they gathered during the intake process and their knowledge of the program. This baseline (preprogram) characteristic then could be used to identify a valid comparison.
Future Directions

Synthesis

Good evaluations gauge a program's impacts on a range of more to less ambitious outcomes that could realistically change over the period of observation given the likely program dosage; they assess outcomes using measures that are sensitive enough to detect the expected or policy-relevant change; and they use multiple measures and perspectives to assess an impact.

The crux of obtaining internally valid impact estimates is knowing what would have happened to the members of the treatment group had they not received mentors. Simple pre/post designs assume the participant would not have changed—that the postprogram behavior would have been exactly what the preprogram behavior was without the program. This is a particularly poor assumption for youth. Experimental and quasi-experimental evaluations are more valid because they use the behavior of the comparison group to represent what would have happened (the counterfactual state).

The internal validity of an evaluation depends critically on the comparability of the treatment (or participant) and control (or comparison) groups. If one can make a plausible case that the two groups differ on a factor that also affects the outcomes, the estimated impact may be biased by this factor. Because random assignment (with sufficiently large samples) creates two groups that are statistically equivalent in all observable and unobservable characteristics, evaluations with this design are, in principle, superior to matched comparison group designs; matched comparison groups can, at best, assure comparability only on the important observable characteristics.

Evaluators using matched comparison groups must always worry about potential selection-bias problems; in practice, researchers conducting random assignment evaluations often run into selection-bias problems too by making comparisons that undermine the balanced nature of treatment and control groups. Numerous statistical techniques, such as the use of instrumental variables, have been developed to help researchers estimate unbiased program impacts. However, their use requires forethought at the data collection stage to ensure that one has the data needed to make the required statistical adjustments.

Recommendations for Research

Given the aforementioned issues, researchers evaluating mentoring programs should consider the following suggestions:

1. Design for disaster. Assume things will go wrong. Random assignment will be undermined. There will be differential attrition. The comparison group will not be perfectly matched. To guard against these problems, researchers should think deeply about how the two groups might differ if any of these problems were to arise, then collect data at baseline that could be used for matching or making statistical adjustments. It is also useful to give forethought to which program subgroups will be examined and to collect variables that could help predict these program statuses, such as the length of a match.

2. Gather implementation or process information. This information is necessary to understand one's impact results—why the program had no effect or what type of program had the effects that were estimated. These data and data on program quality also can enable one to explore what about the program led to the change.

3. Use random assignment or match on motivational factors. Random assignment should be a researcher's first choice, but if quasi-experimental methods must be used, researchers should try to match participant and comparison youth on some of the less obvious factors. The more one can convince readers that the groups are equivalent on all the relevant variables, including some of the hard-to-measure factors, such as motivation or comfort with adults, the more credible the impact estimates will be.
Recommendations for Practice

Given the complexities of computing valid impact estimates, what should a program do to measure effectiveness?

1. Monitor key process variables or benchmarks. Walker and Grossman (1999) argued that not every program should conduct a rigorous impact study: It is a poor use of resources, given the cost of research and the relative skills of staff. However, programs should use data to improve their programming (see United Way of America's Measuring Program Outcomes 1996 or the W. K. Kellogg Foundation Evaluation Handbook 2000). Grossman and Johnson (1999) recommended that mentoring programs track three key dimensions: youth and volunteer characteristics, match length, and quality benchmarks. More specifically, programs could track basic information about youth and volunteers: what types and numbers apply, and what types and numbers are matched. They could also track information about how long matches last—for example, the proportion making it to various benchmarks. Last, they could measure and track benchmarks, such as the quality of the relationship (Rhodes et al. 2005), as illustrated in the short sketch following these recommendations. This approach allows programs to measure factors that (a) can be tracked easily and (b) can provide insight about their possible impacts without collecting data on the counterfactual state. Pre/post changes can be a benchmark (but not an impact estimate), and one must be careful that the types of youth served and the general environment are stable. If the pre/post changes for cohorts of youth improve over time, for example, but the program now is serving less needy youth, the change in this benchmark tells little about the effectiveness of the program (the counterfactual states for the early and later cohorts differ).

2. Collaborate with local researchers to conduct impact studies periodically. When program staff feel it is time to conduct a more rigorous impact study, they should consider collaborating with local researchers. Given the time, skills and complexity entailed in conducting impact research, trained researchers can complete the task much more efficiently. An outside evaluation also may be believed more readily. Researchers, furthermore, can become a resource for improving the program's ongoing monitoring system.
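For the first recommendation, much of the benchmark tracking can be automated from routine program records. The sketch below uses hypothetical field names to compute two of the suggested benchmarks by program year: the share of matches reaching 6 and 12 months and the average relationship-quality score.

```python
import pandas as pd

def match_benchmarks(matches: pd.DataFrame) -> pd.DataFrame:
    """Summarize benchmark measures by program year from a match-level file.

    The file is assumed to contain program_year, match_length_months and
    relationship_quality (for example, a youth-reported scale score along the
    lines of Rhodes et al. 2005).
    """
    return matches.groupby("program_year").agg(
        matches_made=("match_length_months", "size"),
        share_6_months=("match_length_months", lambda m: (m >= 6).mean()),
        share_12_months=("match_length_months", lambda m: (m >= 12).mean()),
        avg_quality=("relationship_quality", "mean"),
    )

# Hypothetical usage: benchmarks = match_benchmarks(pd.read_csv("matches.csv"))
```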
References

Bloom, H. S. 1984. "Accounting for No-Shows in Experimental Evaluation Designs." Evaluation Review, 8, 225–246.

Branch, A. Y. 2002. Faith and Action: Implementation of the National Faith-Based Initiative for High-Risk Youth. Philadelphia: Branch Associates and Public/Private Ventures.

Dennis, M. L. 1994. "Ethical and Practical Randomized Field Experiments." In J. S. Wholey, H. P. Hatry and K. E. Newcomer, eds., Handbook of Practical Program Evaluation. San Francisco: Jossey-Bass, 155–197.

DuBois, D. L., B. E. Holloway, J. C. Valentine and H. Cooper. 2002. "Effectiveness of Mentoring Programs for Youth: A Meta-Analytic Review." American Journal of Community Psychology, 30, 157–197.

DuBois, D. L., H. A. Neville, G. R. Parra and A. O. Pugh-Lilly. 2002. "Testing a New Model of Mentoring." In G. G. Noam, ed. in chief, and J. E. Rhodes, ed., A Critical View of Youth Mentoring (New Directions for Youth Development: Theory, Research, and Practice, No. 93, 21–57). San Francisco: Jossey-Bass.

DuBois, D. L. and M. J. Karcher, eds. 2005. Handbook of Youth Mentoring. Thousand Oaks, CA: Sage Publications, Inc.

Dynarski, M., C. Pistorino, M. Moore, T. Silva, J. Mullens, J. Deke et al. 2003. When Schools Stay Open Late: The National Evaluation of the 21st Century Community Learning Centers Program. Washington, DC: US Department of Education.

Eccles, J. S., C. Midgley and T. F. Adler. 1984. "Grade-Related Changes in School Environment: Effects on Achievement Motivation." In J. G. Nicholls, ed., The Development of Achievement Motivation. Greenwich, CT: JAI Press, 285–331.

Grossman, J. B. and A. Johnson. 1999. "Judging the Effectiveness of Mentoring Programs." In J. B. Grossman, ed., Contemporary Issues in Mentoring. Philadelphia: Public/Private Ventures, 24–47.

Grossman, J. B. and J. E. Rhodes. 2002. "The Test of Time: Predictors and Effects of Duration in Youth Mentoring Programs." American Journal of Community Psychology, 30, 199–206.

Grossman, J. B. and J. P. Tierney. 1998. "Does Mentoring Work? An Impact Study of the Big Brothers Big Sisters Program." Evaluation Review, 22, 403–426.

Orr, L. L. 1999. Social Experiments: Evaluating Public Programs with Experimental Methods. Thousand Oaks, CA: Sage.

Rhodes, J., R. Reddy, J. Roffman and J. Grossman. 2005. "Promoting Successful Youth Mentoring Relationships: A Preliminary Screening Questionnaire." Journal of Primary Prevention, 147–167.

Rosenbaum, P. R. and D. B. Rubin. 1983. "The Central Role of the Propensity Score in Observational Studies for Causal Effects." Biometrika, 70, 41–55.

Rosenberg, M. 1979. "Rosenberg Self-Esteem Scale." In K. Corcoran and J. Fischer (2000). Measures for Clinical Practice: A Sourcebook (3rd ed.). New York: Free Press, 610–611.

Rossi, P. H., H. E. Freeman and M. W. Lipsey. 1999. Evaluation: A Systematic Approach (6th edition). Thousand Oaks, CA: Sage.

Rubin, D. B. 1997. "Estimating Causal Effects from Large Data Sets Using Propensity Scores." Annals of Internal Medicine, 127, 757–763.

Schochet, P., J. Burghardt and S. Glazerman. 2001. National Job Corps Study: The Impacts of Job Corps on Participants' Employment and Related Outcomes. Princeton, NJ: Mathematica Policy Research.

Shadish, W. R., T. D. Cook and D. T. Campbell. 2002. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin.

Stock, J. H. and M. W. Watson. 2003. Introduction to Econometrics. Boston: Addison-Wesley.

Tierney, J. P., J. B. Grossman and N. L. Resch. 1995. Making a Difference: An Impact Study of Big Brothers/Big Sisters. Philadelphia: Public/Private Ventures.

United Way of America. 1996. Measuring Program Outcomes. Arlington, VA: United Way of America.

Walker, G. and J. B. Grossman. 1999. "Philanthropy and Outcomes: Dilemmas in the Quest for Accountability." In C. T. Clotfelter and T. Ehrlich, eds., Philanthropy and the Nonprofit Sector in a Changing America. Bloomington: Indiana University Press, 449–460.

Weiss, C. H. 1998. Evaluation. Upper Saddle River, NJ: Prentice Hall.

W. K. Kellogg Foundation. 2000. W.K. Kellogg Foundation Evaluation Handbook. Battle Creek, MI: W. K. Kellogg Foundation.
Public/Private Ventures
2000 Market Street, Suite 600
Philadelphia, PA 19103
Tel: (215) 557-4400
Fax: (215) 557-4469

New York Office
The Chanin Building
122 East 42nd Street, 42nd Floor
New York, NY 10168
Tel: (212) 822-2400
Fax: (212) 949-0439

California Office
Lake Merritt Plaza, Suite 1550
1999 Harrison Street
Oakland, CA 94612
Tel: (510) 273-4600
Fax: (510) 273-4619

www.ppv.org