IFPRI- Impact Surveys 1

Sampling and statistical
power
Devesh Roy (on behalf of IFPRI-IFAD team)

Questionnaire of impact surveys
• Module A: Identification (cluster, household ID)
• Module B: Household demographics (sex, age, literacy)
• Module C: Survey questions (dwellings, drinking water supply,
sanitation, food security, asset-related questions, farming and
livestock questions, anthropometry)

Important details underlying this
questionnaire
• No tracking of households: a straightforward identification module
• Limited set of household characteristics
• Sample size is fixed exogenously and simple random sampling is proposed,
but effects across strata are envisioned

Some elements of robust IE, leading to RIMS+
• Choosing the sample
• Issue: what data are needed, and what sample is required, to precisely estimate differences in
outcomes between the treatment group and the comparison group?
• Determine both the sample size and how to draw the units in the sample from the
population of interest.
• What kinds of data do I need?
• SMART data are needed: specific, measurable, attributable, realistic, and targeted.
• Good quality data are required to assess the impact of the intervention on the
outcomes of interest.
• The IE should not measure only outcomes for which the program is directly
accountable. Including outcome indicators that are indirectly affected, or indicators that capture unintended
program impacts, will maximize the value of the information that the IE generates.

Primer on sample and nature of data
• Some indicators may not be amenable to IE in small samples. Detecting
impacts may require prohibitively large samples for outcomes that are
• extremely variable,
• rare events, or
• likely to be only marginally affected by an intervention.
• Example: identifying the impact of an intervention on maternal mortality
rates will be feasible only in a sample that contains many pregnant women.
• Data on exogenous factors that may affect the outcome of interest. These
make it possible to control for outside influences.
• Data on other characteristics. These allow additional controls to be included and
the heterogeneity of the program’s effects to be analyzed along certain characteristics.

Role of existing data
• To set benchmark values of indicators
• To run power calculations that give the minimum sample size
• In IFAD projects, one might not need to do power calculations,
but it helps to get a sense of when a larger sample is needed rather than
a smaller one

How large must the sample be?
• The associated calculations are called power calculations.
• Avoid collecting too much data as well as too little
• Remember the risk of too little data: if the sample is too small, you may not be
able to detect a positive impact and may thus wrongly conclude that the
program had no effect
• Assume for simplicity that all intended beneficiaries take
part and all intended non-beneficiaries remain non-beneficiaries.

Power calculations
• Most impact evaluations test a simple hypothesis
• Does the program have an impact? In other words, is the program
impact different from zero? Answering this question requires two
steps:
• 1. Estimate the average outcomes for the beneficiary and
non-beneficiary groups.
• 2. Assess whether a difference exists between the average outcome
for the beneficiary group and the average outcome for the
non-beneficiary group.

Large versus small sample: a large sample reduces
the chance of being unlucky (Gertler et al 2010)

Consider a related example
• Take a nutrition program
• Take an anthropometric measure of beneficiaries and non-beneficiaries
• Draw a sample of 2 children (beneficiary and non-beneficiary) and repeat this many
times
• The estimates from the repeated samples will bounce around a lot,
which implies that the estimates are unreliable
• Now draw a sample of 100 children and repeat this many times
• What do you see? Which estimate bounces around more?
• The one from the smaller sample
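The thought experiment above can be tried out in a small simulation. This is only a sketch under made-up assumptions: heights are drawn from a normal distribution with a mean of 100 cm and a standard deviation of 5 cm, the program adds 2 cm on average, and the sample sizes are read as children per group. None of these numbers come from the IFAD surveys.

```python
import random
import statistics

def estimated_difference(n_per_group, rng, true_gain=2.0, sd=5.0):
    """Draw n children per group and return the difference in mean height (cm)."""
    beneficiary = [rng.gauss(100.0 + true_gain, sd) for _ in range(n_per_group)]
    comparison = [rng.gauss(100.0, sd) for _ in range(n_per_group)]
    return statistics.mean(beneficiary) - statistics.mean(comparison)

def spread_of_estimates(n_per_group, repetitions=2000, seed=0):
    """Repeat the survey many times; return the std. dev. of the estimates."""
    rng = random.Random(seed)
    estimates = [estimated_difference(n_per_group, rng) for _ in range(repetitions)]
    return statistics.stdev(estimates)

spread_small = spread_of_estimates(2)    # hypothetical: 2 children per group
spread_large = spread_of_estimates(100)  # hypothetical: 100 children per group
print(f"spread of estimates, 2 per group:   {spread_small:.2f} cm")
print(f"spread of estimates, 100 per group: {spread_large:.2f} cm")
```

The spread of the estimates from the small samples should come out several times larger than the spread from the large samples, which is the "bouncing" described in the slide.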

Errors in Impact evaluation
• Type 1 error and type 2 error
• Type 1 error: concluding that average height in the beneficiary group is
higher than in the non-beneficiary group when in fact it is not
• Type 2 error: concluding that average height in the beneficiary group is no
different from that in the non-beneficiary group when in fact it is different
• The likelihood of a type 1 error is the significance level, usually set at 5
percent. This corresponds to a 95 percent confidence level: if the program truly has no impact, there is only a 5 percent chance of concluding that it does
• Many factors affect the likelihood of committing a type 2 error, but sample
size is crucial. When the sample is large, it is less likely that the average height or
weight of children in the two groups looks equal just by luck
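Both error rates can be estimated by simulation. The sketch below assumes normally distributed outcomes, a hypothetical true gain of 2 units with a standard deviation of 5, and a two-sided z-test at the 5 percent level. The function names and all numbers are illustrative, not taken from the source.

```python
import math
import random
import statistics
from statistics import NormalDist

def rejects_no_impact(n, true_gain, sd, rng, alpha=0.05):
    """Simulate one survey and test 'impact = 0' with a two-sided z-test."""
    treated = [rng.gauss(true_gain, sd) for _ in range(n)]
    control = [rng.gauss(0.0, sd) for _ in range(n)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / n + statistics.variance(control) / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(diff / se) > z_crit

def rejection_rate(n, true_gain, sd=5.0, repetitions=2000, seed=1):
    """Share of simulated surveys that conclude the program had an impact."""
    rng = random.Random(seed)
    hits = sum(rejects_no_impact(n, true_gain, sd, rng) for _ in range(repetitions))
    return hits / repetitions

# Type 1 error: the program has no impact, but the test says it does.
type1_rate = rejection_rate(n=100, true_gain=0.0)
# Type 2 error: the program has an impact, but the test misses it.
type2_small = 1 - rejection_rate(n=20, true_gain=2.0)
type2_large = 1 - rejection_rate(n=200, true_gain=2.0)
print(f"type 1 rate: {type1_rate:.3f}")
print(f"type 2 rate, n=20 per group:  {type2_small:.3f}")
print(f"type 2 rate, n=200 per group: {type2_large:.3f}")
```

The type 1 rate should stay close to the chosen 5 percent whatever the sample size, while the type 2 rate should fall sharply as the sample grows.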

Power calculation and errors in IE (Gertler et
al 2010)
• The statistical power of an impact evaluation is the probability that it will
detect a difference between the beneficiary and non-beneficiary groups
when in fact one exists. An impact evaluation has a high power if there is a
low risk of not detecting real program impacts, that is, of committing a
type II error.
• With high power, you are unlikely to be disappointed by results showing that the
program being evaluated has had no impact when in reality it did have an
impact.
• From a policy perspective, underpowered impact evaluations are costly
• If you were to conclude that the program was not effective, even though it
was, policy makers would be likely to close down a good program.
• Carrying out power calculations is therefore crucial.
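To see what "underpowered" means in numbers, power can be approximated directly. The sketch below uses the normal approximation for a two-sided test of equal means with the same standard deviation and sample size in both groups. The effect size (2) and standard deviation (5) are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

def power_two_group(n_per_group, effect, sd, alpha=0.05):
    """Approximate power of a two-sided test of equal means (normal approximation)."""
    std_normal = NormalDist()
    se = sd * math.sqrt(2.0 / n_per_group)      # std. error of the difference in means
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    shift = abs(effect) / se                    # true effect in std. error units
    # Probability of landing in either rejection region when the effect is real.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

for n in (20, 50, 100, 200):
    print(n, round(power_two_group(n, effect=2.0, sd=5.0), 2))
```

Under these assumptions an evaluation with 20 units per group would miss a real impact most of the time, which is the costly situation the slide warns about.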

Steps in power calculations (Gertler et al
2010)
• Does the program create clusters?
• What is the outcome indicator?
• Do you aim to compare program impacts between subgroups?
• What is the minimum level of impact that would justify the
investment that has been made in the intervention?
• What is a reasonable level of power for the evaluation being
conducted?
• What are the baseline mean and variance of the outcome indicators?

Power calculation: Continued
• No-clusters case: the intervention is assigned at the level at which impacts are observed (by contrast, treatments given
at school level with outcomes observed at the student level would create clusters)
• With no clusters, take a random sample out of the entire population
• Identify the most important indicators that you want to evaluate
• If impacts are to be compared across subgroups (like SC/ST caste groups), the sample size has to be larger
• Then determine the minimum size of effect to be detected: the smaller the effect, the larger the sample
• For an evaluation to identify a small impact, estimates of any difference in mean outcomes
between the treatment and comparison groups will need to be very precise, requiring a large
sample.
• Obtain the population baseline mean and variance
• A power of 80 percent is the usual norm
• Most statistical software packages can do the power calculations once these parameters are known
• Programs that create clusters raise a different issue for power calculations
Power calculations with clusters
• Some programs assign benefits at the cluster level
• The principle for sample size here: the number of clusters matters more
than the number of households within clusters
• A sufficient number of clusters is required to test convincingly
whether a program has had an impact by comparing outcomes
• Compared with the steps before, add just one question: how variable are the outcomes
within clusters?
• Suppose incomes are the same within a village but differ across villages:
adding an individual from another village then adds more statistical
power than adding another individual from the same village
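One common way to put numbers on this principle is the design effect, 1 + (m - 1) × ICC, where m is the number of households per cluster and ICC is the intra-cluster correlation (the share of outcome variance that lies between clusters). The slides do not name this formula; it is brought in here as a standard approximation, and the ICC of 0.10 is a hypothetical value.

```python
def design_effect(households_per_cluster, icc):
    """Variance inflation from sampling households within clusters."""
    return 1.0 + (households_per_cluster - 1) * icc

def effective_sample_size(n_clusters, households_per_cluster, icc):
    """Number of independent households the clustered sample is 'worth'."""
    total = n_clusters * households_per_cluster
    return total / design_effect(households_per_cluster, icc)

# Two hypothetical designs, both with 1,000 households in total.
few_big = effective_sample_size(20, 50, icc=0.10)     # 20 villages x 50 households
many_small = effective_sample_size(50, 20, icc=0.10)  # 50 villages x 20 households
# Extreme case from the slide: everyone within a village is identical (ICC = 1).
identical_within = effective_sample_size(20, 50, icc=1.0)

print(f"20 clusters x 50 households: {few_big:.0f} effective households")
print(f"50 clusters x 20 households: {many_small:.0f} effective households")
print(f"ICC = 1, 20 clusters:        {identical_within:.0f} effective households")
```

With identical households within each village, the effective sample size collapses to the number of villages, so only adding villages adds power.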
INTERNATIONAL FOOD POLICY RESEARCH INSTITUTE
Example: test of difference of means in two
populations
• The equation for sample size is derived from the
equation for the statistical test
• In a t-test the equation for the test is
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\left(s_1^2/n + s_2^2/n\right)^{1/2}}$$
• The derived equation for sample size is
$$n = \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \left(s_1^2 + s_2^2\right)}{(\mu_1 - \mu_2)^2}$$
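The sample-size equation above can be evaluated with a few lines of code. In this sketch n is read as the sample size per group, alpha as the probability of a type 1 error, and beta as the probability of a type 2 error (so power is 1 - beta). The example values (a difference in means of 2, standard deviations of 5) are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(mean_diff, sd1, sd2, alpha=0.05, power=0.80):
    """Units needed per group to detect mean_diff, rounded up to a whole unit."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    z_beta = std_normal.inv_cdf(power)           # z_{1 - beta}, with power = 1 - beta
    n = (z_alpha + z_beta) ** 2 * (sd1 ** 2 + sd2 ** 2) / mean_diff ** 2
    return math.ceil(n)

print(sample_size_per_group(mean_diff=2.0, sd1=5.0, sd2=5.0))  # 99
print(sample_size_per_group(mean_diff=1.0, sd1=5.0, sd2=5.0))  # 393
```

Halving the difference to be detected roughly quadruples the required sample, since the difference enters the denominator squared.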

What next after sample size?
• A Sampling Strategy
• Steps in sampling
• 1. Determine the population of interest (e.g. children under a certain
age).
• 2. Identify a sampling frame. It should coincide with the population of
interest, but sometimes it does not.
• 3. Draw as many units from the sampling frame as required by power
calculations.
• Choose the sampling method

Sampling method
• Probability sampling (PS) methods are the most rigorous: they assign a
well-defined probability to each unit being drawn. The 3 main PS methods are:
• Random sampling. Every unit in the population has exactly the same
probability of being drawn.
• Stratified random sampling. The population is divided into groups (for
example, male and female) and random sampling is performed within each
group (essential for comparing impacts across subgroups).
• Cluster sampling. Units are grouped in clusters, and a random sample of
clusters is drawn, after which either all units in those clusters constitute the
sample or a number of units within the cluster are randomly drawn. This
means that each cluster has a well-defined probability of being selected,
and units within a selected cluster also have a well-defined probability of
being drawn.
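The three probability sampling methods can be sketched on a toy sampling frame. Everything about the frame is hypothetical: 40 villages of 25 households each, with the sex of the household head as the stratification variable. The function names are illustrative rather than taken from any survey package.

```python
import random
from collections import defaultdict

def simple_random_sample(units, n, rng):
    """Every unit has the same probability of being drawn."""
    return rng.sample(units, n)

def stratified_random_sample(units, stratum_of, n_per_stratum, rng):
    """Split the frame into strata, then sample at random within each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample

def cluster_sample(units, cluster_of, n_clusters, n_per_cluster, rng):
    """Draw clusters at random, then draw units at random within each chosen cluster."""
    clusters = defaultdict(list)
    for unit in units:
        clusters[cluster_of(unit)].append(unit)
    sample = []
    for key in rng.sample(sorted(clusters), n_clusters):
        members = clusters[key]
        sample.extend(rng.sample(members, min(n_per_cluster, len(members))))
    return sample

# Hypothetical sampling frame: 40 villages x 25 households.
rng = random.Random(42)
frame = [
    {"household": f"v{v:02d}-h{h:02d}", "village": v,
     "head_sex": rng.choice(["female", "male"])}
    for v in range(40) for h in range(25)
]

srs = simple_random_sample(frame, 100, rng)
stratified = stratified_random_sample(frame, lambda u: u["head_sex"], 50, rng)
clustered = cluster_sample(frame, lambda u: u["village"], 10, 10, rng)
print(len(srs), len(stratified), len(clustered))
```

All three draws give 100 households here, but the clustered sample covers only 10 villages, which is why clustered designs need the extra step in the power calculation.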

Sampling method and impact evaluation
• How a sample is drawn depends on the eligibility rules of the program
• Interventions usually take place at the cluster level
• In these cases, cluster sampling should be used
• Avoid at all costs non-probability methods such as purposive or convenience
sampling
