Differences-in-Differences

Research Method for Political Science III
Differences-in-Differences
Jia Li Jaehyun Song
Kobe University
2016-07-27
Jia Li, Jaehyun Song (Kobe Univ.), Diff-in-Diff, 2016-07-27

Table of Contents
1 Review
2 Application
   Fouirnaies and Mutlu-Eren 2015
3 Practice
   Background
   Graphical Explanation
   Estimating Causal Effects Using Linear Regression
4 Standard Errors in Diff-in-Diff Estimation
5 Synthetic Control Method

Review
Review of DID
When do we usually use DID estimation?
The treatment and control groups differ systematically.
   e.g. In a job training program, if the workers who took the training
   are predominantly less educated, the average earnings of the
   treatment group may be lower than those of the control group.
Panel or repeated cross-sectional data from before and after the
intervention (e.g. a program or policy) are available.
The common trends assumption is satisfied.
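
For reference, the two-group, two-period DID estimator and the common trends assumption behind it can be written as below. The subscripts T (treated), C (control), pre and post are notation assumed here for illustration; they do not appear on the slides.

```latex
% DID estimator: change in the treated group minus change in the control group
\hat{\tau}_{DID}
  = \left(\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}\right)
  - \left(\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}\right)

% Common trends: without treatment, both groups would have changed by the same amount
E\left[Y^{0}_{\text{post}} - Y^{0}_{\text{pre}} \mid T\right]
  = E\left[Y^{0}_{\text{post}} - Y^{0}_{\text{pre}} \mid C\right]
```

A level difference between the groups (the first condition above) is allowed; only a difference in trends would bias the estimate.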

Application Fouirnaies and Mutlu-Eren 2015
English Bacon: Research Question
Research Question
Do government parties allocate more resources to local councils that
are controlled by their own party?
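$$\text{Copartisan}_{it} = \begin{cases} 1 & \text{if } \text{majority}_{it} \in G_t \\ 0 & \text{otherwise} \end{cases}$$
$i$: Local council ($i \in \{1, 2, \ldots, 466\}$)
$t$: Year (1992 to 2012)
$G_t$: Government party in year $t$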

English Bacon: Comparing the Two Groups
We are interested in comparing the Specific Grant (SG) allocated to
copartisan local councils ($\text{Copartisan}_{it} = 1$) with that allocated to the others.
Identification Strategy
$$E[SG_{it} \mid \text{Copartisan}_{it} = 1] - E[SG_{it} \mid \text{Copartisan}_{it} = 0]$$
Omitted Variable Bias
Economic growth
   ⇒ Specific Grant ↓
   ⇒ More votes for the prime minister's party

English Bacon: Identification Strategy
Identification Strategy
$$y^{SG}_{i,t+k} = \beta_1 \text{Copartisan}_{it} + \alpha_i + \delta_t + \alpha_i t + X_{it}\gamma + \varepsilon_{i,t+k}$$
$y^{SG}_{i,t+k}$: SG per capita allocated to $i$ at $t + k$ (logged)
$\alpha_i$: Fixed effect (local councils)
$\delta_t$: Fixed effect (time)
$X_{it}$: Control variables
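
As a rough illustration of this kind of specification, the sketch below simulates a small panel and fits a model with council fixed effects, year fixed effects and council-specific linear trends using base R. Everything in it is hypothetical: the data are simulated, the variable names (`council`, `year`, `copartisan`, `y`) are made up, the control variables are omitted, and the outcome is measured at t rather than t + k for simplicity. It is not the authors' replication code.

```r
set.seed(1)
n_council <- 30
n_year    <- 10
panel <- expand.grid(council = 1:n_council, year = 1:n_year)

council_fe    <- rnorm(n_council)            # council fixed effects
year_fe       <- rnorm(n_year)               # year fixed effects
council_slope <- rnorm(n_council, sd = 0.1)  # slopes of council-specific trends

# Hypothetical treatment indicator and outcome (true effect = 0.5)
panel$copartisan <- rbinom(nrow(panel), 1, 0.5)
true_beta <- 0.5
panel$y <- true_beta * panel$copartisan +
  council_fe[panel$council] + year_fe[panel$year] +
  council_slope[panel$council] * panel$year +
  rnorm(nrow(panel), sd = 0.1)

# Council FE + year FE + council-specific linear trends
fit <- lm(y ~ copartisan + factor(council) + factor(year) +
            factor(council):year, data = panel)

# One trend coefficient is reported as NA because the council trends sum to
# a common trend that the year dummies already absorb; this does not affect
# the copartisan coefficient.
beta_hat <- unname(coef(fit)["copartisan"])
```

With little noise in the simulated outcome, `beta_hat` should land close to `true_beta`.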

English Bacon: Common-Trends Assumption
Including council-specific trend variables ($\alpha_i t$) can relax the
common-trends assumption, but the assumption can still be violated
if the trends are nonlinear.
New Identification Strategy (Relaxing the Assumption)
Differences-in-Differences-in-Differences Estimator
$$y^{SG}_{i,t+k} - y^{FG}_{i,t+k} = \beta_1 \text{Copartisan}_{it} + \alpha_i + \delta_t + \alpha_i t + X_{it}\gamma + \varepsilon_{i,t+k}$$
$y^{FG}_{i,t+k}$: Formula Grant per capita allocated to $i$ at $t + k$ (logged)

English Bacon: Other Models
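Case 1: Is the effect larger before elections?
$$y_{i,t+k} = \alpha_i + \delta_t + \alpha_i t + \beta_1 \text{Copartisan}_{it} + \beta_2 \text{ElectYear}_{i,t+k} + \beta_3 (\text{Copartisan}_{it} \times \text{ElectYear}_{i,t+k}) + \varepsilon_{i,t+k}$$
$\text{ElectYear}_{i,t+k}$: A dummy variable indicating whether there is a local
election in $i$ at $t + k$.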
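
To make the role of the interaction term explicit, the implied copartisan effect in Case 1 can be written out as below. Here $\beta_1$ denotes the coefficient on Copartisan and $\beta_3$ the coefficient on the interaction; these Greek letters are assumed notation.

```latex
% Marginal effect of Copartisan in the Case 1 model
\frac{\partial E[y_{i,t+k}]}{\partial \text{Copartisan}_{it}}
  = \beta_1 + \beta_3 \, \text{ElectYear}_{i,t+k}
  = \begin{cases}
      \beta_1           & \text{if } \text{ElectYear}_{i,t+k} = 0 \\
      \beta_1 + \beta_3 & \text{if } \text{ElectYear}_{i,t+k} = 1
    \end{cases}
```

The question "Is the effect larger before elections?" therefore corresponds to asking whether $\beta_3 > 0$. The same reading applies to the interaction terms in the other cases.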

Case 2: How do governments strategically manipulate the timing of
grant allocation?
$$y_{i,t+k} = \alpha_i + \delta_t + \alpha_i t + \beta_1 \text{Copartisan}_{it} + \beta_2 \text{YearToElect}_{i,t+k} + \beta_3 (\text{Copartisan}_{it} \times \text{YearToElect}_{i,t+k}) + \varepsilon_{i,t+k}$$
$\text{YearToElect}_{i,t+k}$: A variable counting the number of years to the
next local election in $i$ at $t + k$.

Case 3: Is the effect strongest in councils that provide
citizen-focused services and hold relatively infrequent elections?
$$y_{i,t+k} = \alpha_i + \delta_t + \alpha_i t + \beta_1 \text{Copartisan}_{it} + \beta_2 (\text{Copartisan}_{it} \times \text{InfrequentElections}_i) + \beta_3 (\text{Copartisan}_{it} \times \text{UpperTier}_i) + \beta_4 (\text{Copartisan}_{it} \times \text{UpperTier}_i \times \text{InfrequentElections}_i) + \varepsilon_{i,t+k}$$
$\text{InfrequentElections}_i$: A dummy variable equal to 1 if $i$ holds
elections only once every four years, and 0 if it holds them more often.
$\text{UpperTier}_i$: A dummy variable indicating whether $i$ is a
top-tier council.

Case 4: Is the effect stronger in "swing" councils?
$$y_{i,t+k} = \alpha_i + \delta_t + \alpha_i t + \beta_1 \text{Copartisan}_{it} + \beta_2 \text{Swing}_{it} + \beta_3 (\text{Copartisan}_{it} \times \text{Swing}_{it}) + \varepsilon_{i,t+k}$$
$\text{Swing}_{it}$: A dummy variable that takes the value 1 if neither the
government nor the opposition held an absolute majority
of the seats in $i$ before election $t$.

Practice Background
e-Vote in Kyoto
Votes are cast on touch-panel devices,
NOT on a PC or cell phone.
Some wards in Kyoto City adopted e-Vote: Higashiyama ward in 2004
and Kamigyo ward in 2008.
Kyoto City has eleven wards.
(Unfortunately, both wards later abolished e-Voting.)

Figure: e-Vote Device
Source:
http://blogimg.goo.ne.jp/user_image/70/fc/e198dc314f386001a5c789d5d18fa059.jpg

[Figure: e-Vote in Kyoto. Source: Wikipedia]

Does e-Vote Make Democracy Great Again?
1 e-Vote may reduce voting costs (. . . ?)
2 e-Vote may reduce mistakes in filling in ballots.
H1 e-Vote increases voter turnout.
H2 e-Vote reduces spoilt votes.
We can estimate its causal effects using Diff-in-Diff.

Practice Graphical Explanation
Graphical Explanation: H1
[Figure: Voter turnout by year (2000, 2004) for Higashiyama, Fushimi, and the Higashiyama counterfactual. Effect size: 0.0521]
Does it meet the parallel trends assumption?

Collect data from other wards
[Figure: Voter turnout by year (2000, 2004) for the wards except Higashiyama, with the mean of the others]

[Figure: Voter turnout by year (2000, 2004) for Higashiyama, the other wards, the mean of the others, and the Higashiyama counterfactual. Effect size: 0.05345]

[Figure: Spoilt votes by year (2000, 2004) for Higashiyama, Fushimi, and the Higashiyama counterfactual. Effect size: -0.01538832]

Check the parallel trends assumption
[Figure: Spoilt votes by year (2000, 2004) for the wards except Higashiyama, with the mean of the others]

[Figure: Spoilt votes by year (2000, 2004) for Higashiyama, the other wards, the mean of the others, and the Higashiyama counterfactual. Effect size: -0.01647571]

Practice Estimating Causal Effects Using Linear Regression
Prepare
Let's Practice!
Please launch R, then load the package and the dataset.
library(dplyr) # Thanks, Hadley!
dfURL <- "http://jaysong.net/RMPS3/eVoteKyoto.csv"
DD_df <- read.csv(dfURL)

Data Structure
Variable   Description
ID         ID
Ward J     Ward name (Japanese)
Ward E     Ward name (English)
year       Year (2000 to 2016)
trend      Trend indicator (1 to 4)
eVote      Treatment variable (e-Vote)
turnout    Voter turnout
spoilt     Spoilt votes

Hypothesis 1: Comparing the Two Points
$$\text{Turnout}_{wt} = \alpha + \beta\,\text{eVote}_{wt} + \sum_{j=\text{Kamigyo}}^{\text{Fushimi}} \gamma_j \text{Ward}_j + \delta\,\text{Year}_{2004}$$
df1 <- DD_df %>% filter(year <= 2004)
H1Model1 <- lm(turnout ~ eVote + as.factor(WardID) +
as.factor(year), data = df1)
summary(H1Model1)

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  0.483075  0.002754    175.393  < 2e-16
eVote        0.053450  0.005508      9.703  4.60e-06
The eVote coefficient (0.05345) is exactly the same as the effect size from the graphical explanation.
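
The sketch below illustrates why the regression and the graphical calculation agree when there are only two time points. It uses a made-up four-ward data set (not the Kyoto data; the ward labels and turnout values are hypothetical) and compares the DID computed by hand with the eVote coefficient from the same kind of `lm()` call.

```r
# Hypothetical turnout for four wards in two election years
toy <- data.frame(
  ward    = rep(c("A", "B", "C", "D"), each = 2),
  year    = rep(c(2000, 2004), times = 4),
  turnout = c(0.40, 0.47,   # ward A: adopts e-Vote in 2004
              0.42, 0.44,   # ward B
              0.38, 0.39,   # ward C
              0.45, 0.48)   # ward D
)
toy$eVote <- as.integer(toy$ward == "A" & toy$year == 2004)

# Change in turnout within each ward between 2000 and 2004
by_cell <- with(toy, tapply(turnout, list(ward, year), mean))
delta   <- by_cell[, "2004"] - by_cell[, "2000"]

# DID by hand: change in the treated ward minus mean change in the others
did_by_hand <- unname(delta["A"] - mean(delta[c("B", "C", "D")]))

# DID by regression with ward and year dummies
fit <- lm(turnout ~ eVote + as.factor(ward) + as.factor(year), data = toy)
did_by_lm <- unname(coef(fit)["eVote"])
```

For these made-up numbers both `did_by_hand` and `did_by_lm` equal 0.05 (up to floating-point error). The exact match relies on the panel being balanced with two periods; with more periods the regression averages over several comparisons.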

Hypothesis 1: Comparing All the Points
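Of course, we can use all the data.
$$\text{Turnout}_{wt} = \alpha + \beta\,\text{eVote}_{wt} + \sum_{j=\text{Kamigyo}}^{\text{Fushimi}} \gamma_j \text{Ward}_j + \sum_{k \in \{2004, 2008, 2012, 2016\}} \delta_k \text{Year}_k$$
H1Model2 <- lm(turnout ~ eVote + as.factor(WardID) +
as.factor(year), data = DD_df)
summary(H1Model2)

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  0.485430  0.004691    103.490  < 2e-16
eVote        0.024672  0.006248      3.949  0.000319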

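Hypothesis 1: Considering Trend Effects
What if we also account for ward-specific trends?
$$\text{Turnout}_{wt} = \alpha + \beta\,\text{eVote}_{wt} + \sum_{j=\text{Kamigyo}}^{\text{Fushimi}} \gamma_j \text{Ward}_j + \sum_{k \in \{2004, 2008, 2012, 2016\}} \delta_k \text{Year}_k + \sum_{j=\text{Kamigyo}}^{\text{Fushimi}} \theta_j (\text{Ward}_j \times \text{Trend}(t))$$
H1Model3 <- lm(turnout ~ eVote + as.factor(WardID) +
as.factor(year) +
as.factor(WardID) * trend,
data = DD_df)
summary(H1Model3)

             Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)  0.4818506  0.0060225   80.008   < 2e-16
eVote        0.0240453  0.0054012    4.452   0.000116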

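Hypothesis 2: Comparing the Two Points
$$\text{Spoilt}_{wt} = \alpha + \beta\,\text{eVote}_{wt} + \sum_{j=\text{Kamigyo}}^{\text{Fushimi}} \gamma_j \text{Ward}_j + \delta\,\text{Year}_{2004}$$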
H2Model1 <- lm(spoilt ~ eVote + as.factor(WardID) +
as.factor(year), data = df1)
summary(H2Model1)

             Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)   0.0126155  0.0005573    22.636  3.04e-09
eVote        -0.0164757  0.0011146   -14.782  1.28e-07
The eVote coefficient (-0.0164757) is again exactly the same as the effect size from the graphical explanation.

Hypothesis 2: Comparing with All the Points
Of course, we can still use all the data.
H2Model2 <- lm(spoilt ~ eVote + as.factor(WardID) +
as.factor(year), data = DD_df)
summary(H2Model2)

             Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)   1.255e-02  9.919e-04    12.654  2.19e-15
eVote        -1.884e-02  1.321e-03   -14.257  < 2e-16

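Hypothesis 2: Considering Trend Effects
What if we also account for ward-specific trends?
$$\text{Spoilt}_{wt} = \alpha + \beta\,\text{eVote}_{wt} + \sum_{j=\text{Kamigyo}}^{\text{Fushimi}} \gamma_j \text{Ward}_j + \sum_{k \in \{2004, 2008, 2012, 2016\}} \delta_k \text{Year}_k + \sum_{j=\text{Kamigyo}}^{\text{Fushimi}} \theta_j (\text{Ward}_j \times \text{Trend}(t))$$
H2Model3 <- lm(spoilt ~ eVote + as.factor(WardID) +
as.factor(year) +
as.factor(WardID) * trend,
data = DD_df)
summary(H2Model3)

             Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)   1.244e-02  1.416e-03     8.782  1.15e-09
eVote        -1.890e-02  1.270e-03   -14.881  4.12e-15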

Comparing the Models
Hypothesis         Two Points   All Points   With Trend
H1        Coef.     0.0535       0.0247       0.0240
          S.E.     (0.0055)     (0.0062)     (0.0054)
H2        Coef.    -0.0165      -0.0188      -0.0189
          S.E.     (0.0011)     (0.0013)     (0.0013)
⇒ The estimates for H1 are less stable across specifications than those for H2.

Visualization of Diff-in-Diff (All the Points)
[Figure: Voter turnout by year (2000 to 2016) for Kamigyo and Higashiyama, with markers for the adoption of e-Vote (Higashiyama), the adoption of e-Vote (Kamigyo), and the abolition of e-Vote (both)]

Visualization of Diff-in-Diff (All the Points)
[Figure: Spoilt votes by year (2000 to 2016) for Kamigyo and Higashiyama, with markers for the adoption of e-Vote (Higashiyama), the adoption of e-Vote (Kamigyo), and the abolition of e-Vote (both)]

Standard Errors in Diff-in-Diff Estimation
How to Calculate Standard Errors
Observations from the same ward are likely to be correlated over time,
so conventional standard errors may understate the uncertainty.
Clustered standard errors can help us.
These can be easily calculated using the multiwayvcov and lmtest
packages. (We can also conduct Diff-in-Diff with adjusted standard errors
using the R package wfe, but it does not work on my PC.)
Let's try to calculate the clustered standard errors for Hypothesis 2
(spoilt votes).

Calculate Clustered Standard Errors: Code
# Load required packages
library(multiwayvcov)
library(lmtest)
# Calculate the clustered var-cov matrix
H2Model3_VCOV <- cluster.vcov(H2Model3, ~WardID)
# PROFIT!
coeftest(H2Model3, H2Model3_VCOV)

Calculate Clustered Standard Errors: Result
Without clustering
t test of coefficients:
        Estimate    Std. Error   t value   Pr(>|t|)
eVote   -1.890e-02  1.270e-03    -14.881   < 2.2e-16
With clustering
t test of coefficients:
        Estimate    Std. Error   t value   Pr(>|t|)
eVote   -1.890e-02  2.0491e-03   -9.2235   < 2.2e-16
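
To show what a clustered variance matrix does, the sketch below computes the basic cluster-robust "sandwich" by hand in base R on simulated data. The function name `cluster_vcov_cr0` and the simulated variables are hypothetical. It uses no small-sample correction, whereas `cluster.vcov()` should apply one by default, so the two will not give identical numbers.

```r
# Cluster-robust variance matrix without small-sample correction:
# (X'X)^{-1} [ sum_g (X_g' u_g)(X_g' u_g)' ] (X'X)^{-1}
cluster_vcov_cr0 <- function(fit, cluster) {
  X <- model.matrix(fit)
  u <- residuals(fit)
  bread  <- solve(crossprod(X))             # (X'X)^{-1}
  scores <- rowsum(X * u, group = cluster)  # sum of x_i * u_i within clusters
  meat   <- crossprod(scores)               # sum over clusters of s_g s_g'
  bread %*% meat %*% bread
}

# Simulated data: 11 wards, 5 elections, with a shock shared within each ward
set.seed(42)
n_ward <- 11
n_year <- 5
sim <- data.frame(ward = rep(1:n_ward, each = n_year),
                  x    = rnorm(n_ward * n_year))
ward_shock <- rnorm(n_ward)[sim$ward]
sim$y <- 1 + 2 * sim$x + ward_shock + rnorm(nrow(sim))

fit <- lm(y ~ x, data = sim)
se_classical <- sqrt(diag(vcov(fit)))
se_clustered <- sqrt(diag(cluster_vcov_cr0(fit, sim$ward)))
```

Comparing `se_classical` with `se_clustered` shows how much the within-ward correlation changes the reported uncertainty. With only eleven clusters, as in Kyoto, cluster-robust standard errors are themselves imprecise, so they are best read with some caution.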

Synthetic Control Method
Introduction
Objective: to evaluate the impact of a treatment implemented
at the aggregate level (e.g. country, region) on one or a few units,
using a small number of controls to build the counterfactual
Synthetic control methods
   use panel data to build the weighted average of non-treated
   units that best reproduces the characteristics of the treated unit
   over time
   measure the impact of the treatment by a simple difference, after
   treatment, between the treated unit and a combination of
   comparison units (the synthetic control)

Setup
Units: $j = 1, 2, \ldots, J + 1$, where $j = 1$ is the treated unit and
$j = 2, \ldots, J + 1$ are controls (potential comparisons)
Time frame: split $t = 1, \ldots, T_1$ into two periods, pretreatment
$t = 1, \ldots, T_0$ and post-treatment $t = T_0 + 1, \ldots, T_1$
Potential outcomes for the treated unit are $(Y^0_{1t}, Y^1_{1t})$, and the observed outcome is
$$Y_{1t} = \begin{cases} Y^0_{1t} & t = 1, \ldots, T_0 \\ Y^1_{1t} & t = T_0 + 1, \ldots, T_1 \end{cases}$$
Our objective is to estimate $\alpha_{1t} = Y^1_{1t} - Y^0_{1t}$

Setup, continued
Let $X_1$ be a $k \times 1$ vector of pre-intervention characteristics of
the treated unit
Let $X_0$ be a $k \times J$ matrix of the same variables for the
comparison units
Choose weights $W$ that minimize
$$\sum_{m=1}^{k} v_m (X_{1m} - X_{0m} W)^2$$
where $X_{1m}$ is the value of the $m$-th variable for the treated unit,
$X_{0m}$ is the $1 \times J$ vector of its values for the comparison units, and
$v_m$ is a weight that reflects the relative importance that we assign to
the $m$-th variable

Setup, continued
Choose $W^* = (w^*_2, \ldots, w^*_{J+1}) \in [0, 1]^J$, with the weights adding to 1, to minimize
the distance in pretreatment characteristics between the treated unit and the
weighted average of controls
Treatment effect estimated by the simple difference
$$\hat{\alpha}_{1t} = Y^1_{1t} - \sum_{j=2}^{J+1} w^*_j Y_{jt} \quad \text{for } t = T_0 + 1, \ldots, T_1$$
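
The weight-selection step can be sketched in base R as below. All numbers are made up: four predictors, three control units, equal importance weights $v_m$, and a treated unit constructed as an exact mix of the controls. The helper names `to_weights` and `loss` are hypothetical. Dedicated software such as the Synth package also chooses the $v_m$ from the data, which this sketch does not attempt.

```r
# Hypothetical predictors: k = 4 characteristics (rows), J = 3 controls (columns)
X0 <- matrix(c(1.0, 2.0, 3.0,
               0.5, 0.1, 0.9,
               4.0, 6.0, 5.0,
               2.0, 1.0, 0.0), nrow = 4, byrow = TRUE)
w_true <- c(0.6, 0.3, 0.1)
X1 <- as.vector(X0 %*% w_true)  # treated unit built as an exact mix of controls
v  <- rep(1, 4)                 # equal importance weights v_m (an assumption)

# Map unconstrained parameters to weights in [0, 1] that add to 1
to_weights <- function(theta) {
  e <- exp(c(0, theta))
  e / sum(e)
}

# Weighted squared distance between the treated unit and the synthetic control
loss <- function(theta) {
  w <- to_weights(theta)
  sum(v * (X1 - as.vector(X0 %*% w))^2)
}

opt   <- optim(rep(0, ncol(X0) - 1), loss, method = "BFGS")
w_hat <- to_weights(opt$par)

# Hypothetical post-treatment outcomes: treated unit and the J controls
Y1_post <- c(10.0, 11.0)
Y0_post <- matrix(c(8, 9, 7,
                    9, 9, 8), nrow = 2, byrow = TRUE)

# Estimated effect: treated outcome minus synthetic control outcome
alpha_hat <- Y1_post - as.vector(Y0_post %*% w_hat)
```

The softmax-style reparameterization in `to_weights` is one simple way to keep the weights nonnegative and adding to 1 while using an unconstrained optimizer; a constrained quadratic programming solver would be the more standard choice.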

Application, German Unification
Abadie, Diamond, and Hainmueller (2015) examine the effect of the 1990 German
reunification on per capita GDP in West Germany
The set of comparisons is a sample of OECD countries

Predictors of Economic Growth

West Germany and synthetic West Germany

Per Capita GDP gap

Placebo Studies
In-time Placebo: apply this method to dates when the
intervention did not occur
In-space Placebo: reassign the intervention to a comparison unit

Robustness Checks
Test the sensitivity of the main results to changes in the country
weights.
Incorporate the leave-one-out estimates
