This document uses ARIMA models to analyze the effects of unemployment news coverage and unemployment rates on the average left-right political preference in the Netherlands from 1990 to 2000. The results show that neither unemployment news coverage in NRC Handelsblad nor the actual unemployment rate had a statistically significant effect on changes in the average political preference over time.
Turn left or right:
How the economy affects political preferences and
media coverage?
Multivariate ARIMA models
Assignment 3
Mark Boukes (markboukes@Hotmail.com)
5616298
1st semester 2010/2011
Dynamic Data Analysis
Lecturer: Dr. R. Vliegenthart
December 1, 2010
Communication Science (Research MSc)
Faculty of Social and Behavioural Sciences
University of Amsterdam
Introduction
In the previous assignment, I found a significant effect of the international financial crisis caused by
the second oil crisis in 1979/1980 on the political preferences of Dutch citizens on a left-right scale.
Inspired by this influence of economic factors on political preferences, I study in this assignment the
effect of unemployment on the political preferences of the Dutch population. Hollanders and
Vliegenthart (2009) showed that negative news coverage about the economy led to decreased consumer
confidence, and Soroka (2006) found that increased negative economic news coverage leads to
more pessimistic expectations about the future of the economy. In this paper I examine whether such
coverage also influences political preferences. To study the effect of unemployment on
political preferences, I formulated the following research questions:
o Did the number of articles about unemployment in NRC Handelsblad affect the average
political preference of Dutch citizens?
o Did developments in the unemployment rate of the Netherlands affect the average political
preference of Dutch citizens?
Method
To investigate which factors affect the left-right preferences of Dutch citizens, I used a
dataset that covers a long time period. The NIPO Weeksurveys 1962-2000¹
contain, for the period 1977-2000, 1,086,336 individual answers to the following question: "Here you
see seven boxes between the words left and right. Could you indicate on this scale how left, right or in
between your political opinion lies?" Because the answers were reported individually and aggregate-level
data are needed to answer the research questions, the observations were transformed into the mean
answer for every week. This resulted in 1226 weekly items containing the value
for the average left-right preference of Dutch citizens. As I only study the period 1990-2000, 560 items
could be used.
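The aggregation step described above can be sketched in Stata as follows. This is a minimal illustration on made-up numbers, not the original processing script: the variable name answer and the five toy responses are assumptions, while yrwk and leftright follow the names used in the do-file in Appendix 1.

```stata
* Toy respondent-level data: one row per individual answer (made-up values)
clear
input yrwk answer
199002 3
199002 5
199002 4
199003 2
199003 6
end
* Collapse to one row per week: mean answer and number of respondents
collapse (mean) leftright = answer (count) n = answer, by(yrwk)
list
```

After the collapse, each week is a single observation holding the weekly mean, which is the form needed for the time series analysis.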
To construct a variable measuring the attention paid to
unemployment in newspapers at different moments in time, a computer-assisted content analysis was
conducted using the digital archive of LexisNexis. Articles were selected via the Boolean search term
werkloosheid OR werkeloosheid (both Dutch spellings of "unemployment"). The period that I analyzed ran from 1 January 1990 until 31 December
2000, as the variable indicating the mean left-right preference was measured until 2000 and
LexisNexis contains no data for the period before 1990. Only articles in NRC Handelsblad were
analyzed, as this is the only newspaper for which data are available from 1990 on; using other newspapers would
have led to too short a period. The search resulted in 7652 articles for the whole period. The number of
articles was aggregated per week, resulting in weekly visibility scores of unemployment in NRC Handelsblad.
¹ Found on https://easy.dans.knaw.nl/dms
The variable representing the unemployment rate was obtained via the website of Eurostat, also
for the period 1990-2000. The unemployment rate was measured as a percentage of the total labour force.
However, as these data were monthly and not weekly, the unemployment rate for each intervening week
was calculated as the mean of the value for the week before and the next measured value.
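The filling rule described above can be sketched as follows. This is an illustration on made-up numbers under the assumption that each missing week receives the mean of the (already filled) previous week and the next measured value; the variable names week, rate and next_obs are hypothetical and the code is not the original do-file.

```stata
* Toy data: the rate is measured in weeks 1 and 5 only (made-up values)
clear
input week rate
1 6.0
2 .
3 .
4 .
5 7.0
end
* Carry the next measured value backwards into the gaps
gen next_obs = rate
gsort -week
replace next_obs = next_obs[_n-1] if missing(next_obs)
sort week
* replace works through the observations in order, so rate[_n-1] is already filled
replace rate = (rate[_n-1] + next_obs)/2 if missing(rate)
list week rate
```

With these numbers the filled values move gradually from 6.0 towards 7.0 (6.5, 6.75 and 6.875 for weeks 2 to 4).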
To analyse the effects of these variables, I first developed an adequate ARIMA model for the time
series of the average left-right preference. The independent variables were then added to this
ARIMA model, resulting in multivariate ARIMA models.
Results
In this results section I first specify how the ARIMA model for the average left-right preference was
created. Thereafter I describe the results of including information about news coverage and
unemployment rates in this ARIMA model, with the purpose of explaining political preferences in
causal terms. Three time series were used in the analyses: the average left-right political preference, the
number of articles about unemployment and the unemployment rate in the Netherlands. Figure 1
displays the development of these variables in the period 1990-2000.
Figure 1. The development of the average political preference, the number of ‘unemployment’ articles in NRC
Handelsblad and the unemployment rate in the Netherlands between 1990 and 2000.
Left-right political preferences
To check whether it was necessary to integrate the ARIMA model, the time series of the average left-right
political preference was analyzed with three augmented Dickey-Fuller tests. The results of these tests
(see Table 1) indicate that the series had to be differenced, because the null hypothesis of a random
walk without drift could not be rejected. For the differenced series, all
three augmented Dickey-Fuller tests rejected the null hypothesis,
meaning that no random walk was present (also see Table 1). Therefore, the political preference time
series does not need to be differenced a second time.
Table 1. The various augmented Dickey-Fuller tests for the average left-right political preferences

                                        Augmented Dickey-Fuller test
Before integrating
  Random walk without drift                  -0.125 *
  Random walk with drift                    -11.750
  Random walk with drift and trend          -15.312
After integrating
  Random walk without drift                 -38.053
  Random walk with drift                    -38.019
  Random walk with drift and trend          -37.984
Note. * indicates the presence of a unit root.
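For reference, the three variants in Table 1 correspond to the standard Dickey-Fuller test regressions below. This is a textbook formulation rather than something taken from the original assignment; the symbols (alpha, beta, gamma, delta and the lag order p) are notation introduced here. In each case the null hypothesis of a unit root is gamma = 0. The dfuller calls in Appendix 1 specify no lags option, which corresponds to p = 0.

```latex
\begin{align*}
\text{without drift:} \quad
  & \Delta y_t = \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t \\
\text{with drift:} \quad
  & \Delta y_t = \alpha + \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t \\
\text{with drift and trend:} \quad
  & \Delta y_t = \alpha + \beta t + \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t
\end{align*}
```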
The next step was to predict the data as well as possible by accounting for the series' own past, either with
autoregressive (AR) terms, moving average (MA) terms or both. This was done by inspecting the
autocorrelation function (ACF) and partial autocorrelation function (PACF) (see both graphs in Figure 2). The
ACF graph shows a clear spike at lag 1 and little to no significant correlation at other lags, while the
PACF graph displays a declining pattern over the first lags. This pattern is indicative of a process with a
moving average term at lag 1. An ARIMA(0,1,1) model thus seems the right choice. This model was tested for
autocorrelation with the Ljung-Box Q test statistic and for the presence of conditional heteroscedasticity
with the Engle-Granger test, computed here as a Ljung-Box Q test on the squared residuals. The insignificant
result of the Ljung-Box Q test for autocorrelation (20
lags) means that the null hypothesis of white noise cannot be rejected and that the absence of
autocorrelation can be assumed (Q = 15.37, p = .755). However, the Engle-Granger test for the presence
of conditional heteroscedasticity gives a significant result, indicating the presence of heteroscedasticity
(Q = 79.47, p < .001). I did not address this here; it may be solved later with ARCH and
GARCH models. The estimates of this ARIMA(0,1,1) model can be found in Table 2, as can those of all subsequent
models.
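Written out, the ARIMA(0,1,1) model and the Ljung-Box statistic used above take the following form. The notation is introduced here for illustration and assumes the sign convention of Stata's arima command: y_t is the weekly average left-right preference, c the constant, theta the moving average coefficient, epsilon_t a white noise error, n the number of observations, and rho-hat_k the sample autocorrelation at lag k of the residuals (or of the squared residuals for the heteroscedasticity check).

```latex
\begin{align*}
\Delta y_t &= c + \varepsilon_t + \theta\,\varepsilon_{t-1},
  \qquad \Delta y_t = y_t - y_{t-1} \\
Q &= n(n+2)\sum_{k=1}^{m}\frac{\hat{\rho}_k^{2}}{n-k} \;\sim\; \chi^2_m
  \quad \text{under the null hypothesis of white noise, with } m = 20
\end{align*}
```

With the estimates reported in Table 2 (a constant of approximately zero and theta = -.838), each weekly change is the current shock minus roughly 84% of the previous week's shock.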
Figure 2. ACF and PACF for the differenced mean score of the average political preference.
Table 2. ARIMA models for the differenced mean score of the average political preference.

                                ARIMA(0,1,1)    News coverage   Unemployment    News coverage and
                                                                rate            unemployment rate
Constant                        -.000 (.000)    -.000 (.000)    -.000 (.000)    -.000 (.000)
Moving average (t - 1)          -.838 (.023)*   -.835 (.023)*   -.838 (.023)*   -.835 (.023)*
Unemployment coverage (t - 5)                   -.001 (.000)                    -.001 (.000)
Unemployment rate (t - 1)                                       -.001 (.011)    -.002 (.011)
Ljung-Box Q(20) residuals       15.37           14.81           15.56           14.77
Ljung-Box Q(20) residuals²      79.47*          83.71*          79.30*          83.67*
AIC                             -1776.91        -1758.14        -1770.89        -1756.18
BIC                             -1763.84        -1740.75        -1753.47        -1734.43
Note. Unstandardized coefficients. Standard errors in parentheses; * p<.001
Now that I had built a model that properly accounts for the series' own past, I could go on with the next step:
assessing the impact of the amount of news coverage about unemployment in NRC Handelsblad on the
average political preference of Dutch citizens. As the effect of news coverage is expected to set in
within a time span of three months, I considered lags ranging from 1 to 13 weeks. The cross-correlation function
for the amount of unemployment news coverage and the residuals of the ARIMA(0,1,1) model for
average political preference indicates that the strongest association is present when news coverage is
lagged 5 weeks (r = -.086). The ARIMA(0,1,1) model that included the amount of unemployment
news coverage gave similar results for the Ljung-Box Q test (Q = 14.81, p = .787) and the Engle-Granger
test (Q = 83.71, p < .001), indicating the absence of autocorrelation and the presence of
conditional heteroscedasticity.
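The lag selection described above can be sketched as a small loop. This is an illustration on simulated data, not the original analysis: the series x and r, the seed and the true lag of 3 are all assumptions made for the example, and the loop simply keeps the lag between 1 and 13 with the largest absolute correlation.

```stata
* Simulated example: x affects r with a delay of 3 periods (made-up data)
clear
set obs 300
set seed 2010
gen t = _n
tsset t
gen x = rnormal()
gen r = -0.5*L3.x + rnormal()
* Correlate r with each lag of x and remember the strongest association
local best = 0
local bestlag = 0
forvalues k = 1/13 {
    quietly correlate r L`k'.x
    if abs(r(rho)) > abs(`best') {
        local best = r(rho)
        local bestlag = `k'
    }
}
display "strongest correlation at lag `bestlag': r = " %5.3f `best'
```

In the actual analysis, r would be the residual series of the ARIMA(0,1,1) model and x the differenced independent variable.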
Including this variable as an independent variable in the original ARIMA model for average
political preference indicates that the amount of unemployment news coverage does not seem to
influence the political preference of Dutch citizens; the unstandardized coefficient is -.001 (p = .113).
The Akaike Information Criterion (AIC) increases by 18.77 points (= -1758.14 - (-1776.91)), which also
indicates that the model did not get better. However, the model which includes the amount of
unemployment news coverage is better than the model without, according to the difference in
log-likelihood, which decreased by 8.39 points while losing one degree of freedom (p < .01). Though
the model explained the variance in average political preference a little better, I prefer the standard and
more parsimonious ARIMA(0,1,1) model, because of the insignificant effect of the amount of
unemployment news coverage and the increase in AIC.
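As a consistency check on these numbers, the reported AIC increase can be related to the reported change in log-likelihood. The sketch assumes the usual definition AIC = -2 ln L + 2k, with k the number of estimated parameters (the definition used by Stata's estat ic); the symbols are notation introduced here.

```latex
\begin{align*}
\mathrm{AIC} &= -2\ln L + 2k \\
\Delta\mathrm{AIC} &= -2\,\Delta\ln L + 2\,\Delta k \\
  &= -2(-8.39) + 2(1) = 18.78 \approx 18.77
\end{align*}
```

The small discrepancy is due to rounding. Because a regressor lagged by five weeks removes the first weeks from the estimation sample, the two models are presumably not fitted to exactly the same observations, which is worth keeping in mind when comparing these statistics.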
To check whether the real economy had more effect on the political preferences of citizens, I repeated the
process of including an independent variable, this time with the unemployment rate. Again I
expected a potential effect to set in within three months (13 weeks). I analyzed the cross-correlation
function over this period for the unemployment rate and the residuals of the ARIMA(0,1,1) model for
average political preference. This indicated that the strongest association is present when the
unemployment rate is lagged 1 week (r = .063). Including this variable as an independent variable in
the original ARIMA model for the average political preference indicates that the unemployment rate
does not cause differences in the average political preference either; the unstandardized coefficient is
-.001 (p = .963). According to the AIC, including the unemployment rate in the model did not improve
model fit; this value increased by 6.02 points. The log-likelihood decreased by
2.02 points (p = .16) while losing one degree of freedom; this is not significant and thus gives no indication that the
model fits better. Including the unemployment rate thus harms model fit, just as including the number
of articles in NRC Handelsblad seems to do. The model which included the unemployment
rate gave similar results for the Ljung-Box Q test (Q = 15.56, p = .744) and the Engle-Granger
test (Q = 79.30, p < .001), indicating the absence of autocorrelation and the presence of
conditional heteroscedasticity.
The final model that I tested was the ARIMA(0,1,1) model for political preferences that
included both the amount of unemployment coverage in NRC Handelsblad at lag 5 and the
unemployment rate at lag 1 as independent variables (the unemployment rate also had its strongest
correlation at lag 1 with the residuals of the ARIMA(0,1,1) model which included the amount of news
coverage). In this way, a potential effect of news coverage could be controlled for the real-world
circumstances of the economy. This model again gave comparable results for the Ljung-Box Q test
(Q = 14.77, p = .790) and the Engle-Granger test (Q = 83.67, p < .001), indicating the absence of
autocorrelation and the presence of conditional heteroscedasticity. Including both independent
variables led, as could be expected, to two insignificant effects: the amount of
unemployment news coverage (b = -.001, p = .111) and the unemployment rate (b = -.002, p = .871). This
model also made the AIC increase, from -1776.91 to -1756.18, a rise of 20.73 points. The difference in
log-likelihood also did not point to a significantly better fitting model: a decrease of 8.37 points while losing
two degrees of freedom (p = .015). Including both the number of unemployment articles and the
unemployment rate in the ARIMA(0,1,1) model thus does not improve model fit.
Conclusion
Because I found in the previous assignment an effect of the financial crisis in 1979/1980 on the
political preferences of people, I tried to find a comparable effect in this paper by investigating
potential effects of news coverage and real-world developments of unemployment. My main aim was
to study the influence of news coverage about unemployment on the average political preference of the
Dutch population on a left-right scale. The results make clear that such an effect does not seem to exist:
changes in political preference are not caused by changes in the amount of unemployment coverage. To see whether
the average political preference was instead affected by real-world developments, I looked at
the unemployment rate as another independent variable. However, this also did not seem to have an
impact on the average political preference. To check whether the oil crisis of 1979 and 1980 was an
exception as an economic factor that influenced political preferences, future research should
use other economic indicators as independent variables.
References
Hollanders, D., & Vliegenthart, R. (2009). The influence of negative newspaper coverage on
consumer confidence: The Dutch case. CentER Discussion Paper Series (Vol. 2009). Tilburg:
University of Tilburg.
Soroka, S. N. (2006). Good news and bad news: Asymmetric responses to economic information.
Journal of Politics, 68(2), 372-385.
Appendix 1: Do file
* Left-right preference: restrict the sample to 1990 (week 2) through 2000 (week 51)
drop if yrwk<199002
drop if yrwk>200051
* declare data to be time series
replace nr2 = nr2 + 898
tsset nr2, weekly
codebook leftright
codebook N_BREAK
* Missing values: leftright becomes the average of the nearest points before and
* after; articles become 0, as a missing value means there were no articles about unemployment
replace leftright = (leftright[_n-1]+leftright[_n+1])/2 if leftright>= .
replace leftright = (leftright[_n-1]+leftright[_n+2])/2 if leftright>= .
replace N_BREAK = 0 if N_BREAK>= .
* Monthly unemployment rate: fill the intervening weeks with the mean of the
* previous week and the next measured value, trying different leads in turn
replace unumpl_rate = (unumpl_rate[_n-1]+unumpl_rate[_n+3])/2 if unumpl_rate>= .
replace unumpl_rate = (unumpl_rate[_n-1]+unumpl_rate[_n+2])/2 if unumpl_rate>= .
replace unumpl_rate = (unumpl_rate[_n-1]+unumpl_rate[_n+1])/2 if unumpl_rate>= .
replace unumpl_rate = (unumpl_rate[_n-1]+unumpl_rate[_n+4])/2 if unumpl_rate>= .
replace unumpl_rate = (unumpl_rate[_n-1]+unumpl_rate[_n+5])/2 if unumpl_rate>= .
* Remaining gaps: carry the previous value forward
replace unumpl_rate = unumpl_rate[_n-1] if unumpl_rate>= .
codebook unumpl_rate leftright N_BREAK
*Model building
twoway (tsline leftright, lcolor(black))
twoway (tsline N_BREAK, lcolor(black))
twoway (tsline unumpl_rate, lcolor(black))
*with drift
dfuller leftright
*random walk
dfuller leftright, noconstant
*trend
dfuller leftright, trend
*not necessary, but just to check the data
dfuller N_BREAK
*random walk
dfuller N_BREAK, noconstant
*trend
dfuller N_BREAK, trend
dfuller unumpl_rate
*random walk
dfuller unumpl_rate, noconstant
*trend
dfuller unumpl_rate, trend
* As the DF test for a random walk is not significant, I assume the data show a random
* walk pattern and therefore it is necessary to integrate (difference) the
* data
twoway (tsline d.leftright, lcolor(black))
twoway (tsline d.N_BREAK, lcolor(black))
twoway (tsline d.unumpl_rate, lcolor(black))
*with drift
dfuller d.leftright
*random walk
dfuller d.leftright, noconstant
*trend
dfuller d.leftright, trend
*not necessary, but just to check the data
dfuller d.N_BREAK
*random walk
dfuller d.N_BREAK, noconstant
*trend
dfuller d.N_BREAK, trend
*with drift
dfuller d.unumpl_rate
*random walk
dfuller d.unumpl_rate, noconstant
*trend
dfuller d.unumpl_rate, trend
*Building the ARIMA model for d.leftright
ac d.leftright
pac d.leftright
corrgram d.leftright
* The ACF graph shows a clear spike at lag 1 and few to no significant
* correlations at other lags, while the PACF graph displays a declining
* pattern for the first lags.
* An ARIMA(0,1,1) model is thus the right choice
arima d.leftright, ma(1)
estat ic
predict r, res
gen r_s= r*r
wntestq r, lags(20)
wntestq r_s, lags(20)
* The Ljung-Box Q-test for autocorrelation (20 lags) on the residuals
* indicates that the null hypothesis of white noise cannot be rejected; this
* indicates the absence of autocorrelation (Q=xxxxx, p=xxxx). The Engle-Granger
* test for the presence of conditional heteroscedasticity indicates
* its presence, as the test does reject the null hypothesis of
* white noise (Q=xxxxx, p=xxxxxx).
* Cross-correlation function: which lag works best statistically?
* (the residuals r are still needed here, so they are dropped only afterwards)
xcorr r d.N_BREAK, lags(13)
correlate r l.d.N_BREAK l2.d.N_BREAK l3.d.N_BREAK l4.d.N_BREAK l5.d.N_BREAK ///
    l6.d.N_BREAK
correlate r l5.d.N_BREAK
drop r r_s
* Strongest correlation at a lag of 5 weeks (about one month)
arima d.leftright l5.d.N_BREAK, ma(1)
estat ic
predict r2, res
gen r2_s=r2*r2
wntestq r2, lags(20)
wntestq r2_s, lags(20)
drop r2 r2_s
*N_Break's effect is not significant
arima d.leftright, ma(1)
estat ic
predict r, res
gen r_s= r*r
wntestq r, lags(20)
wntestq r_s, lags(20)
xcorr r d.unumpl_rate, lags(13)
correlate r l.d.unumpl_rate l2.d.unumpl_rate l3.d.unumpl_rate ///
    l4.d.unumpl_rate l5.d.unumpl_rate l6.d.unumpl_rate
*Strongest correlation at lag 1 and lag 2
drop r r_s
arima d.leftright l1.d.unumpl_rate, ma(1)
estat ic
predict r2, res
gen r2_s=r2*r2
wntestq r2, lags(20)
wntestq r2_s, lags(20)
drop r2 r2_s
arima d.leftright l5.d.N_BREAK l.d.unumpl_rate, ma(1)
estat ic
predict r2, res
gen r2_s=r2*r2
wntestq r2, lags(20)
wntestq r2_s, lags(20)
drop r2 r2_s