10/7/2015 DEM 7263 Fall 2015 - Spatially Autoregressive Models 1
file:///Users/ozd504/Google%20Drive/dem7263/Rcode15/Lecture_2.html 1/43
DEM 7263 Fall 2015 - Spatially
Autoregressive Models 1
Corey S. Sparks, Ph.D.
September 9, 2015
Introduction to Spatial Regression
Models
Up until now, we have been concerned with describing the structure of spatial data through correlational
methods and the methods of exploratory spatial data analysis (http://rpubs.com/corey_sparks/105700).
Through ESDA, we examined data for patterns, and using the Moran I and Local Moran I statistics, we
examined clustering of variables. Now we consider regression models for continuous outcomes. We begin
with a review of the Ordinary Least Squares model for a continuous outcome.
OLS Model
The basic OLS model is an attempt to estimate the effect of an independent variable(s) on the value of a
dependent variable. This is written as:
$$y_i = \beta_0 + \beta_1 * x_i + e_i$$
where y is the dependent variable that we want to model, x is the independent variable we think has an
association with y, $\beta_0$ is the model intercept, or the mean of y when x = 0, and $\beta_1$ is the slope
parameter that defines the strength of the linear relationship between x and y. e is the error in the model
for y that is unaccounted for by the values of x and the intercept $\beta_0$. The average, or expected value of
y is: $E[y|x] = \beta_0 + \beta_1 * x_i$, which is the linear mean function for y, conditional on x, and this gives us the
customary linear regression plot:
set.seed(1234)
x<- rnorm(100, 10, 5)
beta0<-1
beta1<-1.5
y<-beta0+beta1*x+rnorm(100, 0, 5)
plot(x, y)
abline(coef = coef(lm(y~x)), lwd=1.5)
summary(lm(y~x))$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.446620 1.0879494 1.329676 1.867119e-01
## x 1.473915 0.1037759 14.202863 1.585002e-25
Where the line shows $E[y|x] = \beta_0 + \beta_1 * x_i$.
We assume that the errors, $e_i \sim N(0, \sigma^2)$, are independent, Normally distributed and homoskedastic, with
variances $\sigma^2$.
This is the simple model with one predictor. We can easily add more predictors to the equation and
rewrite it:
$$y_i = \beta_0 + \sum_k \beta_k * x_{ik} + e_i$$
So, now the mean of y is modeled with multiple x variables. We can write this relationship more compactly
using matrix notation:
$$Y = X'\beta + e$$
Where Y is now an $n * 1$ vector of observations of our dependent variable, X is an $n * k$ matrix of
independent variables, with the first column being all 1's, and e is the $n * 1$ vector of errors for each
observation.
In matrices this looks like:
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} \quad
x = \begin{bmatrix}
1 & x_{1,1} & x_{1,2} & \dots & x_{1,k} \\
1 & x_{2,1} & x_{2,2} & \dots & x_{2,k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n,1} & x_{n,2} & \dots & x_{n,k}
\end{bmatrix} \quad
e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$$
The residuals are uncorrelated, with covariance matrix $\Sigma$:
$$\Sigma = \sigma^2 I = \sigma^2 *
\begin{bmatrix}
1 & 0 & 0 & \dots & 0 \\
0 & 1 & 0 & \dots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \dots & 1
\end{bmatrix} =
\begin{bmatrix}
\sigma^2 & 0 & 0 & \dots & 0 \\
0 & \sigma^2 & 0 & \dots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \dots & \sigma^2
\end{bmatrix}$$
To estimate the $\beta$ coefficients, we use the customary OLS estimator
$$\beta = (X'X)^{-1}(X'Y)$$
this is the estimator that minimizes the residual sum of squares:
$$(Y - X'\beta)'(Y - X'\beta)$$
or
$$(Y - \hat{Y})'(Y - \hat{Y})$$
We can inspect the properties of the estimates by examining the residuals, or $e_i$, of the model. Since we
assume the errors are normal, a quantile-quantile (Q-Q) plot of the residuals against the expected quantiles
of the standard normal distribution should be a straight line. Formal tests of normality can also be used.
fit<-lm(y~x)
qqnorm(rstudent(fit))
qqline(rstudent(fit))
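As a check on the matrix formulas, the OLS estimator can be computed directly and compared with lm(). This is a minimal sketch on simulated data; the simulation settings mirror the toy example, and the names X, beta_hat, y_hat, e_hat and rss are introduced here only for illustration. Note that X is stored with one row per observation, so the fitted values are computed as X %*% beta_hat.

```r
# Simulate data of the same kind as the toy example
set.seed(1234)
n <- 100
x <- rnorm(n, 10, 5)
y <- 1 + 1.5 * x + rnorm(n, 0, 5)

# Design matrix: a column of 1's for the intercept, then x
X <- cbind(1, x)

# OLS estimator (X'X)^{-1} X'Y, solved without forming the inverse explicitly
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Fitted values, residuals and the residual sum of squares
y_hat <- X %*% beta_hat
e_hat <- y - y_hat
rss <- sum(e_hat^2)

# Compare with lm(): the coefficients should agree to numerical precision
fit <- lm(y ~ x)
cbind(by_hand = as.vector(beta_hat), lm = coef(fit))
all.equal(as.vector(beta_hat), unname(coef(fit)))
```

Solving the normal equations with solve(A, b) rather than inverting X'X is a common design choice because it is numerically more stable; lm() itself uses a QR decomposition.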
library(nortest) #provides ad.test() and lillie.test()
shapiro.test(resid(fit))
##
## Shapiro-Wilk normality test
##
## data: resid(fit)
## W = 0.98878, p-value = 0.5677
ad.test(resid(fit))
##
## Anderson-Darling normality test
##
## data: resid(fit)
## A = 0.39859, p-value = 0.3593
lillie.test(resid(fit))
##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: resid(fit)
## D = 0.052017, p-value = 0.7268
We may also inspect the association between $e_i$, or more appropriately the studentized/standardized
residuals, and the predictors and the dependent variable. If we see evidence of association, then
homoskedasticity is a poor assumption.
par(mfrow=c(2,2))
plot(fit)
par(mfrow=c(1,1))
Model-data agreement
Do we (meaning our data) meet the statistical assumptions of our analytical models?
Always ask this of any analysis you do. If your model is wrong, your inference will also be wrong.
Since spatial data often display correlations amongst closely located observations (autocorrelation), we
should probably test for autocorrelation in the model residuals, as that would violate the assumptions of
the OLS model.
One method for doing this is to calculate the value of Moran’s I for the OLS residuals.
library(spdep)
library(maptools)
library(RColorBrewer)
setwd("~/Google Drive/dem7263/data/")
dat<-readShapePoly("SA_classdata.shp", proj4string=CRS("+proj=utm +zone=14 +north"))
#Make a rook style weight matrix
sanb<-poly2nb(dat, queen=F)
summary(sanb)
## Neighbour list object:
## Number of regions: 235
## Number of nonzero links: 1106
## Percentage nonzero weights: 2.002716
## Average number of links: 4.706383
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9
## 4 10 30 62 66 34 24 3 2
## 4 least connected regions:
## 61 82 147 205 with 1 link
## 2 most connected regions:
## 31 55 with 9 links
salw<-nb2listw(sanb, style="W")
fit2<-lm(I(viol3yr/acs_poptot) ~ pfemhh + hwy + p5yrinmig + log(MEDHHINC), data=dat)
dat$resid<-rstudent(fit2)
spplot(dat, "resid",at=quantile(dat$resid), col.regions=brewer.pal(n=5, "RdBu"), main="Residuals from OLS Fit of Crime Rate")
lm.morantest(fit2, listw = salw, resfun = rstudent)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = I(viol3yr/acs_poptot) ~ pfemhh + hwy +
## p5yrinmig + log(MEDHHINC), data = dat)
## weights: salw
##
## Moran I statistic standard deviate = 0.75475, p-value = 0.2252
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## 0.021326432 -0.011406176 0.001880845
In this case, there appears to be no significant clustering in the residuals: the observed value of Moran's
I is 0.021, with a z-statistic of 0.75 (p = 0.225).
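For intuition about the statistic that lm.morantest() reports, Moran's I of a set of OLS residuals can be computed by hand. The sketch below uses a hypothetical 6 x 6 grid of cells with rook neighbors rather than the San Antonio tracts, and all object names are illustrative. It only reproduces the observed I; lm.morantest() additionally uses an expectation and variance that are appropriate for regression residuals, which are not computed here.

```r
set.seed(42)
# Toy study area: a 6 x 6 grid of cells, numbered row by row (an assumption for illustration)
nr <- 6; nc <- 6; n <- nr * nc
rowid <- rep(1:nr, each = nc)
colid <- rep(1:nc, times = nr)

# Rook adjacency: two cells are neighbors when they share an edge
A <- (abs(outer(rowid, rowid, "-")) + abs(outer(colid, colid, "-")) == 1) * 1

# Row-standardize the weights, which is what style = "W" does in nb2listw()
W <- A / rowSums(A)

# Fit an a-spatial OLS model to simulated data with independent errors
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n)
e <- resid(lm(y ~ x))

# Moran's I = (n / S0) * (e'We) / (e'e), where S0 is the sum of all weights
S0 <- sum(W)
I_obs <- (n / S0) * as.numeric(t(e) %*% W %*% e) / sum(e^2)
I_obs
```

With row-standardized weights S0 equals n, so the statistic reduces to e'We / e'e, which can be read as the slope of a regression of the neighborhood average of the residuals on the residuals themselves.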
Extending the OLS model to accommodate spatial structure
We now assume that we measure our Y and X's at specific spatial locations (s), so we have Y(s) and X(s).
In most analyses, the spatial location (i.e. the county or census tract) only serves to link X and Y so we can
collect our data on them; the subsequent analysis ignores this spatial information and does not explicitly
consider the spatial relationships between the variables or the locations.
In fact, even though we measure Y(s) and X(s), what we end up analyzing is X and Y, and we apply the ordinary
regression methods to these data to understand the effects of X on Y.
Moreover, we could move them around in space (as long as we keep the observations $y_i$ together with $x_i$)
and still get the same results. Such analyses have been called a-spatial. This is the kind of regression
model you are used to fitting, where we ignore any information on the locations of the observations
themselves.
However, we can extend the simple regression case to include the information on (s) and incorporate it
into our models explicitly, so they are no longer a-spatial.
There are several ways to incorporate the (s) locations into our models:
The structured linear mixed (multi-level) model, or GLMM (generalized linear mixed model)
Spatial filtering of observations
Spatially autoregressive models
Geographically weighted regression
We will first deal with the case of the spatially autoregressive model, or SAR model, as its structure is just
a modification of the OLS model from above.
Spatially autoregressive models
We saw in the normal OLS model that some of the basic assumptions of the model are that: 1) the model
residuals are distributed as iid normal random variates with mean 0, and 2) they have common (and
constant, meaning homoskedastic) variance $\sigma^2$.
Spatial data, however, present a series of problems to the standard OLS regression model. These
problems are typically seen as various representations of spatial structure or dependence within the data.
The spatial structure of the data can introduce spatial dependence into the outcome, the predictors
and the model residuals.
This can be observed as neighboring observations, both with high (or low) values (positive autocorrelation)
for either the dependent variable, the model predictors or the model residuals. We can also observe
situations where areas with high values can be surrounded by areas with low values (negative
autocorrelation).
Since the standard OLS model assumes the residuals (and the outcomes themselves) are uncorrelated, as
previously stated, the autocorrelation inherent to most spatial data introduces factors that violate the iid
distributional assumptions for the residuals, and could violate the assumption of common variance for the
OLS residuals. To account for the expected spatial association in the data, we would like a model that
accounts for the spatial structure of the data. One way of doing this is to allow for
correlation between the residuals in our model, or for correlation in the dependent variable itself.
We are familiar with the concept of autoregression amongst neighboring observations. This concept is
that a particular observation is a linear combination of its neighboring values. This autoregression
introduces dependence into the data. Instead of specifying the autoregression structure directly, we
introduce spatial autocorrelation through a global autocorrelation coefficient and a spatial proximity
measure.
There are 2 basic forms of the spatial autoregressive model: the spatial lag and the spatial error models.
Both of these models build on the basic OLS regression model: $Y = X'\beta + e$
Where Y is the dependent variable, X is the matrix of independent variables, $\beta$ is the vector of regression
parameters to be estimated from the data, and e are the model residuals, which are assumed to be
distributed as a Gaussian random variable with mean 0 and constant variance-covariance matrix $\Sigma$.
The spatial lag model
The spatial lag model introduces autocorrelation into the regression model by lagging the dependent
variable itself, much like in a time-series approach.
The model is specified as:
$$Y = \rho W Y + X'\beta + e$$
where $\rho$ is the autoregressive coefficient, which tells us how strong the resemblance is, on average,
between $Y_i$ and its neighbors. The matrix W is the spatial weight matrix, describing the spatial
network structure of the observations, like we described in the ESDA lecture.
In the lag model, we are specifying the spatial component on the dependent variable. This leads to a
spatial filtering of the variable, where values are averaged over the surrounding neighborhood defined in W,
called the spatially lagged variable.
The specification that is used most often is a spatially filtered Y variable that can then be regressed on X,
which can directly be seen in a re-expression of the OLS model as:
$$Y = \rho W Y + X'\beta + e$$
or, equivalently, $(I - \rho W) Y = X'\beta + e$, where the direct effect of the spatial lagging of the dependent variable is seen.
To estimate these models we can use either GeoDa or R. In R we use the spdep package and the
lagsarlm() function.
The lag model is:
fit.lag<-lagsarlm(I(viol3yr/acs_poptot) ~ pfemhh + hwy + p5yrinmig + log(MEDHHINC), data=dat, listw = salw)
summary(fit.lag, Nagelkerke=T)
##
## Call:
## lagsarlm(formula = I(viol3yr/acs_poptot) ~ pfemhh + hwy + p5yrinmig +
## log(MEDHHINC), data = dat, listw = salw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0635446 -0.0147641 -0.0036721 0.0090372 0.3252902
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.3136449 0.0924307 3.3933 0.0006906
## pfemhh 0.1913535 0.0336049 5.6942 1.239e-08
## hwy 0.0075802 0.0056013 1.3533 0.1759604
## p5yrinmig 0.0794330 0.0202592 3.9208 8.824e-05
## log(MEDHHINC) -0.0337148 0.0082612 -4.0811 4.482e-05
##
## Rho: 0.034756, LR test value: 0.18517, p-value: 0.66697
## Asymptotic standard error: 0.082235
## z-value: 0.42264, p-value: 0.67256
## Wald statistic: 0.17862, p-value: 0.67256
##
## Log likelihood: 441.5604 for lag model
## ML residual variance (sigma squared): 0.0013657, (sigma: 0.036955)
## Nagelkerke pseudo-R-squared: 0.36486
## Number of observations: 235
## Number of parameters estimated: 7
## AIC: -869.12, (AIC for lm: -870.94)
## LM test for residual autocorrelation
## test value: 0.4691, p-value: 0.4934
We see that $\rho$ is estimated to be 0.035, and the likelihood ratio test shows that this is not significantly
different from 0 (LR = 0.185, p = 0.667).
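To see what the lag specification implies, it can help to simulate from it. The sketch below is not part of the San Antonio analysis: it assumes a hypothetical 7 x 7 grid with rook neighbors, made-up values for rho and beta, and illustrative object names. It generates Y from the reduced form of the lag model, builds the spatially lagged variable WY, and confirms that the simulated values satisfy the structural equation. X is stored with one row per observation, so the linear predictor is X %*% beta.

```r
set.seed(2015)
# Toy 7 x 7 grid with rook neighbors and row-standardized weights (assumed for illustration)
nr <- 7; nc <- 7; n <- nr * nc
rowid <- rep(1:nr, each = nc)
colid <- rep(1:nc, times = nr)
A <- (abs(outer(rowid, rowid, "-")) + abs(outer(colid, colid, "-")) == 1) * 1
W <- A / rowSums(A)

# Hypothetical parameter values
rho <- 0.6
beta <- c(1, 2)
X <- cbind(1, rnorm(n))
e <- rnorm(n)

# Reduced form of the lag model: Y = (I - rho*W)^{-1} (X beta + e)
Y <- solve(diag(n) - rho * W, X %*% beta + e)

# The spatially lagged variable WY is the neighborhood average of Y for each cell
WY <- W %*% Y

# Check that the simulated Y satisfies Y = rho*WY + X beta + e
max(abs(Y - (rho * WY + X %*% beta + e)))

# With rho known, the filtered outcome Y - rho*WY is an ordinary regression on X
coef(lm(as.vector(Y - rho * WY) ~ X[, 2]))
```

In practice rho is unknown and WY is correlated with the errors, which is why lagsarlm() estimates the model by maximum likelihood rather than by regressing Y on WY and X with OLS.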
The spatial error model
The spatial error model says that the autocorrelation is not in the outcome itself, but instead, any
autocorrelation is attributable to there being missing spatial covariates in the data. If these spatially
patterned covariates could be measured, then the autocorrelation would be 0. This model is written:
$$Y = X'\beta + e$$
$$e = \lambda W e + v$$
where $\lambda$ is the autoregressive coefficient for the errors and v is a vector of independent errors.
This model, in effect, controls for the nuisance of correlated errors in the data that are attributable to an
inherently spatial process, or to spatial autocorrelation in the measurement errors of the measured and
possibly unmeasured variables in the model.
This model is estimated in R using errorsarlm() in the spdep library.
fit.err<-errorsarlm(I(viol3yr/acs_poptot) ~ pfemhh + hwy + p5yrinmig + log(MEDHHINC), data=dat, listw = salw)
summary(fit.err, Nagelkerke=T)
##
## Call:
## errorsarlm(formula = I(viol3yr/acs_poptot) ~ pfemhh + hwy + p5yrinmig +
## log(MEDHHINC), data = dat, listw = salw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0650577 -0.0141501 -0.0034659 0.0092839 0.3241926
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.3138785 0.0922430 3.4027 0.0006671
## pfemhh 0.1971225 0.0340765 5.7847 7.264e-09
## hwy 0.0073140 0.0057043 1.2822 0.1997762
## p5yrinmig 0.0781620 0.0205345 3.8064 0.0001410
## log(MEDHHINC) -0.0337316 0.0082489 -4.0893 4.328e-05
##
## Lambda: 0.070725, LR test value: 0.53188, p-value: 0.46582
## Asymptotic standard error: 0.094212
## z-value: 0.7507, p-value: 0.45283
## Wald statistic: 0.56355, p-value: 0.45283
##
## Log likelihood: 441.7338 for error model
## ML residual variance (sigma squared): 0.0013625, (sigma: 0.036912)
## Nagelkerke pseudo-R-squared: 0.3658
## Number of observations: 235
## Number of parameters estimated: 7
## AIC: -869.47, (AIC for lm: -870.94)
We see $\lambda$ = 0.071, with a likelihood ratio test p-value of 0.466, suggesting again that, in this case, there is no
significant autocorrelation in the model residuals.
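The error specification can also be illustrated by simulation. This is a sketch under stated assumptions: a hypothetical 10 x 10 grid with rook neighbors, a made-up value of lambda, and a small helper function moran() defined here for illustration (it is not an spdep function). The errors are generated from e = (I - lambda W)^{-1} v, an a-spatial OLS model is fit, and Moran's I of the OLS residuals is compared with that of the independent innovations v.

```r
set.seed(7263)
# Toy 10 x 10 grid with rook neighbors and row-standardized weights (assumed for illustration)
nr <- 10; nc <- 10; n <- nr * nc
rowid <- rep(1:nr, each = nc)
colid <- rep(1:nc, times = nr)
A <- (abs(outer(rowid, rowid, "-")) + abs(outer(colid, colid, "-")) == 1) * 1
W <- A / rowSums(A)

lambda <- 0.7   # hypothetical error autocorrelation
x <- rnorm(n)
v <- rnorm(n)   # independent innovations

# e = lambda*W*e + v implies e = (I - lambda*W)^{-1} v
e <- as.vector(solve(diag(n) - lambda * W, v))
y <- 1 + 2 * x + e

# Illustrative helper: Moran's I of a mean-centered vector
moran <- function(z, W) {
  z <- z - mean(z)
  (length(z) / sum(W)) * sum(z * (W %*% z)) / sum(z^2)
}

# OLS ignores the error structure, so its residuals inherit the autocorrelation
r <- resid(lm(y ~ x))
moran(r, W)   # OLS residuals
moran(v, W)   # independent innovations, for comparison
```

With a lambda this large, the residual Moran's I would typically be clearly positive while the value for v stays near zero, which is the pattern the error model is designed to absorb.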
We can examine the relative fits of each model by extracting the AIC values from each:
AIC(fit.lag)
## [1] -869.1208
AIC(fit.err)
## [1] -869.4675
AIC(fit2)
## [1] -870.9356
Both spatial models have slightly higher AIC values than the OLS model (-870.94), so there is no evidence
favoring the spatial regression models in this case.
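The AIC values and likelihood ratio tests printed above are tied together by simple arithmetic, which the following sketch reproduces from the reported numbers. The log-likelihoods and parameter counts are copied from the model summaries; the OLS parameter count of 6 (intercept, four slopes and the residual variance) is an inference from the fact that each spatial model adds one parameter to reach 7.

```r
# Log-likelihoods and parameter counts as printed in the model summaries
ll_lag <- 441.5604; k_lag <- 7
ll_err <- 441.7338; k_err <- 7
aic_ols <- -870.9356; k_ols <- 6   # k_ols = 6 is inferred, see the text above

# AIC = -2 * logLik + 2 * k
aic_lag <- -2 * ll_lag + 2 * k_lag
aic_err <- -2 * ll_err + 2 * k_err

# Back out the OLS log-likelihood from its AIC
ll_ols <- (2 * k_ols - aic_ols) / 2

# Likelihood ratio tests of rho = 0 and lambda = 0 (1 degree of freedom each)
lr_lag <- 2 * (ll_lag - ll_ols)
lr_err <- 2 * (ll_err - ll_ols)
p_lag <- pchisq(lr_lag, df = 1, lower.tail = FALSE)
p_err <- pchisq(lr_err, df = 1, lower.tail = FALSE)

round(c(AIC_ols = aic_ols, AIC_lag = aic_lag, AIC_err = aic_err), 2)
round(c(LR_lag = lr_lag, p_lag = p_lag, LR_err = lr_err, p_err = p_err), 3)
```

Because each spatial model adds a single parameter, its AIC improves on OLS only when the likelihood ratio statistic exceeds 2, which neither model achieves here.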
Examination of Model Specification
To some degree, both of the SAR specifications allow us to model spatial dependence in the data. The
primary difference between them is where we model said dependence.
The lag model says that the dependence affects the dependent variable only; we can liken this to a
diffusion scenario, where your neighbors have a diffusive effect on you.
The error model says that dependence affects the residuals only. We can liken this to the missing spatially
dependent covariate situation, where, if only we could measure another really important spatially
associated predictor, we could account for the spatial dependence. But alas, we cannot, and we instead
model dependence in our errors.
These are inherently two completely different ways to think about specifying a model, and we should really
make our decision based upon how we think our process of interest operates.
That being said, this way of thinking isn’t necessarily popular among practitioners. Most practitioners want
the best fitting model, ‘nuff said. So methods have been developed that test for alternate model
specifications, to see which kind of model best summarizes the observed variation in the dependent
variable and the spatial dependence.
These are a set of so-called Lagrange Multiplier (econometrician's jargon for a score test
(https://en.wikipedia.org/wiki/Score_test)) tests. These tests compare the model fits from the OLS, spatial
error, and spatial lag models using the method of the score test.
For those who don't remember, the score test is a test based on the first derivative (the slope)
of the log-likelihood function, evaluated at the parameter value specified by the null hypothesis. The particular
thing here that is affecting the value of this derivative is the autoregressive parameter, $\rho$ or $\lambda$.
In the OLS model $\rho$ or $\lambda$ = 0 (so both the
lag and error models simplify to OLS), but as this parameter changes, so does the likelihood for the model,
which is why the derivative of the likelihood function is used. This is all related to how the estimation
routines estimate the value of $\rho$ or $\lambda$.
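In general notation, the score test can be written as follows. The symbols are the standard textbook ones and are not taken from the lecture: $\ell(\theta)$ is the log-likelihood, $U(\theta)$ its first derivative (the score), $\mathcal{I}(\theta)$ the Fisher information, $\tilde{\theta}$ the parameter estimate under the null hypothesis, and $q$ the number of restrictions being tested.

```latex
U(\theta) = \frac{\partial \ell(\theta)}{\partial \theta},
\qquad
LM = U(\tilde{\theta})' \, \mathcal{I}(\tilde{\theta})^{-1} \, U(\tilde{\theta})
\;\sim\; \chi^2_q \quad \text{asymptotically, under } H_0
```

For the SAR models the null hypothesis is $\rho = 0$ or $\lambda = 0$, so $q = 1$ and $\tilde{\theta}$ is simply the OLS fit. This is why the test only requires estimating the OLS model, never the spatial model itself.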
Using the Lagrange Multiplier Test (LMT)
In general, you fit the OLS model to your dependent variable, then submit the OLS model fit to the LMT
testing procedure.
Then you look to see which model (spatial error, or spatial lag) has the highest value for the test.
Enter the uncertainty…
So how much bigger, you might say?
Well, drastically bigger. If the LMT for the error model is 2500 and the LMT for the lag model is 2480, this
is NOT A BIG DIFFERENCE, only about 1%. If you see a LMT for the error model of 2500 and a LMT for
the lag model of 250, THIS IS A BIG DIFFERENCE.
So what if you don’t see a BIG DIFFERENCE, HOW DO YOU DECIDE WHICH MODEL TO USE???
Well, you could think more, but who has time for that.
The econometricians have thought up a “better” LMT test, the so-called robust LMT, robust to what I’m
not sure, but it is said that it can settle such problems of a “not so big difference” between the lag and
error model specifications.
So what do you do? In general, think about your problem before you run your analysis. Should this fail you,
proceed with using the LMT. If this is inconclusive, look at the robust LMT, and choose the model which
has the larger value for this test.
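To make the procedure concrete, the two basic (non-robust) LM statistics can be computed by hand from an OLS fit. This is only a sketch: it uses a simulated 10 x 10 grid, made-up parameter values, illustrative object names, and the textbook formulas for the LM-error and LM-lag statistics as I understand them from Anselin's work. For real analyses, spdep provides these tests (lm.LMtests() in versions contemporary with this lecture), including the robust variants that are not computed here.

```r
set.seed(99)
# Toy 10 x 10 grid with rook neighbors and row-standardized weights (assumed for illustration)
nr <- 10; nc <- 10; n <- nr * nc
rowid <- rep(1:nr, each = nc)
colid <- rep(1:nc, times = nr)
A <- (abs(outer(rowid, rowid, "-")) + abs(outer(colid, colid, "-")) == 1) * 1
W <- A / rowSums(A)

# Simulate from a spatial lag process (hypothetical rho and beta)
rho <- 0.5
X <- cbind(1, rnorm(n))
y <- as.vector(solve(diag(n) - rho * W, X %*% c(1, 2) + rnorm(n)))

# Step 1: fit the OLS model only
fit <- lm(y ~ X[, 2])
u <- resid(fit)
s2 <- sum(u^2) / n                      # ML estimate of the error variance
Tr <- sum(diag(t(W) %*% W + W %*% W))   # trace term tr(W'W + WW)

# Step 2: LM statistic for the spatial error alternative
lm_err <- (sum(u * (W %*% u)) / s2)^2 / Tr

# Step 3: LM statistic for the spatial lag alternative
WXb <- W %*% fitted(fit)
M <- diag(n) - X %*% solve(crossprod(X), t(X))   # residual-maker matrix
nJ <- as.numeric(t(WXb) %*% M %*% WXb) / s2 + Tr
lm_lag <- (sum(u * (W %*% y)) / s2)^2 / nJ

# Each statistic is referred to a chi-squared distribution with 1 df
c(LMerr = lm_err, p = pchisq(lm_err, 1, lower.tail = FALSE))
c(LMlag = lm_lag, p = pchisq(lm_lag, 1, lower.tail = FALSE))
```

Comparing lm_err and lm_lag side by side is the "which one is bigger" step described above.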
More Data Examples:
San Antonio, TX mortality data
#Spatial Regression Models 1
setwd("~/Google Drive//dem7263/data")
#, proj4string=CRS("+proj=utm zone=14")
dat<-readShapePoly("SA_classdata.shp")
dat<-dat[which(dat$acs_poptot>100),]
#Create a good representative set of neighbor types
sa.nb6<-knearneigh(coordinates(dat), k=6)
sa.nb6<-knn2nb(sa.nb6)
sa.wt6<-nb2listw(sa.nb6, style="W")
sa.nb5<-knearneigh(coordinates(dat), k=5)
sa.nb5<-knn2nb(sa.nb5)
sa.wt5<-nb2listw(sa.nb5, style="W")
sa.nb4<-knearneigh(coordinates(dat), k=4)
sa.nb4<-knn2nb(sa.nb4)
sa.wt4<-nb2listw(sa.nb4, style="W")
sa.nb3<-knearneigh(coordinates(dat), k=3)
sa.nb3<-knn2nb(sa.nb3)
sa.wt3<-nb2listw(sa.nb3,style="W")
sa.nb2<-knearneigh(coordinates(dat), k=2)
sa.nb2<-knn2nb(sa.nb2)
sa.wt2<-nb2listw(sa.nb2,style="W")
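sa.nbr<-poly2nb(dat, queen=F)
sa.wtr<-nb2listw(sa.nbr, zero.policy=T)
sa.nbq<-poly2nb(dat, queen=T)
#NOTE: the next line passes sa.nbr (rook), not sa.nbq (queen), so sa.wtq duplicates sa.wtr,
#which is why the two Moran tests for these weights below are identical; use sa.nbq for true queen weights
sa.wtq<-nb2listw(sa.nbr, style="W", zero.policy=T)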
sa.nbd<-dnearneigh(coordinates(dat), d1=0, d2=10000)
sa.wtd<-nb2listw(sa.nbd, zero.policy=T)
#create a mortality rate, 3 year average
dat$mort3<-apply(dat@data[, c("deaths09", "deaths10", "deaths11")],1,mean)
dat$mortrate<-1000*dat$mort3/dat$acs_poptot
#just a
hist(dat$mortrate)
#do some basic regression models, without spatial structure
fit.1<-lm(scale(mortrate)~scale(ppersonspo)+scale(I(viol3yr/acs_poptot))+scale(dissim)+scale(ppop65plus), data=dat)
summary(fit.1)
##
## Call:
## lm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(I(viol3yr/acs_poptot))
+
## scale(dissim) + scale(ppop65plus), data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.96044 -0.27804 -0.00673 0.26359 2.18006
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.723e-16 3.120e-02 0.000 1.0000
## scale(ppersonspo) 1.215e-01 5.047e-02 2.407 0.0169 *
## scale(I(viol3yr/acs_poptot)) 2.287e-01 3.841e-02 5.953 9.8e-09 ***
## scale(dissim) 8.467e-02 4.817e-02 1.758 0.0801 .
## scale(ppop65plus) 7.240e-01 3.594e-02 20.146 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4773 on 229 degrees of freedom
## Multiple R-squared: 0.7761, Adjusted R-squared: 0.7722
## F-statistic: 198.4 on 4 and 229 DF, p-value: < 2.2e-16
library(car) #provides vif()
vif(fit.1)
## scale(ppersonspo) scale(I(viol3yr/acs_poptot))
## 2.605367 1.508595
## scale(dissim) scale(ppop65plus)
## 2.372665 1.320878
par(mfrow=c(2,2))
plot(fit.1)
par(mfrow=c(1,1))
#this is a test for constant variance
library(lmtest) #provides bptest() and coeftest()
bptest(fit.1) #looks like we have heteroskedasticity
##
## studentized Breusch-Pagan test
##
## data: fit.1
## BP = 51.088, df = 4, p-value = 2.14e-10
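The studentized Breusch-Pagan statistic reported by bptest() can be reproduced with an auxiliary regression, which makes clear what "a test for constant variance" is actually testing. The sketch below uses simulated data with heteroskedasticity built in; the variable names and the simulation settings are assumptions for illustration, not the mortality data.

```r
set.seed(11)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
# The error standard deviation grows with x1, so the errors are heteroskedastic by construction
y <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n, sd = exp(0.5 * x1))

fit <- lm(y ~ x1 + x2)
e2 <- resid(fit)^2

# Auxiliary regression of the squared residuals on the same predictors
aux <- lm(e2 ~ x1 + x2)

# Studentized (Koenker) Breusch-Pagan statistic: n * R^2 of the auxiliary regression
bp <- n * summary(aux)$r.squared
df <- length(coef(aux)) - 1
p_value <- pchisq(bp, df = df, lower.tail = FALSE)
c(BP = bp, df = df, p = p_value)
```

The degrees of freedom equal the number of predictors in the auxiliary regression, which matches the df = 4 reported above for the four-predictor mortality model.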
#extract studentized residuals from the fit, and examine them
dat$residfit1<-rstudent(fit.1)
cols<-brewer.pal(5,"RdBu")
spplot(dat,"residfit1", at=quantile(dat$residfit1), col.regions=cols, main="Residuals from Model fit of Mortality Rate")
Chi and Zhu (http://link.springer.com/article/10.1007/s11113-007-9051-8#page-1) suggest using a wide
array of neighbor specifications, then picking the one that maximizes the autocorrelation coefficient. So,
here I emulate their approach:
#test for residual autocorrelation
resi<-c(lm.morantest(fit.1, listw=sa.wt2)$estimate[1],
lm.morantest(fit.1, listw=sa.wt3)$estimate[1],
lm.morantest(fit.1, listw=sa.wt4)$estimate[1],
lm.morantest(fit.1, listw=sa.wt5)$estimate[1],
lm.morantest(fit.1, listw=sa.wt6)$estimate[1],
lm.morantest(fit.1, listw=sa.wtd, zero.policy=T)$estimate[1],
lm.morantest(fit.1, listw=sa.wtq)$estimate[1],
lm.morantest(fit.1, listw=sa.wtr)$estimate[1])
plot(resi, type="l")
lm.morantest(fit.1, listw=sa.wt2)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wt2
##
## Moran I statistic standard deviate = -1.3282, p-value = 0.908
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## -0.089262137 -0.010642737 0.003503515
lm.morantest(fit.1, listw=sa.wt3)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wt3
##
## Moran I statistic standard deviate = 0.10133, p-value = 0.4596
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## -0.005775111 -0.010724844 0.002386190
lm.morantest(fit.1, listw=sa.wt4)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wt4
##
## Moran I statistic standard deviate = 0.90538, p-value = 0.1826
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## 0.02812192 -0.01050301 0.00182003
lm.morantest(fit.1, listw=sa.wt5)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wt5
##
## Moran I statistic standard deviate = 1.3996, p-value = 0.08082
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## 0.04315932 -0.01029146 0.00145856
lm.morantest(fit.1, listw=sa.wt6)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wt6
##
## Moran I statistic standard deviate = 1.7095, p-value = 0.04368
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## 0.049203016 -0.010078442 0.001202519
lm.morantest(fit.1, listw=sa.wtd, zero.policy=T)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wtd
##
## Moran I statistic standard deviate = 2.6709, p-value = 0.003782
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## 1.901642e-02 -6.929932e-03 9.436939e-05
lm.morantest(fit.1, listw=sa.wtq)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wtq
##
## Moran I statistic standard deviate = 0.65308, p-value = 0.2569
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## 0.017905792 -0.010601820 0.001905428
lm.morantest(fit.1, listw=sa.wtr)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wtr
##
## Moran I statistic standard deviate = 0.65308, p-value = 0.2569
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## 0.017905792 -0.010601820 0.001905428
#looks like we have minimal autocorrelation in our residuals, but the distance based
#weight does show significant autocorrelation (the k=6 weight is borderline, p = 0.044)
#Let's look at the local autocorrelation in the mortality rate
#get the values of I
library(classInt) #provides classIntervals()
dat$lmfit1<-localmoran(dat$mortrate, sa.wt5, zero.policy=T)[,1]
brks<-classIntervals(dat$lmfit1, n=5, style="quantile")
spplot(dat, "lmfit1", at=brks$brks, col.regions=brewer.pal(5, "RdBu"), main="Local Moran Plot of Mortality Rate")
#Now we fit the spatial lag model
#The lag model is fit with lagsarlm() in the spdep library
#we basically specify the same model as in the lm() fit above
#But we need to specify the spatial weight matrix and the type
#of lag model to fit
fit.lag<-lagsarlm(scale(mortrate)~scale(ppersonspo)+scale(I(viol3yr/acs_poptot))+scale(dissim)+scale(ppop65plus), data=dat, listw=sa.wt2, type="lag")
summary(fit.1)
##
## Call:
## lm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(I(viol3yr/acs_poptot))
+
## scale(dissim) + scale(ppop65plus), data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.96044 -0.27804 -0.00673 0.26359 2.18006
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.723e-16 3.120e-02 0.000 1.0000
## scale(ppersonspo) 1.215e-01 5.047e-02 2.407 0.0169 *
## scale(I(viol3yr/acs_poptot)) 2.287e-01 3.841e-02 5.953 9.8e-09 ***
## scale(dissim) 8.467e-02 4.817e-02 1.758 0.0801 .
## scale(ppop65plus) 7.240e-01 3.594e-02 20.146 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4773 on 229 degrees of freedom
## Multiple R-squared: 0.7761, Adjusted R-squared: 0.7722
## F-statistic: 198.4 on 4 and 229 DF, p-value: < 2.2e-16
summary(fit.lag, Nagelkerke=T)
##
## Call:
## lagsarlm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(I(viol3yr/acs_po
ptot)) +
## scale(dissim) + scale(ppop65plus), data = dat, listw = sa.wt2,
## type = "lag")
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.91953751 -0.28320690 0.00039592 0.25188870 2.28912610
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.0035113 0.0307209 -0.1143 0.9090
## scale(ppersonspo) 0.1047111 0.0509073 2.0569 0.0397
## scale(I(viol3yr/acs_poptot)) 0.2235749 0.0379480 5.8916 3.825e-09
## scale(dissim) 0.0758260 0.0476155 1.5925 0.1113
## scale(ppop65plus) 0.7019724 0.0376957 18.6221 < 2.2e-16
##
## Rho: 0.066138, LR test value: 2.1119, p-value: 0.14616
## Asymptotic standard error: 0.043789
## z-value: 1.5104, p-value: 0.13095
## Wald statistic: 2.2813, p-value: 0.13095
##
## Log likelihood: -155.394 for lag model
## ML residual variance (sigma squared): 0.22064, (sigma: 0.46973)
## Nagelkerke pseudo-R-squared: 0.77808
## Number of observations: 234
## Number of parameters estimated: 7
## AIC: 324.79, (AIC for lm: 324.9)
## LM test for residual autocorrelation
## test value: 9.0458, p-value: 0.002633
bptest.sarlm(fit.lag)
##
## studentized Breusch-Pagan test
##
## data:
## BP = 53.263, df = 4, p-value = 7.506e-11
#robust SE's for the spatial model
library(sandwich)
lm.target <- lm(fit.lag$tary ~ fit.lag$tarX - 1)
coeftest(lm.target, vcov.=vcovHC(lm.target, type="HC0"))
##
## t test of coefficients:
##
## Estimate Std. Error t value
## fit.lag$tarXx(Intercept) -0.0035113 0.0307070 -0.1143
## fit.lag$tarXxscale(ppersonspo) 0.1047111 0.0790642 1.3244
## fit.lag$tarXxscale(I(viol3yr/acs_poptot)) 0.2235749 0.1140534 1.9603
## fit.lag$tarXxscale(dissim) 0.0758260 0.0498978 1.5196
## fit.lag$tarXxscale(ppop65plus) 0.7019724 0.0617274 11.3721
## Pr(>|t|)
## fit.lag$tarXx(Intercept) 0.90906
## fit.lag$tarXxscale(ppersonspo) 0.18670
## fit.lag$tarXxscale(I(viol3yr/acs_poptot)) 0.05118 .
## fit.lag$tarXxscale(dissim) 0.12998
## fit.lag$tarXxscale(ppop65plus) < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
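To see what vcovHC(..., type="HC0") computes, here is a minimal base R sketch of the HC0 "sandwich" estimator on simulated data. The data, the variable names, and the ordinary lm() fit are illustrative assumptions; the lecture code instead applies the estimator to the transformed regression of fit.lag$tary on fit.lag$tarX.

```r
set.seed(1)
n <- 200
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = x)   # error variance grows with x

fit <- lm(y ~ x)
X <- model.matrix(fit)
e <- residuals(fit)

# HC0 sandwich: (X'X)^-1  X' diag(e^2) X  (X'X)^-1
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)            # row i of X scaled by e_i, so this is X' diag(e^2) X
vc_hc0 <- bread %*% meat %*% bread

# Compare classical and heteroskedasticity-robust standard errors
cbind(classical = sqrt(diag(vcov(fit))),
      HC0       = sqrt(diag(vc_hc0)))
```

The coefficient estimates are unchanged; only the standard errors differ, which is why the Estimate column above matches the lag model summary.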
#Next we fit the spatial error model
fit.err<-errorsarlm(scale(mortrate)~scale(ppersonspo)+scale(I(viol3yr/acs_poptot))+scale(dissim)+scale(ppop65plus), data=dat, listw=sa.wt2)
summary(fit.err, Nagelkerke=T)
##
## Call:
## errorsarlm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(I(viol3yr/acs_poptot)) +
## scale(dissim) + scale(ppop65plus), data = dat, listw = sa.wt2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9126219 -0.2695111 -0.0030349 0.2680765 2.1121586
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.0029538 0.0277141 0.1066 0.91512
## scale(ppersonspo) 0.1202146 0.0484444 2.4815 0.01308
## scale(I(viol3yr/acs_poptot)) 0.2303560 0.0371271 6.2045 5.486e-10
## scale(dissim) 0.0870377 0.0458697 1.8975 0.05776
## scale(ppop65plus) 0.7350826 0.0340110 21.6131 < 2.2e-16
##
## Lambda: -0.10644, LR test value: 2.2391, p-value: 0.13456
## Asymptotic standard error: 0.071547
## z-value: -1.4877, p-value: 0.13684
## Wald statistic: 2.2132, p-value: 0.13684
##
## Log likelihood: -155.3304 for error model
## ML residual variance (sigma squared): 0.22001, (sigma: 0.46905)
## Nagelkerke pseudo-R-squared: 0.7782
## Number of observations: 234
## Number of parameters estimated: 7
## AIC: 324.66, (AIC for lm: 324.9)
#As a rough guide to which model fits best, compare the AIC of each (lower is better)
AIC(fit.1)
## [1] 324.8999
AIC(fit.lag)
## [1] 324.788
AIC(fit.err)
## [1] 324.6608
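The three AIC values are very close, and a quick calculation makes that explicit. The sketch below takes the printed values and computes each model's difference from the smallest AIC. The Akaike weights are one common convention for turning those differences into relative support; they are an addition here, not part of the lecture code.

```r
# AIC values copied from the output above
aic <- c(ols = 324.8999, lag = 324.788, error = 324.6608)

# Difference from the best (smallest) AIC
delta <- aic - min(aic)

# Akaike weights: relative support for each model within this set
w <- exp(-delta / 2) / sum(exp(-delta / 2))

round(rbind(delta = delta, weight = w), 3)
```

All differences are under 0.25, so AIC does not meaningfully separate the OLS, lag and error models for these data. This is consistent with the non-significant Rho and Lambda estimates.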
Larger data example: US counties
This example shows much stronger spatial effects than the previous one.
spdat<-readShapePoly("~/Google Drive/dem7263/data/usdata_mort.shp")
#Create a good representative set of neighbor types
us.nb6<-knearneigh(coordinates(spdat), k=6)
us.nb6<-knn2nb(us.nb6)
us.wt6<-nb2listw(us.nb6, style="W")
us.nb5<-knearneigh(coordinates(spdat), k=5)
us.nb5<-knn2nb(us.nb5)
us.wt5<-nb2listw(us.nb5, style="W")
us.nb4<-knearneigh(coordinates(spdat), k=4)
us.nb4<-knn2nb(us.nb4)
us.wt4<-nb2listw(us.nb4, style="W")
us.nb3<-knearneigh(coordinates(spdat), k=3)
us.nb3<-knn2nb(us.nb3)
us.wt3<-nb2listw(us.nb3,style="W")
us.nb2<-knearneigh(coordinates(spdat), k=2)
us.nb2<-knn2nb(us.nb2)
us.wt2<-nb2listw(us.nb2,style="W")
us.nbr<-poly2nb(spdat, queen=F)
us.wtr<-nb2listw(us.nbr, zero.policy=T)
us.nbq<-poly2nb(spdat, queen=T)
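#queen weights are built from us.nbq; the output printed below for us.wtq was
#generated with us.nbr in this call, which is why it matches the us.wtr output
us.wtq<-nb2listw(us.nbq, style="W", zero.policy=T)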
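As a rough illustration of what knearneigh(), knn2nb() and nb2listw(style="W") produce together, the following base R sketch builds a row-standardized k-nearest-neighbor weight matrix for a handful of made-up points. The coordinates and object names are hypothetical, and spdep stores the result as a list rather than a dense matrix.

```r
set.seed(42)
coords <- cbind(runif(10), runif(10))   # made-up point locations
k <- 3

d <- as.matrix(dist(coords))
diag(d) <- Inf                           # a point is not its own neighbor

# Binary neighbor matrix: 1 for each of the k closest points in every row
A <- t(apply(d, 1, function(r) as.numeric(rank(r, ties.method = "first") <= k)))

# Row-standardize: each row's weights sum to 1 (the "W" style)
W <- A / rowSums(A)

rowSums(W)          # every row sums to 1
isSymmetric(W)      # k-nearest-neighbor relations need not be mutual
```

Because every observation gets exactly k neighbors, no zero.policy argument is needed for the k-nearest-neighbor weights, unlike the contiguity-based weights.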
hist(spdat$mortrate)
spplot(spdat,"mortrate", at=quantile(spdat$mortrate), col.regions=brewer.pal(n=5,
"Reds"), main="Spatial Distribution of US Mortality Rate")
#do some basic regression models, without spatial structure
fit.1.us<-lm(scale(mortrate)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+factor(RUCC), spdat)
summary(fit.1.us)
##
## Call:
## lm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + factor(RUCC), data = spdat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5810 -0.4280 0.0216 0.4534 4.2606
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.16408 0.06218 -2.639 0.00836 **
## scale(ppersonspo) 0.60632 0.01789 33.893 < 2e-16 ***
## scale(p65plus) -0.04085 0.01582 -2.582 0.00987 **
## scale(pblack_1) 0.10730 0.01641 6.538 7.29e-11 ***
## scale(phisp) -0.28734 0.01484 -19.358 < 2e-16 ***
## factor(RUCC)1 0.41534 0.08731 4.757 2.05e-06 ***
## factor(RUCC)2 0.29579 0.07226 4.094 4.36e-05 ***
## factor(RUCC)3 0.11800 0.07985 1.478 0.13955
## factor(RUCC)4 0.23900 0.08845 2.702 0.00693 **
## factor(RUCC)5 0.13588 0.09502 1.430 0.15282
## factor(RUCC)6 0.41615 0.06901 6.030 1.83e-09 ***
## factor(RUCC)7 0.17107 0.07097 2.411 0.01599 *
## factor(RUCC)8 0.11620 0.07880 1.475 0.14040
## factor(RUCC)9 -0.20337 0.07654 -2.657 0.00793 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7412 on 3053 degrees of freedom
## Multiple R-squared: 0.453, Adjusted R-squared: 0.4507
## F-statistic: 194.5 on 13 and 3053 DF, p-value: < 2.2e-16
vif(fit.1.us)
## GVIF Df GVIF^(1/(2*Df))
## scale(ppersonspo) 1.786082 1 1.336444
## scale(p65plus) 1.397321 1 1.182083
## scale(pblack_1) 1.503301 1 1.226092
## scale(phisp) 1.229648 1 1.108895
## factor(RUCC) 1.724759 9 1.030746
par(mfrow=c(2,2))
plot(fit.1.us)
par(mfrow=c(1,1))
#this is a test for constant variance
bptest(fit.1.us) #looks like we have heteroskedasticity
##
## studentized Breusch-Pagan test
##
## data: fit.1.us
## BP = 249.87, df = 13, p-value < 2.2e-16
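The studentized Breusch-Pagan statistic can be computed by hand: regress the squared residuals on the predictors and multiply the R-squared of that auxiliary regression by n. The sketch below does this on simulated data with a single predictor. The data and names are illustrative assumptions, not the county data.

```r
set.seed(123)
n <- 500
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = x)    # heteroskedastic by construction

fit <- lm(y ~ x)
e2  <- residuals(fit)^2

# Auxiliary regression of squared residuals on the predictors
aux <- lm(e2 ~ x)
bp  <- n * summary(aux)$r.squared     # studentized (Koenker) statistic
df  <- length(coef(aux)) - 1          # one df per non-intercept term
p   <- pchisq(bp, df = df, lower.tail = FALSE)

c(BP = bp, df = df, p.value = p)
```

The degrees of freedom equal the number of non-intercept terms, which is why the test above reports df = 13 for the county model.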
#extract studentized residuals from the fit, and examine them
spdat$residfit1<-rstudent(fit.1.us)
cols<-brewer.pal(5,"RdBu")
spplot(spdat,"residfit1", at=quantile(spdat$residfit1), col.regions=cols, main="Residuals from Model fit of US Mortality Rate")
#test for residual autocorrelation
resi<-c(lm.morantest(fit.1.us, listw=us.wt2)$estimate[1],
lm.morantest(fit.1.us, listw=us.wt3)$estimate[1],
lm.morantest(fit.1.us, listw=us.wt4)$estimate[1],
lm.morantest(fit.1.us, listw=us.wt5)$estimate[1],
lm.morantest(fit.1.us, listw=us.wt6)$estimate[1],
lm.morantest(fit.1.us, listw=us.wtq,zero.policy=T)$estimate[1],
lm.morantest(fit.1.us, listw=us.wtr,zero.policy=T)$estimate[1])
plot(resi, type="l")
lm.morantest(fit.1.us, listw=us.wt2)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + factor(RUCC),
## data = spdat)
## weights: us.wt2
##
## Moran I statistic standard deviate = 23.756, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## 0.3965422192 -0.0018662504 0.0002812589
lm.morantest(fit.1.us, listw=us.wt3)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + factor(RUCC),
## data = spdat)
## weights: us.wt3
##
## Moran I statistic standard deviate = 27.862, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## 0.3879364703 -0.0018143292 0.0001956793
lm.morantest(fit.1.us, listw=us.wt4)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + factor(RUCC),
## data = spdat)
## weights: us.wt4
##
## Moran I statistic standard deviate = 31.107, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## 0.3798405687 -0.0017734270 0.0001504971
lm.morantest(fit.1.us, listw=us.wt5)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + factor(RUCC),
## data = spdat)
## weights: us.wt5
##
## Moran I statistic standard deviate = 33.076, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## 0.3625191247 -0.0017199894 0.0001212718
lm.morantest(fit.1.us, listw=us.wt6)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + factor(RUCC),
## data = spdat)
## weights: us.wt6
##
## Moran I statistic standard deviate = 35.916, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## 0.3600763582 -0.0016700396 0.0001014426
lm.morantest(fit.1.us, listw=us.wtq, zero.policy=T)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + factor(RUCC),
## data = spdat)
## weights: us.wtq
##
## Moran I statistic standard deviate = 32.728, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## 0.3693002497 -0.0016900532 0.0001284917
lm.morantest(fit.1.us, listw=us.wtr, zero.policy=T)
##
## Global Moran's I for regression residuals
##
## data:
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + factor(RUCC),
## data = spdat)
## weights: us.wtr
##
## Moran I statistic standard deviate = 32.728, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I Expectation Variance
## 0.3693002497 -0.0016900532 0.0001284917
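For intuition about the "Observed Moran's I" values above, here is a small base R sketch that computes Moran's I directly from its definition on a made-up 5 x 5 grid with rook neighbors and a smooth trend. The grid and the variable are hypothetical. lm.morantest() applies the same statistic to regression residuals, with an expectation and variance adjusted for the fitted model.

```r
grid <- expand.grid(row = 1:5, col = 1:5)
A <- as.matrix(dist(grid, method = "manhattan")) == 1   # rook adjacency
W <- A / rowSums(A)                                     # row-standardized weights

v <- grid$row + grid$col          # a smooth spatial trend
z <- v - mean(v)                  # deviations from the mean

n  <- length(z)
S0 <- sum(W)                      # sum of all weights (equals n here)

# Moran's I = (n / S0) * (z' W z) / (z' z)
I  <- (n / S0) * as.numeric(t(z) %*% W %*% z) / sum(z^2)
I
```

A value this far above zero means neighboring cells have similar values. The residual statistics of roughly 0.36 to 0.40 above say the same about the OLS residuals for neighboring counties.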
#Now we fit the spatial lag model
#The lag model is fit with lagsarlm() in the spdep library
#(newer releases provide it in the spatialreg package)
#we basically specify the same model as in the lm() fit above
#But we need to specify the spatial weight matrix and the type
#of lag model to fit
fit.lag.us<-lagsarlm(scale(mortrate)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+factor(RUCC), spdat, listw=us.wt2, type="lag")
summary(fit.lag.us, Nagelkerke=T)
##
## Call:
## lagsarlm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + factor(RUCC), data = spdat,
## listw = us.wt2, type = "lag")
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.435066 -0.365396 0.016386 0.379149 4.181996
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.0601089 0.0539296 -1.1146 0.2650300
## scale(ppersonspo) 0.4383725 0.0166361 26.3507 < 2.2e-16
## scale(p65plus) -0.0067939 0.0137105 -0.4955 0.6202305
## scale(pblack_1) 0.0555812 0.0144317 3.8513 0.0001175
## scale(phisp) -0.1891732 0.0132557 -14.2711 < 2.2e-16
## factor(RUCC)1 0.2866229 0.0756754 3.7875 0.0001522
## factor(RUCC)2 0.1402669 0.0627174 2.2365 0.0253196
## factor(RUCC)3 0.0146097 0.0692250 0.2110 0.8328515
## factor(RUCC)4 0.1496221 0.0766327 1.9525 0.0508838
## factor(RUCC)5 0.0759998 0.0823307 0.9231 0.3559531
## factor(RUCC)6 0.2510359 0.0599005 4.1909 2.779e-05
## factor(RUCC)7 0.0760537 0.0615150 1.2363 0.2163308
## factor(RUCC)8 0.0063996 0.0682837 0.0937 0.9253308
## factor(RUCC)9 -0.2506134 0.0662793 -3.7812 0.0001561
##
## Rho: 0.39892, LR test value: 679.88, p-value: < 2.22e-16
## Asymptotic standard error: 0.014734
## z-value: 27.075, p-value: < 2.22e-16
## Wald statistic: 733.08, p-value: < 2.22e-16
##
## Log likelihood: -3086.33 for lag model
## ML residual variance (sigma squared): 0.4118, (sigma: 0.64172)
## Nagelkerke pseudo-R-squared: 0.56174
## Number of observations: 3067
## Number of parameters estimated: 16
## AIC: 6204.7, (AIC for lm: 6882.5)
## LM test for residual autocorrelation
## test value: 33.636, p-value: 6.6459e-09
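It can help to see the data-generating process that the lag model assumes. The sketch below simulates an outcome from a spatial lag process on a made-up 5 x 5 grid. The grid, the value rho = 0.4 (close to the estimate above), and the coefficients are illustrative assumptions.

```r
set.seed(7)
n <- 25
grid <- expand.grid(row = 1:5, col = 1:5)
A <- as.matrix(dist(grid, method = "manhattan")) == 1
W <- A / rowSums(A)                   # row-standardized rook weights

rho  <- 0.4                           # assumed spatial lag parameter
beta <- c(1, 0.5)                     # assumed intercept and slope
X <- cbind(1, rnorm(n))
e <- rnorm(n, sd = 0.5)

# Reduced form: y = (I - rho W)^-1 (X beta + e)
y <- solve(diag(n) - rho * W, X %*% beta + e)

# The structural form y = rho W y + X beta + e holds by construction
max(abs(y - (rho * W %*% y + X %*% beta + e)))
```

Because y appears on both sides of the structural form, each observation depends on its neighbors' outcomes. This is why the model is estimated by maximum likelihood rather than by adding W y as an ordinary regressor.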
bptest.sarlm(fit.lag.us)
##
## studentized Breusch-Pagan test
##
## data:
## BP = 238.13, df = 13, p-value < 2.2e-16
#Robust SE's
lm.target.us <- lm(fit.lag.us$tary ~ fit.lag.us$tarX - 1)
coeftest(lm.target.us, vcov.=vcovHC(lm.target.us, type="HC0"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## fit.lag.us$tarXx(Intercept) -0.0601089 0.0477044 -1.2600 0.2077555
## fit.lag.us$tarXxscale(ppersonspo) 0.4383725 0.0250784 17.4801 < 2.2e-16
## fit.lag.us$tarXxscale(p65plus) -0.0067939 0.0176425 -0.3851 0.7002021
## fit.lag.us$tarXxscale(pblack_1) 0.0555812 0.0186951 2.9730 0.0029718
## fit.lag.us$tarXxscale(phisp) -0.1891732 0.0168362 -11.2361 < 2.2e-16
## fit.lag.us$tarXxfactor(RUCC)1 0.2866229 0.0597309 4.7986 1.675e-06
## fit.lag.us$tarXxfactor(RUCC)2 0.1402669 0.0501894 2.7948 0.0052264
## fit.lag.us$tarXxfactor(RUCC)3 0.0146097 0.0593415 0.2462 0.8055470
## fit.lag.us$tarXxfactor(RUCC)4 0.1496221 0.0651111 2.2980 0.0216319
## fit.lag.us$tarXxfactor(RUCC)5 0.0759998 0.0803115 0.9463 0.3440640
## fit.lag.us$tarXxfactor(RUCC)6 0.2510359 0.0526992 4.7636 1.991e-06
## fit.lag.us$tarXxfactor(RUCC)7 0.0760537 0.0576485 1.3193 0.1871791
## fit.lag.us$tarXxfactor(RUCC)8 0.0063996 0.0699315 0.0915 0.9270913
## fit.lag.us$tarXxfactor(RUCC)9 -0.2506134 0.0664019 -3.7742 0.0001636
##
## fit.lag.us$tarXx(Intercept)
## fit.lag.us$tarXxscale(ppersonspo) ***
## fit.lag.us$tarXxscale(p65plus)
## fit.lag.us$tarXxscale(pblack_1) **
## fit.lag.us$tarXxscale(phisp) ***
## fit.lag.us$tarXxfactor(RUCC)1 ***
## fit.lag.us$tarXxfactor(RUCC)2 **
## fit.lag.us$tarXxfactor(RUCC)3
## fit.lag.us$tarXxfactor(RUCC)4 *
## fit.lag.us$tarXxfactor(RUCC)5
## fit.lag.us$tarXxfactor(RUCC)6 ***
## fit.lag.us$tarXxfactor(RUCC)7
## fit.lag.us$tarXxfactor(RUCC)8
## fit.lag.us$tarXxfactor(RUCC)9 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Next we fit the spatial error model
fit.err.us<-errorsarlm(scale(mortrate)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+factor(RUCC), spdat, listw=us.wt2)
summary(fit.err.us, Nagelkerke=T)
##
## Call:
## errorsarlm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + factor(RUCC), data = spdat,
## listw = us.wt2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.246243 -0.363446 0.018518 0.378347 4.269437
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.1519258 0.0718058 -2.1158 0.0343630
## scale(ppersonspo) 0.4956221 0.0192362 25.7651 < 2.2e-16
## scale(p65plus) -0.0050401 0.0163231 -0.3088 0.7574975
## scale(pblack_1) 0.1454472 0.0200302 7.2614 3.830e-13
## scale(phisp) -0.2201678 0.0193200 -11.3959 < 2.2e-16
## factor(RUCC)1 0.2983426 0.0802941 3.7156 0.0002027
## factor(RUCC)2 0.1978474 0.0802418 2.4656 0.0136768
## factor(RUCC)3 0.1136176 0.0837459 1.3567 0.1748783
## factor(RUCC)4 0.2271767 0.0877477 2.5890 0.0096261
## factor(RUCC)5 0.2166439 0.0959520 2.2578 0.0239559
## factor(RUCC)6 0.3282910 0.0763724 4.2986 1.719e-05
## factor(RUCC)7 0.1870157 0.0796173 2.3489 0.0188272
## factor(RUCC)8 0.0798636 0.0828415 0.9641 0.3350191
## factor(RUCC)9 -0.1286435 0.0843255 -1.5256 0.1271197
##
## Lambda: 0.43168, LR test value: 582.46, p-value: < 2.22e-16
## Asymptotic standard error: 0.016177
## z-value: 26.685, p-value: < 2.22e-16
## Wald statistic: 712.11, p-value: < 2.22e-16
##
## Log likelihood: -3135.038 for error model
## ML residual variance (sigma squared): 0.42027, (sigma: 0.64829)
## Nagelkerke pseudo-R-squared: 0.5476
## Number of observations: 3067
## Number of parameters estimated: 16
## AIC: 6302.1, (AIC for lm: 6882.5)
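The Nagelkerke pseudo-R-squared values reported for the lag and error models can be recovered from the printed log likelihoods. The sketch below assumes the null model is an intercept-only Gaussian fit to the scaled outcome; that assumption is not stated in the output, but it reproduces the reported values.

```r
n <- 3067

# Assumed null model: intercept-only Gaussian fit to the scaled outcome,
# whose ML variance is (n - 1) / n because scale() gives sample variance 1
ll_null <- -n / 2 * (log(2 * pi) + log((n - 1) / n) + 1)

# Log likelihoods copied from the lag and error model summaries
ll <- c(lag = -3086.33, error = -3135.038)

# Nagelkerke pseudo-R-squared: 1 - exp(-(2 / n) * (logLik - logLik_null))
r2 <- 1 - exp(-(2 / n) * (ll - ll_null))
round(r2, 4)
```

Since both models are compared with the same null likelihood, the model with the higher log likelihood (here the lag model) also has the higher pseudo-R-squared.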
#As a rough guide to which model fits best, compare the AIC of each (lower is better)
#fit.1.us is the OLS fit for the US data; the summaries above report its AIC as 6882.5
#(the original call was AIC(fit.1), which returned 324.8999 for the earlier model)
AIC(fit.1.us)
AIC(fit.lag.us)
## [1] 6204.659
AIC(fit.err.us)
## [1] 6302.075

 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 

Dernier (20)

Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 

Spatially Autoregressive Models for Regression Analysis

  • 1.
DEM 7263 Fall 2015 - Spatially Autoregressive Models 1
Corey S. Sparks, Ph.D.
September 9, 2015

Introduction to Spatial Regression Models

Up until now, we have been concerned with describing the structure of spatial data through correlational methods and the methods of exploratory spatial data analysis (http://rpubs.com/corey_sparks/105700). Through ESDA, we examined data for patterns and, using the Moran's I and local Moran's I statistics, we examined clustering of variables. Now we consider regression models for continuous outcomes. We begin with a review of the Ordinary Least Squares (OLS) model for a continuous outcome.

OLS Model

The basic OLS model is an attempt to estimate the effect of one or more independent variables on the value of a dependent variable. This is written as:

$$y_i = \beta_0 + \beta_1 x_i + e_i$$

where y is the dependent variable that we want to model, x is the independent variable we think has an association with y, $\beta_0$ is the model intercept, or grand mean of y when x = 0, and $\beta_1$ is the slope parameter that defines the strength of the linear relationship between x and y. e is the error in the model for y that is unaccounted for by the values of x and the grand mean $\beta_0$. The average, or expected value, of y is $E[y|x] = \beta_0 + \beta_1 x_i$, which is the linear mean function for y, conditional on x, and this gives us the customary linear regression plot:

set.seed(1234)
x<- rnorm(100, 10, 5)
beta0<-1
beta1<-1.5
y<-beta0+beta1*x+rnorm(100, 0, 5)
plot(x, y)
abline(coef = coef(lm(y~x)), lwd=1.5)
  • 2.
summary(lm(y~x))$coef

##             Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 1.446620  1.0879494  1.329676 1.867119e-01
## x           1.473915  0.1037759 14.202863 1.585002e-25

where the line shows $E[y|x] = \beta_0 + \beta_1 x_i$.

We assume that the errors, $e_i \sim N(0, \sigma^2)$, are independent, Normally distributed and homoskedastic, with variances $\sigma^2$.

This is the simple model with one predictor. We can easily add more predictors to the equation and rewrite it:

$$y_i = \beta_0 + \sum_k \beta_k x_{ik} + e_i$$

So, now the mean of y is modeled with multiple x variables. We can write this relationship more compactly using matrix notation:

$$Y = X'\beta + e$$

where Y is now an $n * 1$ vector of observations of our dependent variable, X is an $n * k$ matrix of independent variables, with the first column being all 1's, and e is the $n * 1$ vector of errors for each observation.
  • 3.
In matrices this looks like:

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} \quad
x = \begin{bmatrix} 1 & x_{1,1} & x_{1,2} & \dots & x_{1,k} \\ 1 & x_{2,1} & x_{2,2} & \dots & x_{2,k} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n,1} & x_{n,2} & \dots & x_{n,k} \end{bmatrix} \quad
e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$$

The residuals are uncorrelated, with covariance matrix $\Sigma$:

$$\Sigma = \sigma^2 I = \sigma^2 * \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & \dots & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{bmatrix} = \begin{bmatrix} \sigma^2 & 0 & 0 & \dots & 0 \\ 0 & \sigma^2 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & \dots & \vdots \\ 0 & 0 & 0 & \dots & \sigma^2 \end{bmatrix}$$

To estimate the $\beta$ coefficients, we use the customary OLS estimator

$$\beta = (X'X)^{-1}(X'Y)$$

This is the estimator that minimizes the residual sum of squares:

$$(Y - X'\beta)'(Y - X'\beta)$$

or

$$(Y - \hat{Y})'(Y - \hat{Y})$$

We can inspect the properties of the estimates by examining the residuals, or $e_i$, of the model. Since we assume the errors are normal, a quantile-quantile (Q-Q) plot of the residuals against the expected quantiles of the standard normal distribution should be a straight line. Formal tests of normality can also be used.

fit<-lm(y~x)
qqnorm(rstudent(fit))
qqline(rstudent(fit))
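As a rough check on the OLS estimator given on this slide, the following sketch computes $(X'X)^{-1}(X'Y)$ directly and compares it with `lm()`. It regenerates the simulated x and y from the first example; the names `X`, `beta_hat` and `rss` are introduced here only for illustration and are not part of the original lecture code. Note that in R the design matrix is stored as n rows by 2 columns, so the lecture's $X'\beta$ corresponds to `X %*% beta_hat` below.

```r
# Sketch: closed-form OLS estimate versus lm(), using the simulated data from the first example
set.seed(1234)
x <- rnorm(100, 10, 5)
y <- 1 + 1.5 * x + rnorm(100, 0, 5)

# Design matrix: a column of 1's for the intercept, then the predictor
X <- cbind(1, x)

# Solve the normal equations (X'X) b = X'y instead of inverting X'X explicitly
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Residual sum of squares evaluated at the estimate
rss <- sum((y - X %*% beta_hat)^2)

# Compare with the built-in fit
fit <- lm(y ~ x)
print(cbind(by_hand = as.numeric(beta_hat), lm = as.numeric(coef(fit))))
```

Solving the normal equations with `solve(A, b)` is used here only because it mirrors the formula; `lm()` itself uses a QR decomposition, which is numerically more stable.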
  • 4.
shapiro.test(resid(fit))

##
##  Shapiro-Wilk normality test
##
## data:  resid(fit)
## W = 0.98878, p-value = 0.5677

library(nortest) #provides ad.test() and lillie.test()
ad.test(resid(fit))

##
##  Anderson-Darling normality test
##
## data:  resid(fit)
## A = 0.39859, p-value = 0.3593

lillie.test(resid(fit))
  • 5.
##
##  Lilliefors (Kolmogorov-Smirnov) normality test
##
## data:  resid(fit)
## D = 0.052017, p-value = 0.7268

We may also inspect the association between $e_i$, or more appropriately the studentized/standardized residuals, and the predictors and the dependent variable. If we see evidence of association, then homoskedasticity is a poor assumption.

par(mfrow=c(2,2))
plot(fit)
par(mfrow=c(1,1))

Model-data agreement

Do we (meaning our data) meet the statistical assumptions of our analytical models? Always ask this of any analysis you do; if your model is wrong, your inference will also be wrong.
  • 6.
Since spatial data often display correlations amongst closely located observations (autocorrelation), we should probably test for autocorrelation in the model residuals, as that would violate the assumptions of the OLS model. One method for doing this is to calculate the value of Moran's I for the OLS residuals.

library(spdep)
library(maptools)
library(RColorBrewer)
setwd("~/Google Drive/dem7263/data/")
dat<-readShapePoly("SA_classdata.shp", proj4string=CRS("+proj=utm +zone=14 +north"))
#Make a rook style weight matrix
sanb<-poly2nb(dat, queen=F)
summary(sanb)

## Neighbour list object:
## Number of regions: 235
## Number of nonzero links: 1106
## Percentage nonzero weights: 2.002716
## Average number of links: 4.706383
## Link number distribution:
##
##  1  2  3  4  5  6  7  8  9
##  4 10 30 62 66 34 24  3  2
## 4 least connected regions:
## 61 82 147 205 with 1 link
## 2 most connected regions:
## 31 55 with 9 links

#row-standardized weights
salw<-nb2listw(sanb, style="W")
#OLS fit of the violent crime rate
fit2<-lm(I(viol3yr/acs_poptot) ~ pfemhh + hwy + p5yrinmig + log(MEDHHINC), data=dat)
#map the studentized residuals
dat$resid<-rstudent(fit2)
spplot(dat, "resid", at=quantile(dat$resid), col.regions=brewer.pal(n=5, "RdBu"), main="Residuals from OLS Fit of Crime Rate")
  • 7.
lm.morantest(fit2, listw = salw, resfun = rstudent)

##
##  Global Moran's I for regression residuals
##
## data:
## model: lm(formula = I(viol3yr/acs_poptot) ~ pfemhh + hwy +
## p5yrinmig + log(MEDHHINC), data = dat)
## weights: salw
##
## Moran I statistic standard deviate = 0.75475, p-value = 0.2252
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I        Expectation           Variance
##        0.021326432       -0.011406176        0.001880845

In this case, there appears to be no clustering in the residuals, since the observed value of Moran's I is .021, with a z-test of 0.75, p = .225.

Extending the OLS model to accommodate spatial structure

Now assume we measure our Y and X's at specific spatial locations (s), so we have Y(s) and X(s).
  • 8.
In most analyses, the spatial location (i.e. the county or census tract) only serves to link X and Y so we can collect our data on them, and the subsequent analysis ignores this spatial information rather than explicitly considering the spatial relationships between the variables or the locations.

In fact, even though we measure Y(s) and X(s), what we end up analyzing is X and Y, and we apply the ordinary regression methods to these data to understand the effects of X on Y.

Moreover, we could move them around in space (as long as we keep the $y_i$ observations together with $x_i$) and still get the same results. Such analyses have been called a-spatial. This is the kind of regression model you are used to fitting, where we ignore any information on the locations of the observations themselves.

However, we can extend the simple regression case to include the information on (s) and incorporate it into our models explicitly, so they are no longer a-spatial.

There are several alternative methods by which to incorporate the (s) locations into our models:

    - The structured linear mixed (multi-level) model, or GLMM (generalized linear mixed model)
    - Spatial filtering of observations
    - Spatially autoregressive models
    - Geographically weighted regression

We will first deal with the case of the spatially autoregressive model, or SAR model, as its structure is just a modification of the OLS model from above.

Spatially autoregressive models

We saw in the normal OLS model that some of the basic assumptions of the model are that 1) the model residuals are distributed as iid normal random variates with mean 0, and 2) they have common (and constant, meaning homoskedastic) variance $\sigma^2$.

Spatial data, however, present a series of problems to the standard OLS regression model. These problems are typically seen as various representations of spatial structure or dependence within the data. The spatial structure of the data can introduce spatial dependence into the outcome, the predictors and the model residuals.

This can be observed as neighboring observations both having high (or low) values (positive autocorrelation) for either the dependent variable, the model predictors or the model residuals. We can also observe situations where areas with high values are surrounded by areas with low values (negative autocorrelation).

Since the standard OLS model assumes the residuals (and the outcomes themselves) are uncorrelated, as previously stated, the autocorrelation inherent to most spatial data introduces factors that violate the iid distributional assumptions for the residuals, and could violate the assumption of common variance for the OLS residuals. To account for the expected spatial association in the data, we would like a model that accounts for the spatial structure of the data. One such way of doing this is by allowing there to be correlation between residuals in our model, or correlation in the dependent variable.
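To make positive and negative autocorrelation concrete, here is a minimal sketch that computes Moran's I by hand for two artificial patterns on a small square grid. Everything in it is an assumption made for illustration (the 6 x 6 grid, the rook neighbor rule built from Manhattan distances, and the helper name `moran_i`); it uses base R only and is not the spdep implementation. A pattern where one half of the grid is high and the other half is low gives a positive value, while a checkerboard, where every rook neighbor has the opposite value, gives exactly -1 under row-standardized weights.

```r
# Sketch: Moran's I by hand on a small grid (all names here are hypothetical)
n_side <- 6
grid <- expand.grid(row = 1:n_side, col = 1:n_side)

# Rook adjacency: cells exactly one step apart horizontally or vertically
A <- (as.matrix(dist(grid, method = "manhattan")) == 1) * 1

# Row-standardize so each row of W sums to 1 (the style = "W" idea)
W <- A / rowSums(A)

# Moran's I = (n / S0) * (z' W z) / (z' z), with z the mean-centered values
moran_i <- function(y, W) {
  z <- y - mean(y)
  n <- length(y)
  s0 <- sum(W)
  as.numeric((n / s0) * (t(z) %*% W %*% z) / sum(z^2))
}

# Clustered pattern: left half high, right half low (positive autocorrelation)
y_cluster <- as.numeric(grid$col <= 3)

# Checkerboard pattern: every rook neighbor has the opposite value (negative autocorrelation)
y_checker <- (grid$row + grid$col) %% 2

I_cluster <- moran_i(y_cluster, W)
I_checker <- moran_i(y_checker, W)
print(c(clustered = I_cluster, checkerboard = I_checker))
```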
  • 9.
We are familiar with the concept of autoregression amongst neighboring observations. This concept is that a particular observation is a linear combination of its neighboring values. This autoregression introduces dependence into the data. Instead of specifying the autoregression structure directly, we introduce spatial autocorrelation through a global autocorrelation coefficient and a spatial proximity measure.

There are 2 basic forms of the spatial autoregressive model: the spatial lag and the spatial error models. Both of these models build on the basic OLS regression model:

$$Y = X'\beta + e$$

where Y is the dependent variable, X is the matrix of independent variables, $\beta$ is the vector of regression parameters to be estimated from the data, and e are the model residuals, which are assumed to be distributed as a Gaussian random variable with mean 0 and constant variance-covariance matrix $\Sigma$.

The spatial lag model

The spatial lag model introduces autocorrelation into the regression model by lagging the dependent variable itself, much like in a time-series approach. The model is specified as:

$$Y = \rho W Y + X'\beta + e$$

where $\rho$ is the autoregressive coefficient, which tells us how strong the resemblance is, on average, between $Y_i$ and its neighbors. The matrix W is the spatial weight matrix, describing the spatial network structure of the observations, like we described in the ESDA lecture.

In the lag model, we are specifying the spatial component on the dependent variable. This leads to a spatial filtering of the variable, where values are averaged over the surrounding neighborhood defined in W; the result is called the spatially lagged variable.

The specification that is used most often is a spatially filtered Y variable that can then be regressed on X, which can directly be seen in a re-expression of the OLS model as:

$$Y = \rho W Y + X'\beta + e$$

where the direct effect of the spatial lagging of the dependent variable is seen.

To estimate these models we can use either GeoDa or R; in R we use the spdep package and the lagsarlm() function.

The lag model is:

#in current package releases, lagsarlm() is provided by the spatialreg package
fit.lag<-lagsarlm(I(viol3yr/acs_poptot) ~ pfemhh + hwy + p5yrinmig + log(MEDHHINC), data=dat, listw = salw)
summary(fit.lag, Nagelkerke=T)
  • 10.
##
## Call:
## lagsarlm(formula = I(viol3yr/acs_poptot) ~ pfemhh + hwy + p5yrinmig +
## log(MEDHHINC), data = dat, listw = salw)
##
## Residuals:
##        Min         1Q     Median         3Q        Max
## -0.0635446 -0.0147641 -0.0036721  0.0090372  0.3252902
##
## Type: lag
## Coefficients: (asymptotic standard errors)
##                 Estimate Std. Error z value  Pr(>|z|)
## (Intercept)    0.3136449  0.0924307  3.3933 0.0006906
## pfemhh         0.1913535  0.0336049  5.6942 1.239e-08
## hwy            0.0075802  0.0056013  1.3533 0.1759604
## p5yrinmig      0.0794330  0.0202592  3.9208 8.824e-05
## log(MEDHHINC) -0.0337148  0.0082612 -4.0811 4.482e-05
##
## Rho: 0.034756, LR test value: 0.18517, p-value: 0.66697
## Asymptotic standard error: 0.082235
## z-value: 0.42264, p-value: 0.67256
## Wald statistic: 0.17862, p-value: 0.67256
##
## Log likelihood: 441.5604 for lag model
## ML residual variance (sigma squared): 0.0013657, (sigma: 0.036955)
## Nagelkerke pseudo-R-squared: 0.36486
## Number of observations: 235
## Number of parameters estimated: 7
## AIC: -869.12, (AIC for lm: -870.94)
## LM test for residual autocorrelation
## test value: 0.4691, p-value: 0.4934

We see that $\rho$ is estimated to be .035, and the likelihood ratio test shows that this is not significantly different from 0.

The spatial error model

The spatial error model says that the autocorrelation is not in the outcome itself, but instead, any autocorrelation is attributable to there being missing spatial covariates in the data. If these spatially patterned covariates could be measured, then the autocorrelation would be 0. This model is written:

$$Y = X'\beta + e$$

$$e = \lambda W e + v$$

This model, in effect, controls for the nuisance of correlated errors in the data that are attributable to an inherently spatial process, or to spatial autocorrelation in the measurement errors of the measured and possibly unmeasured variables in the model.

This model is estimated in R using errorsarlm() in the spdep library.
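To see how the two specifications differ as data-generating processes, the sketch below simulates one outcome from each, using base R only. The grid, the parameter values (`beta`, `rho`, `lambda`) and all object names are hypothetical choices made for illustration. Solving the lag model for Y gives $Y = (I - \rho W)^{-1}(X\beta + v)$, and solving the error equation gives $e = (I - \lambda W)^{-1} v$; in the code, `X %*% beta` plays the role of the lecture's $X'\beta$.

```r
# Sketch: simulating the spatial lag and spatial error processes (hypothetical values throughout)
set.seed(1)
n_side <- 7
grid <- expand.grid(row = 1:n_side, col = 1:n_side)

# Row-standardized rook weights on a square grid
A <- (as.matrix(dist(grid, method = "manhattan")) == 1) * 1
W <- A / rowSums(A)
n <- nrow(W)
I_n <- diag(n)

# One predictor plus an intercept, and iid normal innovations v
x <- rnorm(n)
X <- cbind(1, x)
beta <- c(1, 2)
rho <- 0.5
lambda <- 0.5
v <- rnorm(n)

# Spatial lag: Y = rho*W*Y + X*beta + v, so Y = (I - rho*W)^{-1} (X*beta + v)
y_lag <- solve(I_n - rho * W, X %*% beta + v)

# Spatial error: Y = X*beta + e with e = lambda*W*e + v, so e = (I - lambda*W)^{-1} v
e <- solve(I_n - lambda * W, v)
y_err <- X %*% beta + e

# The simulated values satisfy the structural equations they were built from
print(max(abs(y_lag - (rho * W %*% y_lag + X %*% beta + v))))
print(max(abs(e - (lambda * W %*% e + v))))
```

In the lag process the neighbors' outcomes feed back into each observation, so a change in x at one location spills over to others; in the error process only the unexplained part is spatially correlated.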
  • 11.
fit.err<-errorsarlm(I(viol3yr/acs_poptot) ~ pfemhh + hwy + p5yrinmig + log(MEDHHINC), data=dat, listw = salw)
summary(fit.err, Nagelkerke=T)

##
## Call:
## errorsarlm(formula = I(viol3yr/acs_poptot) ~ pfemhh + hwy + p5yrinmig +
## log(MEDHHINC), data = dat, listw = salw)
##
## Residuals:
##        Min         1Q     Median         3Q        Max
## -0.0650577 -0.0141501 -0.0034659  0.0092839  0.3241926
##
## Type: error
## Coefficients: (asymptotic standard errors)
##                 Estimate Std. Error z value  Pr(>|z|)
## (Intercept)    0.3138785  0.0922430  3.4027 0.0006671
## pfemhh         0.1971225  0.0340765  5.7847 7.264e-09
## hwy            0.0073140  0.0057043  1.2822 0.1997762
## p5yrinmig      0.0781620  0.0205345  3.8064 0.0001410
## log(MEDHHINC) -0.0337316  0.0082489 -4.0893 4.328e-05
##
## Lambda: 0.070725, LR test value: 0.53188, p-value: 0.46582
## Asymptotic standard error: 0.094212
## z-value: 0.7507, p-value: 0.45283
## Wald statistic: 0.56355, p-value: 0.45283
##
## Log likelihood: 441.7338 for error model
## ML residual variance (sigma squared): 0.0013625, (sigma: 0.036912)
## Nagelkerke pseudo-R-squared: 0.3658
## Number of observations: 235
## Number of parameters estimated: 7
## AIC: -869.47, (AIC for lm: -870.94)

We see $\lambda$ = .071, with a p-value of .466, suggesting again that, in this case, there is no autocorrelation in the model residuals.

We can examine the relative fits of each model by extracting the AIC values from each:

AIC(fit.lag)

## [1] -869.1208

AIC(fit.err)

## [1] -869.4675
  • 12.
AIC(fit2)

## [1] -870.9356

The AIC values for the two spatial models are slightly higher than that of the OLS model (lower AIC indicates better fit), so there is little evidence favoring the spatial regression models in this case.

Examination of Model Specification

To some degree, both of the SAR specifications allow us to model spatial dependence in the data. The primary difference between them is where we model said dependence.

The lag model says that the dependence affects the dependent variable only. We can liken this to a diffusion scenario, where your neighbors have a diffusive effect on you.

The error model says that dependence affects the residuals only. We can liken this to the missing spatially dependent covariate situation, where, if only we could measure another really important spatially associated predictor, we could account for the spatial dependence. But alas, we cannot, and we instead model dependence in our errors.

These are inherently two completely different ways to think about specifying a model, and we should really make our decision based upon how we think our process of interest operates.

That being said, this way of thinking isn't necessarily popular among practitioners. Most practitioners want the best fitting model, 'nuff said. So methods have been developed that test for alternate model specifications, to see which kind of model best summarizes the observed variation in the dependent variable and the spatial dependence.

These are a set of so-called Lagrange Multiplier (econometrician's jargon for a score test (https://en.wikipedia.org/wiki/Score_test)) tests.

These tests compare the model fits from the OLS, spatial error, and spatial lag models using the method of the score test.

For those who don't remember, the score test is a test based on the relative change in the first derivative of the likelihood function around the maximum likelihood. The particular thing here that is affecting the value of this derivative is the autoregressive parameter, $\rho$ or $\lambda$. In the OLS model $\rho$ or $\lambda$ = 0 (so both the lag and error models simplify to OLS), but as this parameter changes, so does the likelihood for the model, which is why the derivative of the likelihood function is used. This is all related to how the estimation routines estimate the value of $\rho$ or $\lambda$.

Using the Lagrange Multiplier Test (LMT)

In general, you fit the OLS model to your dependent variable, then submit the OLS model fit to the LMT testing procedure.

Then you look to see which model (spatial error, or spatial lag) has the highest value for the test.

Enter the uncertainty…

So how much bigger, you might say?
  • 13.
Well, drastically bigger. If the LMT for the error model is 2500 and the LMT for the lag model is 2480, this is NOT A BIG DIFFERENCE, only about 1%. If you see an LMT for the error model of 2500 and an LMT for the lag model of 250, THIS IS A BIG DIFFERENCE.

So what if you don't see a BIG DIFFERENCE? HOW DO YOU DECIDE WHICH MODEL TO USE???

Well, you could think more, but who has time for that.

The econometricians have thought up a "better" LMT test, the so-called robust LMT (robust to what, I'm not sure), but it is said that it can settle such problems of a "not so big difference" between the lag and error model specifications.

So what do you do? In general, think about your problem before you run your analysis. Should this fail you, proceed with using the LMT. If this is inconclusive, look at the robust LMT, and choose the model which has the larger value for this test.

More Data Examples: San Antonio, TX mortality data
  • 14.
#Spatial Regression Models 1
setwd("~/Google Drive//dem7263/data")
#, proj4string=CRS("+proj=utm zone=14")
dat<-readShapePoly("SA_classdata.shp")
#keep tracts with more than 100 residents
dat<-dat[which(dat$acs_poptot>100),]

#Create a good representative set of neighbor types
#k nearest neighbors, k = 6 down to 2
sa.nb6<-knearneigh(coordinates(dat), k=6)
sa.nb6<-knn2nb(sa.nb6)
sa.wt6<-nb2listw(sa.nb6, style="W")
sa.nb5<-knearneigh(coordinates(dat), k=5)
sa.nb5<-knn2nb(sa.nb5)
sa.wt5<-nb2listw(sa.nb5, style="W")
sa.nb4<-knearneigh(coordinates(dat), k=4)
sa.nb4<-knn2nb(sa.nb4)
sa.wt4<-nb2listw(sa.nb4, style="W")
sa.nb3<-knearneigh(coordinates(dat), k=3)
sa.nb3<-knn2nb(sa.nb3)
sa.wt3<-nb2listw(sa.nb3,style="W")
sa.nb2<-knearneigh(coordinates(dat), k=2)
sa.nb2<-knn2nb(sa.nb2)
sa.wt2<-nb2listw(sa.nb2,style="W")
#rook contiguity
sa.nbr<-poly2nb(dat, queen=F)
sa.wtr<-nb2listw(sa.nbr, zero.policy=T)
#queen contiguity, with weights built from the queen neighbor list
sa.nbq<-poly2nb(dat, queen=T)
sa.wtq<-nb2listw(sa.nbq, style="W", zero.policy=T)
#distance band neighbors, 0 to 10000 map units
sa.nbd<-dnearneigh(coordinates(dat), d1=0, d2=10000)
sa.wtd<-nb2listw(sa.nbd, zero.policy=T)

#create a mortality rate, 3 year average
dat$mort3<-apply(dat@data[, c("deaths09", "deaths10", "deaths11")],1,mean)
dat$mortrate<-1000*dat$mort3/dat$acs_poptot

#just a histogram
hist(dat$mortrate)
  • 15.
#do some basic regression models, without spatial structure
fit.1<-lm(scale(mortrate)~scale(ppersonspo)+scale(I(viol3yr/acs_poptot))+scale(dissim)+scale(ppop65plus), data=dat)
summary(fit.1)
  • 16.
##
## Call:
## lm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(I(viol3yr/acs_poptot)) +
## scale(dissim) + scale(ppop65plus), data = dat)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.96044 -0.27804 -0.00673  0.26359  2.18006
##
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)
## (Intercept)                  1.723e-16  3.120e-02   0.000   1.0000
## scale(ppersonspo)            1.215e-01  5.047e-02   2.407   0.0169 *
## scale(I(viol3yr/acs_poptot)) 2.287e-01  3.841e-02   5.953  9.8e-09 ***
## scale(dissim)                8.467e-02  4.817e-02   1.758   0.0801 .
## scale(ppop65plus)            7.240e-01  3.594e-02  20.146  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4773 on 229 degrees of freedom
## Multiple R-squared: 0.7761, Adjusted R-squared: 0.7722
## F-statistic: 198.4 on 4 and 229 DF, p-value: < 2.2e-16

library(car) #provides vif()
vif(fit.1)

##            scale(ppersonspo) scale(I(viol3yr/acs_poptot))
##                     2.605367                     1.508595
##                scale(dissim)            scale(ppop65plus)
##                     2.372665                     1.320878

par(mfrow=c(2,2))
plot(fit.1)
  • 17.
par(mfrow=c(1,1))

#this is a test for constant variance
library(lmtest) #provides bptest()
bptest(fit.1) #looks like we have heteroskedasticity

##
##  studentized Breusch-Pagan test
##
## data:  fit.1
## BP = 51.088, df = 4, p-value = 2.14e-10

#extract studentized residuals from the fit, and examine them
dat$residfit1<-rstudent(fit.1)
cols<-brewer.pal(5,"RdBu")
spplot(dat,"residfit1", at=quantile(dat$residfit1), col.regions=cols, main="Residuals from Model fit of Mortality Rate")
  • 18.
Chi and Zhu (http://link.springer.com/article/10.1007/s11113-007-9051-8#page-1) suggest using a wide array of neighbor specifications, then picking the one that maximizes the autocorrelation coefficient. So, here I emulate their approach:

#test for residual autocorrelation
resi<-c(lm.morantest(fit.1, listw=sa.wt2)$estimate[1],
lm.morantest(fit.1, listw=sa.wt3)$estimate[1],
lm.morantest(fit.1, listw=sa.wt4)$estimate[1],
lm.morantest(fit.1, listw=sa.wt5)$estimate[1],
lm.morantest(fit.1, listw=sa.wt6)$estimate[1],
lm.morantest(fit.1, listw=sa.wtd, zero.policy=T)$estimate[1],
lm.morantest(fit.1, listw=sa.wtq)$estimate[1],
lm.morantest(fit.1, listw=sa.wtr)$estimate[1])
plot(resi, type="l")
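The same idea can be sketched without the class shapefile. The example below is a base R illustration under stated assumptions, not the procedure used by Chi and Zhu or by spdep: the point locations, the omitted west-to-east trend, and the helpers `knn_w` and `moran_i` are all hypothetical. It builds row-standardized k-nearest-neighbor weights for k = 2 to 6, computes the observed Moran's I of the OLS residuals for each, and reports the k with the largest value.

```r
# Sketch: compare residual Moran's I across k-nearest-neighbor weights (hypothetical data)
set.seed(42)
n <- 100
coords <- cbind(runif(n), runif(n))

# Outcome with an omitted spatial trend in the first coordinate
x <- rnorm(n)
y <- 1 + 2 * x + 3 * coords[, 1] + rnorm(n, sd = 0.5)
fit <- lm(y ~ x)
e <- resid(fit)

# Pairwise distances between points
d <- as.matrix(dist(coords))

# Row-standardized k-nearest-neighbor weight matrix
knn_w <- function(d, k) {
  n <- nrow(d)
  W <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nbrs <- order(d[i, ])[2:(k + 1)]  # position 1 is the point itself (distance 0)
    W[i, nbrs] <- 1 / k
  }
  W
}

# Observed Moran's I for residuals: (n / S0) * (e' W e) / (e' e)
moran_i <- function(e, W) {
  n <- length(e)
  as.numeric((n / sum(W)) * (t(e) %*% W %*% e) / sum(e^2))
}

ks <- 2:6
moran_by_k <- sapply(ks, function(k) moran_i(e, knn_w(d, k)))
names(moran_by_k) <- paste0("k", ks)
best_k <- ks[which.max(moran_by_k)]

print(round(moran_by_k, 3))
print(best_k)
```

This only compares the observed statistics; the expectation and variance that lm.morantest() reports for regression residuals are not reproduced here.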
  • 19.
    lm.morantest(fit.1, listw=sa.wt2)
    ##
    ##  Global Moran's I for regression residuals
    ##
    ## data:
    ## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
    ## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
    ## data = dat)
    ## weights: sa.wt2
    ##
    ## Moran I statistic standard deviate = -1.3282, p-value = 0.908
    ## alternative hypothesis: greater
    ## sample estimates:
    ## Observed Moran's I        Expectation           Variance
    ##       -0.089262137       -0.010642737        0.003503515

    lm.morantest(fit.1, listw=sa.wt3)
  • 20.
    ##
    ##  Global Moran's I for regression residuals
    ##
    ## data:
    ## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
    ## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
    ## data = dat)
    ## weights: sa.wt3
    ##
    ## Moran I statistic standard deviate = 0.10133, p-value = 0.4596
    ## alternative hypothesis: greater
    ## sample estimates:
    ## Observed Moran's I        Expectation           Variance
    ##       -0.005775111       -0.010724844        0.002386190

    lm.morantest(fit.1, listw=sa.wt4)
    ##
    ##  Global Moran's I for regression residuals
    ##
    ## data:
    ## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
    ## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
    ## data = dat)
    ## weights: sa.wt4
    ##
    ## Moran I statistic standard deviate = 0.90538, p-value = 0.1826
    ## alternative hypothesis: greater
    ## sample estimates:
    ## Observed Moran's I        Expectation           Variance
    ##         0.02812192        -0.01050301         0.00182003

    lm.morantest(fit.1, listw=sa.wt5)
  • 21.
    ##
    ##  Global Moran's I for regression residuals
    ##
    ## data:
    ## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
    ## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
    ## data = dat)
    ## weights: sa.wt5
    ##
    ## Moran I statistic standard deviate = 1.3996, p-value = 0.08082
    ## alternative hypothesis: greater
    ## sample estimates:
    ## Observed Moran's I        Expectation           Variance
    ##         0.04315932        -0.01029146         0.00145856

    lm.morantest(fit.1, listw=sa.wt6)
    ##
    ##  Global Moran's I for regression residuals
    ##
    ## data:
    ## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
    ## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
    ## data = dat)
    ## weights: sa.wt6
    ##
    ## Moran I statistic standard deviate = 1.7095, p-value = 0.04368
    ## alternative hypothesis: greater
    ## sample estimates:
    ## Observed Moran's I        Expectation           Variance
    ##        0.049203016       -0.010078442        0.001202519

    lm.morantest(fit.1, listw=sa.wtd, zero.policy=T)
  • 22.
    ##
    ##  Global Moran's I for regression residuals
    ##
    ## data:
    ## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
    ## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
    ## data = dat)
    ## weights: sa.wtd
    ##
    ## Moran I statistic standard deviate = 2.6709, p-value = 0.003782
    ## alternative hypothesis: greater
    ## sample estimates:
    ## Observed Moran's I        Expectation           Variance
    ##       1.901642e-02      -6.929932e-03       9.436939e-05

    lm.morantest(fit.1, listw=sa.wtq)
    ##
    ##  Global Moran's I for regression residuals
    ##
    ## data:
    ## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
    ## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
    ## data = dat)
    ## weights: sa.wtq
    ##
    ## Moran I statistic standard deviate = 0.65308, p-value = 0.2569
    ## alternative hypothesis: greater
    ## sample estimates:
    ## Observed Moran's I        Expectation           Variance
    ##        0.017905792       -0.010601820        0.001905428

    lm.morantest(fit.1, listw=sa.wtr)
  • 23.
    ##
    ##  Global Moran's I for regression residuals
    ##
    ## data:
    ## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
    ## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
    ## data = dat)
    ## weights: sa.wtr
    ##
    ## Moran I statistic standard deviate = 0.65308, p-value = 0.2569
    ## alternative hypothesis: greater
    ## sample estimates:
    ## Observed Moran's I        Expectation           Variance
    ##        0.017905792       -0.010601820        0.001905428

    #looks like we have minimal autocorrelation in our residuals, but the distance-based
    #weights (p = 0.004) and, marginally, the k=6 weights (p = 0.044) do show significant autocorrelation

    #Let's look at the local autocorrelation in the mortality rate
    #get the values of I
    dat$lmfit1<-localmoran(dat$mortrate, sa.wt5, zero.policy=T)[,1]
    brks<-classIntervals(dat$lmfit1, n=5, style="quantile")
    spplot(dat, "lmfit1", at=brks$brks, col.regions=brewer.pal(5, "RdBu"), main="Local Moran Plot of Mortality Rate")
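The "standard deviate" and p-value in each lm.morantest() printout follow directly from the three sample estimates shown beneath them. As a quick check, this sketch recomputes them for the k = 2 and distance-based weights from the numbers reported above; moran_z is a hypothetical helper name, and the one-sided normal p-value matches the "alternative hypothesis: greater" line in the output.

```r
# z = (observed I - expected I) / sqrt(variance of I)
moran_z <- function(I, EI, VI) (I - EI) / sqrt(VI)

# Values copied from the lm.morantest() outputs above
z_k2   <- moran_z(-0.089262137, -0.010642737, 0.003503515)   # sa.wt2
z_dist <- moran_z(1.901642e-02, -6.929932e-03, 9.436939e-05) # sa.wtd

# One-sided p-values (alternative: greater)
p_k2   <- pnorm(z_k2,   lower.tail = FALSE)
p_dist <- pnorm(z_dist, lower.tail = FALSE)

round(c(z_k2 = z_k2, p_k2 = p_k2, z_dist = z_dist, p_dist = p_dist), 4)
```

Because the test is one-sided, a negative Moran's I (as with the k = 2 weights) produces a p-value close to 1 rather than a small one.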
  • 24.
    #Now we fit the spatial lag model
    #The lag model is fit with lagsarlm() in the spdep library
    #we basically specify the same model as in the lm() fit above
    #But we need to specify the spatial weight matrix and the type
    #of lag model to fit
    fit.lag<-lagsarlm(scale(mortrate)~scale(ppersonspo)+scale(I(viol3yr/acs_poptot))+scale(dissim)+scale(ppop65plus), data=dat, listw=sa.wt2, type="lag")
    summary(fit.1)
  • 25.
    ##
    ## Call:
    ## lm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(I(viol3yr/acs_poptot)) +
    ##     scale(dissim) + scale(ppop65plus), data = dat)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -1.96044 -0.27804 -0.00673  0.26359  2.18006
    ##
    ## Coefficients:
    ##                               Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)                  1.723e-16  3.120e-02   0.000   1.0000
    ## scale(ppersonspo)            1.215e-01  5.047e-02   2.407   0.0169 *
    ## scale(I(viol3yr/acs_poptot)) 2.287e-01  3.841e-02   5.953  9.8e-09 ***
    ## scale(dissim)                8.467e-02  4.817e-02   1.758   0.0801 .
    ## scale(ppop65plus)            7.240e-01  3.594e-02  20.146  < 2e-16 ***
    ## ---
    ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.4773 on 229 degrees of freedom
    ## Multiple R-squared: 0.7761, Adjusted R-squared: 0.7722
    ## F-statistic: 198.4 on 4 and 229 DF, p-value: < 2.2e-16

    summary(fit.lag, Nagelkerke=T)
  • 26.
    ##
    ## Call:
    ## lagsarlm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(I(viol3yr/acs_poptot)) +
    ##     scale(dissim) + scale(ppop65plus), data = dat, listw = sa.wt2,
    ##     type = "lag")
    ##
    ## Residuals:
    ##         Min          1Q      Median          3Q         Max
    ## -1.91953751 -0.28320690  0.00039592  0.25188870  2.28912610
    ##
    ## Type: lag
    ## Coefficients: (asymptotic standard errors)
    ##                                Estimate Std. Error z value  Pr(>|z|)
    ## (Intercept)                  -0.0035113  0.0307209 -0.1143    0.9090
    ## scale(ppersonspo)             0.1047111  0.0509073  2.0569    0.0397
    ## scale(I(viol3yr/acs_poptot))  0.2235749  0.0379480  5.8916 3.825e-09
    ## scale(dissim)                 0.0758260  0.0476155  1.5925    0.1113
    ## scale(ppop65plus)             0.7019724  0.0376957 18.6221 < 2.2e-16
    ##
    ## Rho: 0.066138, LR test value: 2.1119, p-value: 0.14616
    ## Asymptotic standard error: 0.043789
    ## z-value: 1.5104, p-value: 0.13095
    ## Wald statistic: 2.2813, p-value: 0.13095
    ##
    ## Log likelihood: -155.394 for lag model
    ## ML residual variance (sigma squared): 0.22064, (sigma: 0.46973)
    ## Nagelkerke pseudo-R-squared: 0.77808
    ## Number of observations: 234
    ## Number of parameters estimated: 7
    ## AIC: 324.79, (AIC for lm: 324.9)
    ## LM test for residual autocorrelation
    ## test value: 9.0458, p-value: 0.002633

    bptest.sarlm(fit.lag)
    ##
    ##  studentized Breusch-Pagan test
    ##
    ## data:
    ## BP = 53.263, df = 4, p-value = 7.506e-11

    #robust SE's for the spatial model
    #tary and tarX are the response and model matrix of the fit's target regression
    library(sandwich)
    lm.target <- lm(fit.lag$tary ~ fit.lag$tarX - 1)
    coeftest(lm.target, vcov.=vcovHC(lm.target, type="HC0"))
  • 27.
    ##
    ## t test of coefficients:
    ##
    ##                                             Estimate Std. Error t value Pr(>|t|)
    ## fit.lag$tarXx(Intercept)                  -0.0035113  0.0307070 -0.1143  0.90906
    ## fit.lag$tarXxscale(ppersonspo)             0.1047111  0.0790642  1.3244  0.18670
    ## fit.lag$tarXxscale(I(viol3yr/acs_poptot))  0.2235749  0.1140534  1.9603  0.05118 .
    ## fit.lag$tarXxscale(dissim)                 0.0758260  0.0498978  1.5196  0.12998
    ## fit.lag$tarXxscale(ppop65plus)             0.7019724  0.0617274 11.3721  < 2e-16 ***
    ## ---
    ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    #Next we fit the spatial error model
    fit.err<-errorsarlm(scale(mortrate)~scale(ppersonspo)+scale(I(viol3yr/acs_poptot))+scale(dissim)+scale(ppop65plus), data=dat, listw=sa.wt2)
    summary(fit.err, Nagelkerke=T)
  • 28.
    ##
    ## Call:
    ## errorsarlm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(I(viol3yr/acs_poptot)) +
    ##     scale(dissim) + scale(ppop65plus), data = dat, listw = sa.wt2)
    ##
    ## Residuals:
    ##        Min         1Q     Median         3Q        Max
    ## -1.9126219 -0.2695111 -0.0030349  0.2680765  2.1121586
    ##
    ## Type: error
    ## Coefficients: (asymptotic standard errors)
    ##                               Estimate Std. Error z value  Pr(>|z|)
    ## (Intercept)                  0.0029538  0.0277141  0.1066   0.91512
    ## scale(ppersonspo)            0.1202146  0.0484444  2.4815   0.01308
    ## scale(I(viol3yr/acs_poptot)) 0.2303560  0.0371271  6.2045 5.486e-10
    ## scale(dissim)                0.0870377  0.0458697  1.8975   0.05776
    ## scale(ppop65plus)            0.7350826  0.0340110 21.6131 < 2.2e-16
    ##
    ## Lambda: -0.10644, LR test value: 2.2391, p-value: 0.13456
    ## Asymptotic standard error: 0.071547
    ## z-value: -1.4877, p-value: 0.13684
    ## Wald statistic: 2.2132, p-value: 0.13684
    ##
    ## Log likelihood: -155.3304 for error model
    ## ML residual variance (sigma squared): 0.22001, (sigma: 0.46905)
    ## Nagelkerke pseudo-R-squared: 0.7782
    ## Number of observations: 234
    ## Number of parameters estimated: 7
    ## AIC: 324.66, (AIC for lm: 324.9)

    #As a pretty good indicator of which model is best, look at the AIC of each
    #(lower is better; here the three values are nearly identical)
    AIC(fit.1)
    ## [1] 324.8999
    AIC(fit.lag)
    ## [1] 324.788
    AIC(fit.err)
    ## [1] 324.6608

Larger data example: US counties
  • 29. This example shows a lot more in terms of spatial effects.

    spdat<-readShapePoly("~/Google Drive/dem7263/data/usdata_mort.shp")

    #Create a good representative set of neighbor types
    us.nb6<-knearneigh(coordinates(spdat), k=6)
    us.nb6<-knn2nb(us.nb6)
    us.wt6<-nb2listw(us.nb6, style="W")

    us.nb5<-knearneigh(coordinates(spdat), k=5)
    us.nb5<-knn2nb(us.nb5)
    us.wt5<-nb2listw(us.nb5, style="W")

    us.nb4<-knearneigh(coordinates(spdat), k=4)
    us.nb4<-knn2nb(us.nb4)
    us.wt4<-nb2listw(us.nb4, style="W")

    us.nb3<-knearneigh(coordinates(spdat), k=3)
    us.nb3<-knn2nb(us.nb3)
    us.wt3<-nb2listw(us.nb3, style="W")

    us.nb2<-knearneigh(coordinates(spdat), k=2)
    us.nb2<-knn2nb(us.nb2)
    us.wt2<-nb2listw(us.nb2, style="W")

    #rook contiguity
    us.nbr<-poly2nb(spdat, queen=F)
    us.wtr<-nb2listw(us.nbr, zero.policy=T)

    #queen contiguity
    #the queen weights must be built from us.nbq (building them from us.nbr would just duplicate the rook weights)
    us.nbq<-poly2nb(spdat, queen=T)
    us.wtq<-nb2listw(us.nbq, style="W", zero.policy=T)

    hist(spdat$mortrate)
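To make the k-nearest-neighbor, row-standardized (style="W") weights less of a black box, the sketch below builds the equivalent dense matrix in base R for a handful of simulated points. Everything here is an illustration: coords, k and y are hypothetical, distances are assumed to be Euclidean, and a real analysis would keep using knearneigh(), knn2nb() and nb2listw() as above.

```r
# Hypothetical centroids for 20 areas (not the county data)
set.seed(3)
coords <- cbind(runif(20), runif(20))
k <- 4

# Pairwise Euclidean distances; an area is never its own neighbor
D <- as.matrix(dist(coords))
diag(D) <- Inf

# Row-standardized k-nearest-neighbor weights: each neighbor gets weight 1/k
W <- matrix(0, nrow(D), ncol(D))
for (i in seq_len(nrow(D))) {
  nn <- order(D[i, ])[1:k]   # indices of the k closest areas to area i
  W[i, nn] <- 1 / k
}

# The spatial lag of a variable is then the average over each area's neighbors
y  <- rnorm(20)
Wy <- as.vector(W %*% y)
head(cbind(y, Wy))
```

Note that k-nearest-neighbor weights are generally not symmetric: area j can be among the k nearest to area i without the reverse being true.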
  • 30.
    spplot(spdat,"mortrate", at=quantile(spdat$mortrate), col.regions=brewer.pal(n=5, "Reds"), main="Spatial Distribution of US Mortality Rate")
  • 31.
    #do some basic regression models, without spatial structure
    fit.1.us<-lm(scale(mortrate)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+factor(RUCC), spdat)
    summary(fit.1.us)
  • 32.
    ##
    ## Call:
    ## lm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(p65plus) +
    ##     scale(pblack_1) + scale(phisp) + factor(RUCC), data = spdat)
    ##
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max
    ## -3.5810 -0.4280  0.0216  0.4534  4.2606
    ##
    ## Coefficients:
    ##                   Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)       -0.16408    0.06218  -2.639  0.00836 **
    ## scale(ppersonspo)  0.60632    0.01789  33.893  < 2e-16 ***
    ## scale(p65plus)    -0.04085    0.01582  -2.582  0.00987 **
    ## scale(pblack_1)    0.10730    0.01641   6.538 7.29e-11 ***
    ## scale(phisp)      -0.28734    0.01484 -19.358  < 2e-16 ***
    ## factor(RUCC)1      0.41534    0.08731   4.757 2.05e-06 ***
    ## factor(RUCC)2      0.29579    0.07226   4.094 4.36e-05 ***
    ## factor(RUCC)3      0.11800    0.07985   1.478  0.13955
    ## factor(RUCC)4      0.23900    0.08845   2.702  0.00693 **
    ## factor(RUCC)5      0.13588    0.09502   1.430  0.15282
    ## factor(RUCC)6      0.41615    0.06901   6.030 1.83e-09 ***
    ## factor(RUCC)7      0.17107    0.07097   2.411  0.01599 *
    ## factor(RUCC)8      0.11620    0.07880   1.475  0.14040
    ## factor(RUCC)9     -0.20337    0.07654  -2.657  0.00793 **
    ## ---
    ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.7412 on 3053 degrees of freedom
    ## Multiple R-squared: 0.453, Adjusted R-squared: 0.4507
    ## F-statistic: 194.5 on 13 and 3053 DF, p-value: < 2.2e-16

    vif(fit.1.us)
    ##                       GVIF Df GVIF^(1/(2*Df))
    ## scale(ppersonspo) 1.786082  1        1.336444
    ## scale(p65plus)    1.397321  1        1.182083
    ## scale(pblack_1)   1.503301  1        1.226092
    ## scale(phisp)      1.229648  1        1.108895
    ## factor(RUCC)      1.724759  9        1.030746

    par(mfrow=c(2,2))
    plot(fit.1.us)
  • 33.
    par(mfrow=c(1,1))

    #this is a test for constant variance
    bptest(fit.1.us) #looks like we have heteroskedasticity
    ##
    ##  studentized Breusch-Pagan test
    ##
    ## data: fit.1.us
    ## BP = 249.87, df = 13, p-value < 2.2e-16
  • 34.
    #extract studentized residuals from the fit, and examine them
    spdat$residfit1<-rstudent(fit.1.us)
    cols<-brewer.pal(5,"RdBu")
    spplot(spdat,"residfit1", at=quantile(spdat$residfit1), col.regions=cols, main="Residuals from Model fit of US Mortality Rate")
  • 35.
    #test for residual autocorrelation
    resi<-c(lm.morantest(fit.1.us, listw=us.wt2)$estimate[1],
            lm.morantest(fit.1.us, listw=us.wt3)$estimate[1],
            lm.morantest(fit.1.us, listw=us.wt4)$estimate[1],
            lm.morantest(fit.1.us, listw=us.wt5)$estimate[1],
            lm.morantest(fit.1.us, listw=us.wt6)$estimate[1],
            lm.morantest(fit.1.us, listw=us.wtq, zero.policy=T)$estimate[1],
            lm.morantest(fit.1.us, listw=us.wtr, zero.policy=T)$estimate[1])
    plot(resi, type="l")

    lm.morantest(fit.1.us, listw=us.wt2)
  • 36.
    ##
    ##  Global Moran's I for regression residuals
    ##
    ## data:
    ## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
    ## scale(p65plus) + scale(pblack_1) + scale(phisp) + factor(RUCC),
    ## data = spdat)
    ## weights: us.wt2
    ##
    ## Moran I statistic standard deviate = 23.756, p-value < 2.2e-16
    ## alternative hypothesis: greater
    ## sample estimates:
    ## Observed Moran's I        Expectation           Variance
    ##       0.3965422192      -0.0018662504       0.0002812589

    lm.morantest(fit.1.us, listw=us.wt3)
    ##
    ##  Global Moran's I for regression residuals
    ##
    ## data:
    ## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
    ## scale(p65plus) + scale(pblack_1) + scale(phisp) + factor(RUCC),
    ## data = spdat)
    ## weights: us.wt3
    ##
    ## Moran I statistic standard deviate = 27.862, p-value < 2.2e-16
    ## alternative hypothesis: greater
    ## sample estimates:
    ## Observed Moran's I        Expectation           Variance
    ##       0.3879364703      -0.0018143292       0.0001956793

    lm.morantest(fit.1.us, listw=us.wt4)
  • 37.
    ##
    ##  Global Moran's I for regression residuals
    ##
    ## data:
    ## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
    ## scale(p65plus) + scale(pblack_1) + scale(phisp) + factor(RUCC),
    ## data = spdat)
    ## weights: us.wt4
    ##
    ## Moran I statistic standard deviate = 31.107, p-value < 2.2e-16
    ## alternative hypothesis: greater
    ## sample estimates:
    ## Observed Moran's I        Expectation           Variance
    ##       0.3798405687      -0.0017734270       0.0001504971

    lm.morantest(fit.1.us, listw=us.wt5)
    ##
    ##  Global Moran's I for regression residuals
    ##
    ## data:
    ## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
    ## scale(p65plus) + scale(pblack_1) + scale(phisp) + factor(RUCC),
    ## data = spdat)
    ## weights: us.wt5
    ##
    ## Moran I statistic standard deviate = 33.076, p-value < 2.2e-16
    ## alternative hypothesis: greater
    ## sample estimates:
    ## Observed Moran's I        Expectation           Variance
    ##       0.3625191247      -0.0017199894       0.0001212718

    lm.morantest(fit.1.us, listw=us.wt6)
  • 38.
    ##
    ##  Global Moran's I for regression residuals
    ##
    ## data:
    ## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
    ## scale(p65plus) + scale(pblack_1) + scale(phisp) + factor(RUCC),
    ## data = spdat)
    ## weights: us.wt6
    ##
    ## Moran I statistic standard deviate = 35.916, p-value < 2.2e-16
    ## alternative hypothesis: greater
    ## sample estimates:
    ## Observed Moran's I        Expectation           Variance
    ##       0.3600763582      -0.0016700396       0.0001014426

    lm.morantest(fit.1.us, listw=us.wtq, zero.policy=T)
    ##
    ##  Global Moran's I for regression residuals
    ##
    ## data:
    ## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
    ## scale(p65plus) + scale(pblack_1) + scale(phisp) + factor(RUCC),
    ## data = spdat)
    ## weights: us.wtq
    ##
    ## Moran I statistic standard deviate = 32.728, p-value < 2.2e-16
    ## alternative hypothesis: greater
    ## sample estimates:
    ## Observed Moran's I        Expectation           Variance
    ##       0.3693002497      -0.0016900532       0.0001284917

    lm.morantest(fit.1.us, listw=us.wtr, zero.policy=T)
  • 39.
    ##
    ##  Global Moran's I for regression residuals
    ##
    ## data:
    ## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
    ## scale(p65plus) + scale(pblack_1) + scale(phisp) + factor(RUCC),
    ## data = spdat)
    ## weights: us.wtr
    ##
    ## Moran I statistic standard deviate = 32.728, p-value < 2.2e-16
    ## alternative hypothesis: greater
    ## sample estimates:
    ## Observed Moran's I        Expectation           Variance
    ##       0.3693002497      -0.0016900532       0.0001284917

    #Now we fit the spatial lag model
    #The lag model is fit with lagsarlm() in the spdep library
    #we basically specify the same model as in the lm() fit above
    #But we need to specify the spatial weight matrix and the type
    #of lag model to fit
    fit.lag.us<-lagsarlm(scale(mortrate)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+factor(RUCC), spdat, listw=us.wt2, type="lag")
    summary(fit.lag.us, Nagelkerke=T)
  • 40.
    ##
    ## Call:
    ## lagsarlm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(p65plus) +
    ##     scale(pblack_1) + scale(phisp) + factor(RUCC), data = spdat,
    ##     listw = us.wt2, type = "lag")
    ##
    ## Residuals:
    ##       Min        1Q    Median        3Q       Max
    ## -3.435066 -0.365396  0.016386  0.379149  4.181996
    ##
    ## Type: lag
    ## Coefficients: (asymptotic standard errors)
    ##                     Estimate Std. Error  z value  Pr(>|z|)
    ## (Intercept)       -0.0601089  0.0539296  -1.1146 0.2650300
    ## scale(ppersonspo)  0.4383725  0.0166361  26.3507 < 2.2e-16
    ## scale(p65plus)    -0.0067939  0.0137105  -0.4955 0.6202305
    ## scale(pblack_1)    0.0555812  0.0144317   3.8513 0.0001175
    ## scale(phisp)      -0.1891732  0.0132557 -14.2711 < 2.2e-16
    ## factor(RUCC)1      0.2866229  0.0756754   3.7875 0.0001522
    ## factor(RUCC)2      0.1402669  0.0627174   2.2365 0.0253196
    ## factor(RUCC)3      0.0146097  0.0692250   0.2110 0.8328515
    ## factor(RUCC)4      0.1496221  0.0766327   1.9525 0.0508838
    ## factor(RUCC)5      0.0759998  0.0823307   0.9231 0.3559531
    ## factor(RUCC)6      0.2510359  0.0599005   4.1909 2.779e-05
    ## factor(RUCC)7      0.0760537  0.0615150   1.2363 0.2163308
    ## factor(RUCC)8      0.0063996  0.0682837   0.0937 0.9253308
    ## factor(RUCC)9     -0.2506134  0.0662793  -3.7812 0.0001561
    ##
    ## Rho: 0.39892, LR test value: 679.88, p-value: < 2.22e-16
    ## Asymptotic standard error: 0.014734
    ## z-value: 27.075, p-value: < 2.22e-16
    ## Wald statistic: 733.08, p-value: < 2.22e-16
    ##
    ## Log likelihood: -3086.33 for lag model
    ## ML residual variance (sigma squared): 0.4118, (sigma: 0.64172)
    ## Nagelkerke pseudo-R-squared: 0.56174
    ## Number of observations: 3067
    ## Number of parameters estimated: 16
    ## AIC: 6204.7, (AIC for lm: 6882.5)
    ## LM test for residual autocorrelation
    ## test value: 33.636, p-value: 6.6459e-09

    bptest.sarlm(fit.lag.us)
  • 41.
    ##
    ##  studentized Breusch-Pagan test
    ##
    ## data:
    ## BP = 238.13, df = 13, p-value < 2.2e-16

    #Robust SE's
    lm.target.us <- lm(fit.lag.us$tary ~ fit.lag.us$tarX - 1)
    coeftest(lm.target.us, vcov.=vcovHC(lm.target.us, type="HC0"))
    ##
    ## t test of coefficients:
    ##
    ##                                     Estimate Std. Error  t value  Pr(>|t|)
    ## fit.lag.us$tarXx(Intercept)       -0.0601089  0.0477044  -1.2600 0.2077555
    ## fit.lag.us$tarXxscale(ppersonspo)  0.4383725  0.0250784  17.4801 < 2.2e-16 ***
    ## fit.lag.us$tarXxscale(p65plus)    -0.0067939  0.0176425  -0.3851 0.7002021
    ## fit.lag.us$tarXxscale(pblack_1)    0.0555812  0.0186951   2.9730 0.0029718 **
    ## fit.lag.us$tarXxscale(phisp)      -0.1891732  0.0168362 -11.2361 < 2.2e-16 ***
    ## fit.lag.us$tarXxfactor(RUCC)1      0.2866229  0.0597309   4.7986 1.675e-06 ***
    ## fit.lag.us$tarXxfactor(RUCC)2      0.1402669  0.0501894   2.7948 0.0052264 **
    ## fit.lag.us$tarXxfactor(RUCC)3      0.0146097  0.0593415   0.2462 0.8055470
    ## fit.lag.us$tarXxfactor(RUCC)4      0.1496221  0.0651111   2.2980 0.0216319 *
    ## fit.lag.us$tarXxfactor(RUCC)5      0.0759998  0.0803115   0.9463 0.3440640
    ## fit.lag.us$tarXxfactor(RUCC)6      0.2510359  0.0526992   4.7636 1.991e-06 ***
    ## fit.lag.us$tarXxfactor(RUCC)7      0.0760537  0.0576485   1.3193 0.1871791
    ## fit.lag.us$tarXxfactor(RUCC)8      0.0063996  0.0699315   0.0915 0.9270913
    ## fit.lag.us$tarXxfactor(RUCC)9     -0.2506134  0.0664019  -3.7742 0.0001636 ***
    ## ---
    ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
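The robust standard errors above come from vcovHC(..., type="HC0"), the basic "sandwich" estimator. As a rough illustration of what that estimator computes, the base R sketch below rebuilds it by hand on simulated heteroskedastic data (x, y and the object names are hypothetical; this is not the county model): the covariance is (X'X)^-1 X' diag(e²) X (X'X)^-1, with e the OLS residuals.

```r
# Simulated heteroskedastic data (hypothetical; error variance grows with x)
set.seed(4)
n <- 250
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.4 * x)

fit <- lm(y ~ x)
X   <- model.matrix(fit)
e   <- residuals(fit)

# Sandwich pieces: "bread" (X'X)^-1 and "meat" X' diag(e^2) X
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)      # each row of X is scaled by its own residual

vc_hc0 <- bread %*% meat %*% bread
se_hc0 <- sqrt(diag(vc_hc0))
se_ols <- sqrt(diag(vcov(fit)))

cbind(estimate = coef(fit), se_ols, se_hc0)
```

The coefficient estimates are untouched; only the standard errors change, which is why the Estimate column in the coeftest() output matches the lagsarlm() summary while the test statistics differ.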
  • 42.
    #Next we fit the spatial error model
    fit.err.us<-errorsarlm(scale(mortrate)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+factor(RUCC), spdat, listw=us.wt2)
    summary(fit.err.us, Nagelkerke=T)
    ##
    ## Call:
    ## errorsarlm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(p65plus) +
    ##     scale(pblack_1) + scale(phisp) + factor(RUCC), data = spdat,
    ##     listw = us.wt2)
    ##
    ## Residuals:
    ##       Min        1Q    Median        3Q       Max
    ## -3.246243 -0.363446  0.018518  0.378347  4.269437
    ##
    ## Type: error
    ## Coefficients: (asymptotic standard errors)
    ##                     Estimate Std. Error  z value  Pr(>|z|)
    ## (Intercept)       -0.1519258  0.0718058  -2.1158 0.0343630
    ## scale(ppersonspo)  0.4956221  0.0192362  25.7651 < 2.2e-16
    ## scale(p65plus)    -0.0050401  0.0163231  -0.3088 0.7574975
    ## scale(pblack_1)    0.1454472  0.0200302   7.2614 3.830e-13
    ## scale(phisp)      -0.2201678  0.0193200 -11.3959 < 2.2e-16
    ## factor(RUCC)1      0.2983426  0.0802941   3.7156 0.0002027
    ## factor(RUCC)2      0.1978474  0.0802418   2.4656 0.0136768
    ## factor(RUCC)3      0.1136176  0.0837459   1.3567 0.1748783
    ## factor(RUCC)4      0.2271767  0.0877477   2.5890 0.0096261
    ## factor(RUCC)5      0.2166439  0.0959520   2.2578 0.0239559
    ## factor(RUCC)6      0.3282910  0.0763724   4.2986 1.719e-05
    ## factor(RUCC)7      0.1870157  0.0796173   2.3489 0.0188272
    ## factor(RUCC)8      0.0798636  0.0828415   0.9641 0.3350191
    ## factor(RUCC)9     -0.1286435  0.0843255  -1.5256 0.1271197
    ##
    ## Lambda: 0.43168, LR test value: 582.46, p-value: < 2.22e-16
    ## Asymptotic standard error: 0.016177
    ## z-value: 26.685, p-value: < 2.22e-16
    ## Wald statistic: 712.11, p-value: < 2.22e-16
    ##
    ## Log likelihood: -3135.038 for error model
    ## ML residual variance (sigma squared): 0.42027, (sigma: 0.64829)
    ## Nagelkerke pseudo-R-squared: 0.5476
    ## Number of observations: 3067
    ## Number of parameters estimated: 16
    ## AIC: 6302.1, (AIC for lm: 6882.5)

    #As a pretty good indicator of which model is best, look at the AIC of each
    #for the OLS model, the summaries above report "AIC for lm: 6882.5"
    AIC(fit.1.us)
  • 43.
    AIC(fit.lag.us)
    ## [1] 6204.659
    AIC(fit.err.us)
    ## [1] 6302.075
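The AIC values and likelihood ratio tests in these summaries are tied together by simple arithmetic, which can be checked from the printed numbers alone. The sketch below is only a reconstruction under stated assumptions: aic_from_ll is a hypothetical helper, the log likelihoods and parameter counts are copied from the US county summaries above, and the OLS log likelihood is backed out of the rounded "AIC for lm: 6882.5" assuming 15 parameters (14 coefficients plus the error variance), so the LR statistics agree with the printed ones only up to rounding.

```r
# AIC = -2 * logLik + 2 * (number of estimated parameters)
aic_from_ll <- function(ll, k) -2 * ll + 2 * k

# Log likelihoods and parameter counts from the US county summaries above
ll_lag <- -3086.33
ll_err <- -3135.038
aic_lag <- aic_from_ll(ll_lag, 16)
aic_err <- aic_from_ll(ll_err, 16)

# Back out the OLS log likelihood from its (rounded) AIC
aic_ols <- 6882.5
ll_ols  <- -(aic_ols - 2 * 15) / 2

# Likelihood ratio tests of each spatial model against OLS (1 df each)
lr_lag <- 2 * (ll_lag - ll_ols)
lr_err <- 2 * (ll_err - ll_ols)
p_lag  <- pchisq(lr_lag, df = 1, lower.tail = FALSE)

round(c(aic_ols = aic_ols, aic_lag = aic_lag, aic_err = aic_err,
        lr_lag = lr_lag, lr_err = lr_err), 2)
```

Read this way, both spatial models improve substantially on OLS for the county data, and the lag model has the lowest AIC of the three, in contrast to the San Antonio example where the three AIC values were nearly the same.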