Spatial Analysis with R - the Good, the Bad, and the Pretty
1. The good, the bad & the pretty
Spatial data analysis with R
Robert Hijmans
University of California, Davis
May 2013
2. Spatial is special
• Complex: geometry and attributes
• Earth is flat? Map projections
• Size: lots and lots of it, multivariate, time series
• Special plots: maps
• First Law of Geography: nearby things are similar
– Statistical assumptions: violated
– Interpolation: possible
3. Don't we have GIS for that?
GIS*
• Visual interaction
• Data management
• Geometric operations
• Standard workflows
• Single map production
• Click, click, click & click
• Speed of execution
• Cumbersome
R
• Data & model focused
• Analysis
• Attributes as important
• Creativity & innovation
• Many (simpler) maps
• Repeatability (single script)
• Speed of development
• Easy & powerful (& free)
* There are many different GISs and they evolve
10. Types of spatial analysis*
• Query and reasoning
Where is? How much is this here? How to get from A to B?
• Measurement
Area, Distance, Length, Slope
• Transformation
Buffering, overlay, interpolation
• Exploration and description
Clusters, trends, spatial dependence, fragmentation
• Optimization
Site selection, re-districting, traveling salesman
• Inference
Samples from a population, problem of spatial autocorrelation
• Modeling
Climate change effects, impact of nuclear accident, dispersal
* After Michael Goodchild: http://www.csiss.org/aboutus/presentations/files/goodchild_qmss_oct02.pdf
12. Point patterns
1. Location of points is of prime interest
2. Points are not a sample
3. Points are within a defined study area
4. Points should be true incidents (not centroids)
13. Point patterns
> library(spatstat); library(maptools)
> cityOwin <- as(city, "owin")
> pts <- coordinates(crime)
> p <- ppp(pts[,1], pts[,2], window=cityOwin)
> s <- smooth.ppp(p)
> e <- envelope(p)
http://www.spatstat.org/
15. Geostatistics
> library(gstat); library(sp)
> data(meuse)
> coordinates(meuse) <- ~x+y
> spplot(meuse, 'zinc')
1. Measurements are of prime interest (not locations)
2. Points are a sample
3. Unbiased estimates for locations that were not sampled
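16.
> data(meuse.grid); gridded(meuse.grid) <- ~x+y
> m <- fit.variogram(variogram(log(zinc)~1, meuse), vgm(1, "Sph", 300, 1))  # assumed: m is not defined on the slides
> x <- krige(log(zinc)~1, meuse, meuse.grid, model = m)
> spplot(x["var1.pred"], main="ordinary kriging predictions")
> spplot(x["var1.var"], main = "ordinary kriging variance")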
17. Regression with spatial data
> f1 <- houseValue ~ age + nBedrooms
> m <- lm(f1, data=hh)
> summary(m)
Call:
lm(formula = f1, data = hh)
Residuals:
    Min      1Q  Median      3Q     Max
-222541  -67489   -6128   60509  217655
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -628578     233217  -2.695  0.00931 **
age            12695       2480   5.119 4.05e-06 ***
nBedrooms     191889      76756   2.500  0.01543 *
19.
> library(spdep)
> cb <- poly2nb(ca)
> lw <- nb2listw(cb)
> plot(ca)
> plot(lw, coordinates(ca), add=TRUE, col="red")
> moran.test(residuals, lw)
Moran's I test under randomisation
Moran I statistic standard deviate = 2.6926, p-value = 0.003545
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance
      0.158977893      -0.010101010       0.003943149
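The expectation reported above is -1/(n - 1), so -0.0101 corresponds to n = 100 areas. As a rough base-R sketch of what the Moran's I statistic measures: the 5 x 5 rook-neighbour lattice and the simulated variable below are assumptions for illustration only, not the `ca` data, and the row-standardised weights mimic the default style of nb2listw.

```r
# Moran's I by hand on a hypothetical 5 x 5 lattice (not the 'ca' polygons)
set.seed(1)
nr <- 5; nc <- 5; n <- nr * nc
rc <- expand.grid(row = 1:nr, col = 1:nc)   # cell coordinates

# Rook neighbours: cells at Manhattan distance 1
d <- as.matrix(dist(rc, method = "manhattan"))
W <- (d == 1) * 1
W <- W / rowSums(W)                         # row-standardise the weights

y <- rc$row + rc$col + rnorm(n)             # variable with a spatial trend
z <- y - mean(y)

# I = (n / S0) * (z' W z) / (z' z), where S0 is the sum of all weights
S0 <- sum(W)
I  <- (n / S0) * as.numeric(t(z) %*% W %*% z) / sum(z^2)
EI <- -1 / (n - 1)                          # expectation under no autocorrelation
c(I = I, expectation = EI)
```

A value of I well above the expectation, as in the moran.test output, indicates that neighbouring values are more alike than expected by chance.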
20. If spatial autocorrelation (SA) is ‘significant’ then you could
• Re-specify your model
• Permit the coefficients, β, to vary spatially (GWR)
• Modify the regression model to incorporate the SA
• Proceed and ignore SA?
21. OLS: Y = Xβ + e
Autoregressive model: Y = ρWY + e
Simultaneous Autoregressive Models:
SAR-lag: Y = ρWY + Xβ + e
(endogenous, inherent spatial autocorrelation, diffusion)
SAR-err: Y = Xβ + λWu + e
(exogenous, induced spatial autocorrelation)
SAR-mix: Y = ρWY + Xβ + WXγ + e
CAR
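The SAR-lag equation can be solved for Y, giving the reduced form Y = (I - ρW)^-1 (Xβ + e), which is a convenient way to simulate such a process. A minimal base-R sketch follows; the 6 x 6 rook lattice and the values of rho and beta are hypothetical choices for illustration, not taken from the slides.

```r
# Simulate a SAR-lag process on a hypothetical 6 x 6 rook lattice
set.seed(42)
nr <- 6; nc <- 6; n <- nr * nc
rc <- expand.grid(row = 1:nr, col = 1:nc)
W <- (as.matrix(dist(rc, method = "manhattan")) == 1) * 1
W <- W / rowSums(W)              # row-standardised weights

rho  <- 0.6                      # assumed spatial lag parameter
beta <- c(1, 2)                  # assumed intercept and slope
X <- cbind(1, rnorm(n))
e <- rnorm(n, sd = 0.5)

# Reduced form: Y = (I - rho W)^-1 (X beta + e)
Y <- solve(diag(n) - rho * W, X %*% beta + e)

# Check that Y satisfies the structural equation Y = rho W Y + X beta + e
max(abs(Y - (rho * W %*% Y + X %*% beta + e)))
```

Because Y appears on both sides of the structural equation, a shock to e at one location spreads to its neighbours through W, which is the "diffusion" reading of the lag model.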
22. raster package
• new classes (‘S4’) for raster data
• no file size restrictions
• file formats: gdal, ncdf, ‘native’
• > 200 functions
23. RasterLayer
> library(raster)
>
> x <- raster(ncol=10, nrow=5)
>
> x <- raster('volcano.tif')
>
> x
class : RasterLayer
dimensions : 87, 61, 5307 (nrow, ncol, ncell)
resolution : 10, 10 (x, y)
extent : 2667400, 2668010, 6478700, 6479570 (xmin, xmax, …
coord. ref. : +proj=nzmg +lat_0=-41 +lon_0=173 +x_0=251
values : d:datavolcano.tif
min value : 94
max value : 195
24. RasterLayer
> str(x)
Formal class 'RasterLayer' [package "raster"] with 16 slots
..@ file :Formal class '.RasterFile' [package "raster"] with 9 slots
.. .. ..@ name : chr "d:datavolcano.tif"
.. .. ..@ driver : chr "gdal"
..@ data :Formal class '.SingleLayerData' [package "raster"] with 11 slots
.. .. ..@ values : logi(0)
.. .. ..@ inmemory : logi FALSE
.. .. ..@ min : num 94
.. .. ..@ max : num 195
..@ extent :Formal class 'Extent' [package "raster"] with 4 slots
.. .. ..@ xmin: num 2667400
.. .. ..@ xmax: num 2668010
..@ rotation :Formal class '.Rotation' [package "raster"] with 2 slots
.. .. ..@ geotrans: num(0)
.. .. ..@ transfun:function ()
..@ ncols : int 61
..@ nrows : int 87
..@ crs :Formal class 'CRS' [package "sp"] with 1 slots
.. .. ..@ projargs: chr " +proj=nzmg +lat_0=-41 +lon_0=173 +x_0=2510000 +y_0=6023150
..@ layernames: chr "volcano"
25. Multiple layers
RasterStack - many files
RasterBrick - single file
> s <- stack(x, x*2, sqrt(x))
>
> s
class : RasterStack
dimensions : 87, 61, 5307, 3 (nrow, ncol, ncell, nlayers)
resolution : 0.01639344, 0.01149425 (x, y)
extent : 0, 1, 0, 1 (xmin, xmax, ymin, ymax)
coord. ref. : NA
min values : 94.0, 188.0, 9.7
max values : 195, 390, 14
layer names : layer.1, layer.2, layer.3
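The per-layer min and max values in this summary can be checked without the raster package. The sketch below is a base-R analogue, not the raster API, and it assumes that volcano.tif holds the same values as R's built-in `volcano` matrix (the 87 x 61 dimensions and the 94 to 195 range match, but the slides do not say so).

```r
# Base-R analogue of stack(x, x*2, sqrt(x)), using the built-in volcano matrix
x <- datasets::volcano                     # 87 rows x 61 columns of elevations
s <- array(c(x, x * 2, sqrt(x)), dim = c(dim(x), 3))
dim(s)                                     # rows, columns, layers

# Per-layer minimum and maximum, comparable to 'min values' / 'max values'
rng <- rbind(min = apply(s, 3, min), max = apply(s, 3, max))
round(rng, 1)
```

The third layer is the square root of the first, so its range runs from sqrt(94) to sqrt(195), which the slide output rounds to 9.7 and 14.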