CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models and Analysis - Douglas Nychka, Oct 10, 2017
1. Spatial data:
models and analysis
Lecture 1.3
Douglas Nychka,
National Center for Atmospheric Research
Supported by the National Science Foundation May, 2017
2. Outline
2
• Additive statistical model
• Spatial Processes and covariance functions
• The multivariate normal distribution
• Kriging
• Kriging uncertainty
• Likelihood for spatial data
• f i e l d s
6. Goals:
6
1. Prediction: Determine the climate at locations where there are
not stations.
2. Uncertainty: Quantify the error in the predictions.
3. Summarize the spatial/temp oral structure: One to ol for
comparing observations to models and models to models for physical
insight
8. Covariance functions g(x) is a random surface with E[g(x)] = µ(x)
8
Covariance function k has two spatial arguments
k(x1, x2 ) = COV (g(x1), g(x2))
• Usually µ(x) is constant or even zero.
• Often k only depends on distance of separation between locations.
Exponential
k(x1, x2 ) = ρe− distance(x1,x2)/θ
Isotropic
k(x1, x2 ) = ρΦ(distance(x1, x2 )/θ)
9. Families of correlation functions
Matern:
φ(d) = ρψν (d/θ)) with ψν a Bessel function.
ν = .5 Exponential, 1.0 , 2.0
0.00.40.8
0.0 0.2 0.4 0.6 0.8 1.0
distance
correlation
•θ a range parameter
•ν smoothness at 0.
•ψν is an exponential for ν =
1/2 as ν → ∞ Gaussian.
• As ν increases the process
9
is smoother.
Wendland:
Polynomial that is exactly zero outside given range.
Compactly supported Wendland covariance ( d = 2 , k = 3 )
10. W h a t do these processes look like?
Varying the smoothness:
Matern (.5) Matern(1.0)
0.0 0.5 1.0 1.5 2.0 2.5 3.0
−2−1012
0.0 0.5 1.0 1.5 2.0 2.5 3.0
−2−1012
0.0 0.5 1.0 1.5 2.0 2.5 3.0
−2−1012
0.0 0.5 1.0 1.5 2.0 2.5 3.0
−2−1012
Matern (2.0) Wendland (2.0)
10
11. W h a t do these processes look like?
11
Varying the smoothness:
Matern (.5) Matern(1.0)
Exponential Matern1.0
Matern2.0 Wendland
−4
−2
0
2
4
Matern (2.0) Wendland (2.0)
13. Observations:
13
iy =
k
i,k k iZ d + g(x ) + Ei
or by vectors
y = Z d + g + E
Observation = fixed component + spatial process + error
• Ei’s are uncorrelated N (0, σ2)
Spatial process:
• g(x) mean zero Gaussian process V AR(g(x)) = ρ
• Covariance function ρkθ(., .) with θ some parameters.
14. A multivariate normal world
14
Covariance matrix of g at observations:
Kθ[i, j ] = kθ(xi, x j )
so
E(g(xi )g(xj )) = ρKθ[i, j ]
The observation model can be collapsed as multivariate nor-
mal
y ∼ M N (Zd, ρKθ + σ2 I)
g(x) given y and all parameters is multivariate Gaussian
Follows the standard form for the conditional distribution of a multivari-
ate Gaussian.
15. Some Background
15
Joint covariance matrix of (U,V) is Σ
Σ =
Σ u,u Σ u,v
Σ v,u Σ u,v
(1)
Recall if U and V are multivariate normal and E(U ) = E(V ) = 0.
u,u[V |U ] ∼ M N ( Σ v,u( Σ −1) U , Σ v,v v,u− Σ ( Σ −1
u,u u,v) Σ )
17. W h a t is Kriging?
17
•Assume that the covariance parameters are known or estimated by
variogram fitting or maximum likelihood.
• Find the parameters in the linear model by generalized least squares.
•With covariance and linear model parameters fixed, find the condi-
tional distribution for g given y
ˆg is the conditional expectation of g given y
e. g. ˆg(x) = E[g(x)|y]
Prediction standard errors for g based on conditional variances.
18. T h e Kriging backbone
18
We will use this result in many places
Suppose
• there is no fixed effect ( mean zero)
• prediction at the observations.
yi = g(xi) + ei
or conveniently
y = g + e
19. Recall
19
u,uΣ v,u( Σ −1) U , Σ v,v v,u− Σ ( Σ −1
u,u) Σ u,v )[V |U] ∼ M N (
V = g U = y
Σv,v = cov(g, g) = ρK Note: Ki , j = k(xi, x j )
Σu,v = cov(g, y ) = cov(g, g + e) = cov(g, g) = ρK
Σu,u = cov(y, y ) = ρK + σ2I
Conditional mean:
ˆg = ρK(ρK + σ2I)−1y = K ( K + (σ2/ρ)I)−1y
ˆg = K ( K + λI)− 1 y = ( I + λQ)−1y
This is a smoother! ( Q = K− 1 )
20. Uncertainty of the Kriging estimate
20
[V |U ] ∼ M N ( Σ v,u( Σ −1) U , Σ v,v v,u
−1
u,u u,u u,v− Σ ( Σ ) Σ )
•Evaluate the pointwise prediction for a limited number of locations
•conditional simulation Generate a Monte Carlo sample from the Mul-
tivariate normal distribution.
21. Where are the basis functions?
21
Prediction at a location x
u,uΣv,u(Σ−1 )U
V = g(x), U = y
ˆg(x) = (ρk(x, x1), ..., ρk(x, xn))T (ρK + σ2I)−1y
ˆg(x) =
n
k(x, x )ci i c = ( K + λI)− 1 y
i = 1
ˆg(x) = sum of basis functions × coefficients
22. Lecture on smoothers revisited
22
•ˆg(x) is type of spline using a specific roughness penalty
•ˆg( x ) is a penalized least squares estimate with a specific Q and a
specific set of basis functions.
• Q−1 = K and bi(x) = k(x, xi )
What about fixed effects? ( the d s)
Estimate by generalized least squares and subtract off before doing
Kriging.
24. Likelihood for statistical parameters
y ∼ M N (Zd, (ρK + σ2 I))
Likelihood
1
√
2
24
π
n e 2
1 T 2 −1− ( y− Zd) ( ρ K + σ I ) ( y− Zd) |ρK + σ |2 −1/2
25. log Likelihood
1
2
T− ( y − Z d) ( ρK 2 −1
+ σ I ) ( y
1
2
2− Z d ) − log(det(ρK + σ I ) ) + stuff
With λ = σ2/ρ
−
1 T( y − Z d) ( K −1 1 n
2ρ 2 2
25
+ λ I ) ( y − Z d) − log(det(K + λ I ) ) + − log( ρ) + stuff
• Can maximize this analytically for ρ and d
dˆ=
(
ZT M Z
− 1
ZT M y
ρˆ= ( 1 / n ) ( y − Zdˆ)T M ( y − Zdˆ)
where M = ( K + λI)− 1
• K may depend on other parameters e.g. scale, shape
• Once we find these parameters – we do Kriging for the predictions!
30. spatialProcess
30
# x and y a r e t h e s y n t h e t i c example
l i b r a r y ( f i e l d s )
obj<- s p a t i a l P r o c e s s ( x , y )
31. In R
31
C a l l :
spatialProcess(x = x , y = y)
Number of Observations: 150
Degree of polynomial n u l l space ( base model): 1
Total number of parameters i n base model 2
E f f . degrees of freedom 17.04
Standard Error of e st i m a te: 0.9162
Smoothing parameter 0.02295
MLE sigma 0.0798
MLE rho 0.2775
MLE lambda = MLE sigma^2 / MLE rho 0.02295
MLE t h e t a 0.432
Nonzero e n t r i e s i n covariance 22500
Covariance Model: st a t io nar y.c ov
Covariance f u n c t ion : Matern
Non-default covariance arguments and t h e i r values
Argument: Covariance has the v a l u e ( s ) :
[ 1 ] "Matern"
Argument: smoothness has the v a l u e ( s ) :
has the v a l u e ( s ) :
[ 1 ] 1
Argument: t h e t a
t h e t a
0.4320242
32. In R – C o l o r a d o C l i m a t e
32
library( fields)
# colorado climate data
data(COmonthlyMet)
x<- CO.loc
y<- CO.tmin.MAM.climate
elev<- CO.elev
good<- !is.na( y)
x<- x[good,]
y<- y[good]
elev<- elev[good]
NGRID <- 50
# get elevations on a grid (will use these
later)
# lonRange<- range(x[,1])
# latRange<- range(x[,2])
# COGrid<- list(x=seq(lonRange[1],
lonRange[2],length.out=NGRID ),
# y=seq(latRange[1],
latRange[2],length.out=NGRID )
# )
COGrid<- list(x = seq( -109, -101,,NGRID),
y = seq(36.5, 41.5,,NGRID))
COGridPoints<- make.surface.grid( COGrid )
data( RMelevation)
COElevGrid<- interp.surface.grid( RMelevation,
COGrid )
# Native Res: COElevGrid<- crop.image(
RMelevation, x )
# take a look at the data
quilt.plot( x, y)
US( add=TRUE)
plot( elev, y)
plot( x[,2], y)
X<- cbind( x, elev)
#
lmObj<- lm( y ~ lon+lat +elev, data=X
)
summary( lmObj)
quilt.plot( x, lmObj$residuals)
US( add=TRUE)
fit0E<- Tps( x,y, Z=elev- mean(elev))
# fit a Kriging estimator Matern
covariance smoothness =1.0
# range and nugget estimated by
maxmimum likelihood
fit1<- spatialProcess( x,y)
set.panel(2,2)
plot( fit1)
surface( fit1)
US( add=TRUE)
33. 33
# take a look at residuals
plot( elev, fit1$residuals)
lmObj<- lm( fit1$residuals ~ elev)
abline( lmObj, col="red", lwd=3)
# with elevations
fit1E<- spatialProcess( x,y, Z= elev)
# the clean way for fields >=9.0
sur0<- predictSurface( fit1E,
grid.list=COGrid, ZGrid=COElevGrid )
# use this for fields <= 9.0
sur0<- predict( fit1E, x=COGridPoints,
Z=c(COElevGrid$z) )
sur0<- as.surface( COGrid, sur0)
#
image.plot( sur0, col=terrain.colors(256))
sur0Smooth<- predictSurface( fit1E,
grid.list=COGrid, drop.Z= TRUE)
image.plot( sur0Smooth)
# uncertainty 40 draws from posterior
distribution
set.seed(123)
# next commmand takes a minute or so
SEout<- sim.spatialProcess( fit1E,
xp=COGridPoints,
Z= c(COElevGrid$z), M= 40, drop.Z=TRUE)
set.panel( 3,3)
image.plot( sur0Smooth,
zlim=c(3.5,14.5), col=tim.colors(256))
contour( sur0Smooth, levels= 9, lwd=3,
col="grey", add=TRUE)
par( mar=c(3,3,1,1))
for( k in 1:8){
image( as.surface( COGrid,
SEout[k,]),
zlim
=c(3.5,14.5),col=tim.colors(256),
axes=FALSE)
contour( as.surface( COGrid,
SEout[k,]),
lwd=3, col="grey",level=9, add=TRUE)
title( k, adj=0, cex=2)
}
surSE<- apply( SEout, 2, sd )
image.plot( as.surface( COGrid,
surSE))
points( x, col="magenta", pch=16)
contour( COElevGrid,
level= c(2000, 3000), add= TRUE,
col="grey30", lwd=3)
US( add=TRUE, col="grey", lwd=2)
In R – C o l o r a d o C l i m a t e ( c o n t . )
34. xgrid<-seq( 0 , 1 , length.out=200)
f h a t < - p r e d i c t ( o b j , x g r i d )
p l o t ( obj$x, obj$y)
l i n e s ( x g r i d , f h a t , col="red")
●
●
●
●●●
●●
●
●
●●●
●
●●
●●
●
●
●●
● ●
●
●
●●
●
● ●
●
●
●
● ● ● ●
●
●● ●
● ●
●
●
●
●●●
● ● ● ●●●●
● ●
● ●
●● ●●
●
●
●● ●
●
● ●●
●
●
● ●●
●
●
● ●
● ●
●
●
●●
● ●
●
●
● ●●
●
●
●●●●● ●
●●
●
● ● ●
●
●
●
● ●●● ● ●●
●
●●
● ●
●
●
●●
● ●
●
● ●
●
● ●●●
● ●
● ● ●●
●● ●
●
0.0 0.2 0.4 0.6 0.8 1.0
−0.20.00.20.40.60.81.0
obj$x
obj$y
34
36. V ariogram as an E D A to ol
How to estimate the spatial from a single field?
• Plot for a spatial data set or spatial field plot
( y i −y j ) 2
2
36
against the
distance of separation. ” O n the average” this should be follow the
theoretical curve that is the variance of the data minus the covariance
function .
covariance → variogram or variogram → covariance
•If one can find find the variogram then one can transform back to the
covariance function.
•Need to be careful about how the variogram behaves close to zero
distance. The variogram estimates ρ + σ2 right at zero – not just ρ
• Great EDA tool – terrible for actually estimating parameters!
37. A variogram example
Sample field with true exponential variogram on a 25 × 25 grid.
0 2 4 6 8 10
0246810
3
2
1
0
−1
−2
Based on (625 choose 2 ) total pairs
≈ 200K differences: (yi − yj )2 /2
37