CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models and Analysis - Douglas Nychka, Oct 10, 2017

Spatial data:
models and analysis
Lecture 1.3
Douglas Nychka,
National Center for Atmospheric Research
Supported by the National Science Foundation May, 2017

Outline
2
• Additive statistical model
• Spatial Processes and covariance functions
• The multivariate normal distribution
• Kriging
• Kriging uncertainty
• Likelihood for spatial data
• f i e l d s

An additive statistical model
3

Spatial data
4
iy =
k
i,k k iZ d + g(x ) + Ei
or by vectors
y = Z d + g + E
Observation = fixed component + spatial process + error

Colorado M A M average tmin
−10
−5
0
5

Goals:
6
1. Prediction: Determine the climate at locations where there are
not stations.
2. Uncertainty: Quantify the error in the predictions.
3. Summarize the spatial/temp oral structure: One to ol for
comparing observations to models and models to models for physical
insight

Spatial processes
and covariance functions
7

Covariance functions g(x) is a random surface with E[g(x)] = µ(x)
8
Covariance function k has two spatial arguments
k(x1, x2 ) = COV (g(x1), g(x2))
• Usually µ(x) is constant or even zero.
• Often k only depends on distance of separation between locations.
Exponential
k(x1, x2 ) = ρe− distance(x1,x2)/θ
Isotropic
k(x1, x2 ) = ρΦ(distance(x1, x2 )/θ)

Families of correlation functions
Matern:
φ(d) = ρψν (d/θ)) with ψν a Bessel function.
ν = .5 Exponential, 1.0 , 2.0
0.00.40.8
0.0 0.2 0.4 0.6 0.8 1.0
distance
correlation
•θ a range parameter
•ν smoothness at 0.
•ψν is an exponential for ν =
1/2 as ν → ∞ Gaussian.
• As ν increases the process
9
is smoother.
Wendland:
Polynomial that is exactly zero outside given range.
Compactly supported Wendland covariance ( d = 2 , k = 3 )

W h a t do these processes look like?
Varying the smoothness:
Matern (.5) Matern(1.0)
0.0 0.5 1.0 1.5 2.0 2.5 3.0
−2−1012
0.0 0.5 1.0 1.5 2.0 2.5 3.0
−2−1012
0.0 0.5 1.0 1.5 2.0 2.5 3.0
−2−1012
0.0 0.5 1.0 1.5 2.0 2.5 3.0
−2−1012
Matern (2.0) Wendland (2.0)
10

W h a t do these processes look like?
11
Varying the smoothness:
Matern (.5) Matern(1.0)
Exponential Matern1.0
Matern2.0 Wendland
−4
−2
0
2
4
Matern (2.0) Wendland (2.0)

Revisiting the additive model
12

Observations:
13
iy =
k
i,k k iZ d + g(x ) + Ei
or by vectors
y = Z d + g + E
Observation = fixed component + spatial process + error
• Ei’s are uncorrelated N (0, σ2)
Spatial process:
• g(x) mean zero Gaussian process V AR(g(x)) = ρ
• Covariance function ρkθ(., .) with θ some parameters.

A multivariate normal world
14
Covariance matrix of g at observations:
Kθ[i, j ] = kθ(xi, x j )
so
E(g(xi )g(xj )) = ρKθ[i, j ]
The observation model can be collapsed as multivariate nor-
mal
y ∼ M N (Zd, ρKθ + σ2 I)
g(x) given y and all parameters is multivariate Gaussian
Follows the standard form for the conditional distribution of a multivari-
ate Gaussian.

Some Background
15
Joint covariance matrix of (U,V) is Σ
Σ =
Σ u,u Σ u,v
Σ v,u Σ u,v
(1)
Recall if U and V are multivariate normal and E(U ) = E(V ) = 0.
u,u[V |U ] ∼ M N ( Σ v,u( Σ −1) U , Σ v,v v,u− Σ ( Σ −1
u,u u,v) Σ )

Kriging
Danie Gerhardus Krige
16

W h a t is Kriging?
17
•Assume that the covariance parameters are known or estimated by
variogram fitting or maximum likelihood.
• Find the parameters in the linear model by generalized least squares.
•With covariance and linear model parameters fixed, find the condi-
tional distribution for g given y
ˆg is the conditional expectation of g given y
e. g. ˆg(x) = E[g(x)|y]
Prediction standard errors for g based on conditional variances.

T h e Kriging backbone
18
We will use this result in many places
Suppose
• there is no fixed effect ( mean zero)
• prediction at the observations.
yi = g(xi) + ei
or conveniently
y = g + e

Recall
19
u,uΣ v,u( Σ −1) U , Σ v,v v,u− Σ ( Σ −1
u,u) Σ u,v )[V |U] ∼ M N (
V = g U = y
Σv,v = cov(g, g) = ρK Note: Ki , j = k(xi, x j )
Σu,v = cov(g, y ) = cov(g, g + e) = cov(g, g) = ρK
Σu,u = cov(y, y ) = ρK + σ2I
Conditional mean:
ˆg = ρK(ρK + σ2I)−1y = K ( K + (σ2/ρ)I)−1y
ˆg = K ( K + λI)− 1 y = ( I + λQ)−1y
This is a smoother! ( Q = K− 1 )

Uncertainty of the Kriging estimate
20
[V |U ] ∼ M N ( Σ v,u( Σ −1) U , Σ v,v v,u
−1
u,u u,u u,v− Σ ( Σ ) Σ )
•Evaluate the pointwise prediction for a limited number of locations
•conditional simulation Generate a Monte Carlo sample from the Mul-
tivariate normal distribution.

Where are the basis functions?
21
Prediction at a location x
u,uΣv,u(Σ−1 )U
V = g(x), U = y
ˆg(x) = (ρk(x, x1), ..., ρk(x, xn))T (ρK + σ2I)−1y
ˆg(x) =
n
k(x, x )ci i c = ( K + λI)− 1 y
i = 1
ˆg(x) = sum of basis functions × coefficients

Lecture on smoothers revisited
22
•ˆg(x) is type of spline using a specific roughness penalty
•ˆg( x ) is a penalized least squares estimate with a specific Q and a
specific set of basis functions.
• Q−1 = K and bi(x) = k(x, xi )
What about fixed effects? ( the d s)
Estimate by generalized least squares and subtract off before doing
Kriging.

Likelihood for statistical parameters
y ∼ M N (Zd, (ρK + σ2 I))
Likelihood
1
√
2
24
π
n e 2
1 T 2 −1− ( y− Zd) ( ρ K + σ I ) ( y− Zd) |ρK + σ |2 −1/2

log Likelihood
1
2
T− ( y − Z d) ( ρK 2 −1
+ σ I ) ( y
1
2
2− Z d ) − log(det(ρK + σ I ) ) + stuff
With λ = σ2/ρ
−
1 T( y − Z d) ( K −1 1 n
2ρ 2 2
25
+ λ I ) ( y − Z d) − log(det(K + λ I ) ) + − log( ρ) + stuff
• Can maximize this analytically for ρ and d
dˆ=
(
ZT M Z
− 1
ZT M y
ρˆ= ( 1 / n ) ( y − Zdˆ)T M ( y − Zdˆ)
where M = ( K + λI)− 1
• K may depend on other parameters e.g. scale, shape
• Once we find these parameters – we do Kriging for the predictions!

log Likelihood for example
●
●
●
●●
●●
●
●●●
●
●●
●●●●
●
● ●
●●
● ●
● ● ●● ●●
●
●● ●
●●●●● ●● ●● ●
● ●●
●● ● ●
●●
●●●● ●● ● ●
●● ●
●●
●●●●
● ●●
●
●●
●
● ●
●
●
●●
●
●
●●●
●
●
●●
●
● ●
● ● ●
●
●
●●●●●● ●●
●
● ● ●●● ● ●●
● ●●
●●
● ●● ●●
●
● ●●●● ●
●
●●●●●
●
●● ●
0.0 0.2 0.4 0.6 0.8 1.0
−0.20.20.61.0
−15 −10 5 10
0.20.40.81.0
−5 0
log Lambda
theta
0.6
−1
−1
10
50
0
−5●
Large uncertainty in the range/scale parameter is typical.
26

Curve estimate with uncertainty
True function , Estimate, ˆg(x)
1.0
−0.20.20.61.0
●
●
●
●●●
●●
●
●
●
●●●
●●
●●
●
●
●
●●
●●
●
●
●● ●●
●●
● ● ●
●
●●
●
●
●●
●●
●
●
●
●
●
●●●
● ●● ●●●●
● ●
●●
●●●
●
●
●
●
● ●●
●
● ●●
●
●
●
● ●
●
●
● ●●
●
●●
●
● ●
●
● ●●●
●
● ●●●●● ●
●
●●
●
●● ●
●
●
●●
●
●●
●
●
●
●
●●
●●
●●
●
●
●●
●
●
●●●
●●●
● ● ●
●
●
0.0 0.2 0.4 0.6 0.8
fixed part is linear, Matern covariance
smoothness = 2, θˆ= .98, σˆ = .08
27

Conditional simulation
20 draws from conditional multivariate normal
0.0 0.2 0.4 0.6 0.8 1.0
−0.20.20.61.0
●
● ●●● ●● ●●● ● ●● ●
●● ●●●● ● ●● ● ●●●
●●●● ●● ●●●
●● ● ●
● ●● ●●●● ●
●● ● ●
●●
●● ●●
●●
● ●●●●
●●
●
●● ● ●●
● ● ● ●
● ●
● ●
●● ●
● ●
●●
● ●
●●●●●●● ● ●●● ●● ● ● ●●● ●
●● ● ●● ● ●
● ● ●
● ● ●
●● ●● ●●●●
●● ●●● ●●●
●● ●●●
●●
x
y
28

Inference for the maximum
●
●
●
●
● ●
●
●
●
●●
●● ●
●● ●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●● ●
●
● ●
●
●
●
●
●
0.1 0.2 0.4 0.5
0.50.71.1
0.3
x
y
0.9
++ ++
+ + +
+ +
+ ++● + ++ +++
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●
●
●
● ● ● ●
●
●●●
●
●
●
●
●
● ●
●
● ●
●●
●
●
●
●●●●●
● ●
●
●
●
●●
●
●
●
●
●
● ●
●
●
●
● ●
●●
●●●●
●
● ● ●
●
●
● ●
●
● ●●
●
●
●
●●
●
● ● ● ●● ●
●●
● ●
●
●
●
●●●●●●
●
●●●●●●●● ●
●
●
●● ● ●●
●● ●
●●
●
●
●
●
●
●
●●
●● ● ●●
●● ●
●●● ● ● ●
● ●●
●●●●●●● ●
● ●●●●●●
● ● ● ●●● ●
●
● ●● ●●
●
● ●● ●
0.20 0.30 0.35
0.90
0.25
max X
29
maxY
0.920.940.960.981.00

spatialProcess
30
# x and y a r e t h e s y n t h e t i c example
l i b r a r y ( f i e l d s )
obj<- s p a t i a l P r o c e s s ( x , y )

In R
31
C a l l :
spatialProcess(x = x , y = y)
Number of Observations: 150
Degree of polynomial n u l l space ( base model): 1
Total number of parameters i n base model 2
E f f . degrees of freedom 17.04
Standard Error of e st i m a te: 0.9162
Smoothing parameter 0.02295
MLE sigma 0.0798
MLE rho 0.2775
MLE lambda = MLE sigma^2 / MLE rho 0.02295
MLE t h e t a 0.432
Nonzero e n t r i e s i n covariance 22500
Covariance Model: st a t io nar y.c ov
Covariance f u n c t ion : Matern
Non-default covariance arguments and t h e i r values
Argument: Covariance has the v a l u e ( s ) :
[ 1 ] "Matern"
Argument: smoothness has the v a l u e ( s ) :
has the v a l u e ( s ) :
[ 1 ] 1
Argument: t h e t a
t h e t a
0.4320242

In R – C o l o r a d o C l i m a t e
32
library( fields)
# colorado climate data
data(COmonthlyMet)
x<- CO.loc
y<- CO.tmin.MAM.climate
elev<- CO.elev
good<- !is.na( y)
x<- x[good,]
y<- y[good]
elev<- elev[good]
NGRID <- 50
# get elevations on a grid (will use these
later)
# lonRange<- range(x[,1])
# latRange<- range(x[,2])
# COGrid<- list(x=seq(lonRange[1],
lonRange[2],length.out=NGRID ),
# y=seq(latRange[1],
latRange[2],length.out=NGRID )
# )
COGrid<- list(x = seq( -109, -101,,NGRID),
y = seq(36.5, 41.5,,NGRID))
COGridPoints<- make.surface.grid( COGrid )
data( RMelevation)
COElevGrid<- interp.surface.grid( RMelevation,
COGrid )
# Native Res: COElevGrid<- crop.image(
RMelevation, x )
# take a look at the data
quilt.plot( x, y)
US( add=TRUE)
plot( elev, y)
plot( x[,2], y)
X<- cbind( x, elev)
#
lmObj<- lm( y ~ lon+lat +elev, data=X
)
summary( lmObj)
quilt.plot( x, lmObj$residuals)
US( add=TRUE)
fit0E<- Tps( x,y, Z=elev- mean(elev))
# fit a Kriging estimator Matern
covariance smoothness =1.0
# range and nugget estimated by
maxmimum likelihood
fit1<- spatialProcess( x,y)
set.panel(2,2)
plot( fit1)
surface( fit1)
US( add=TRUE)

33
# take a look at residuals
plot( elev, fit1$residuals)
lmObj<- lm( fit1$residuals ~ elev)
abline( lmObj, col="red", lwd=3)
# with elevations
fit1E<- spatialProcess( x,y, Z= elev)
# the clean way for fields >=9.0
sur0<- predictSurface( fit1E,
grid.list=COGrid, ZGrid=COElevGrid )
# use this for fields <= 9.0
sur0<- predict( fit1E, x=COGridPoints,
Z=c(COElevGrid$z) )
sur0<- as.surface( COGrid, sur0)
#
image.plot( sur0, col=terrain.colors(256))
sur0Smooth<- predictSurface( fit1E,
grid.list=COGrid, drop.Z= TRUE)
image.plot( sur0Smooth)
# uncertainty 40 draws from posterior
distribution
set.seed(123)
# next commmand takes a minute or so
SEout<- sim.spatialProcess( fit1E,
xp=COGridPoints,
Z= c(COElevGrid$z), M= 40, drop.Z=TRUE)
set.panel( 3,3)
image.plot( sur0Smooth,
zlim=c(3.5,14.5), col=tim.colors(256))
contour( sur0Smooth, levels= 9, lwd=3,
col="grey", add=TRUE)
par( mar=c(3,3,1,1))
for( k in 1:8){
image( as.surface( COGrid,
SEout[k,]),
zlim
=c(3.5,14.5),col=tim.colors(256),
axes=FALSE)
contour( as.surface( COGrid,
SEout[k,]),
lwd=3, col="grey",level=9, add=TRUE)
title( k, adj=0, cex=2)
}
surSE<- apply( SEout, 2, sd )
image.plot( as.surface( COGrid,
surSE))
points( x, col="magenta", pch=16)
contour( COElevGrid,
level= c(2000, 3000), add= TRUE,
col="grey30", lwd=3)
US( add=TRUE, col="grey", lwd=2)
In R – C o l o r a d o C l i m a t e ( c o n t . )

xgrid<-seq( 0 , 1 , length.out=200)
f h a t < - p r e d i c t ( o b j , x g r i d )
p l o t ( obj$x, obj$y)
l i n e s ( x g r i d , f h a t , col="red")
●
●
●
●●●
●●
●
●
●●●
●
●●
●●
●
●
●●
● ●
●
●
●●
●
● ●
●
●
●
● ● ● ●
●
●● ●
● ●
●
●
●
●●●
● ● ● ●●●●
● ●
● ●
●● ●●
●
●
●● ●
●
● ●●
●
●
● ●●
●
●
● ●
● ●
●
●
●●
● ●
●
●
● ●●
●
●
●●●●● ●
●●
●
● ● ●
●
●
●
● ●●● ● ●●
●
●●
● ●
●
●
●●
● ●
●
● ●
●
● ●●●
● ●
● ● ●●
●● ●
●
0.0 0.2 0.4 0.6 0.8 1.0
−0.20.00.20.40.60.81.0
obj$x
obj$y
34

V ariogram as an E D A to ol
How to estimate the spatial from a single field?
• Plot for a spatial data set or spatial field plot
( y i −y j ) 2
2
36
against the
distance of separation. ” O n the average” this should be follow the
theoretical curve that is the variance of the data minus the covariance
function .
covariance → variogram or variogram → covariance
•If one can find find the variogram then one can transform back to the
covariance function.
•Need to be careful about how the variogram behaves close to zero
distance. The variogram estimates ρ + σ2 right at zero – not just ρ
• Great EDA tool – terrible for actually estimating parameters!

A variogram example
Sample field with true exponential variogram on a 25 × 25 grid.
0 2 4 6 8 10
0246810
3
2
1
0
−1
−2
Based on (625 choose 2 ) total pairs
≈ 200K differences: (yi − yj )2 /2
37

Houston, we have a problem.
36

Improving the variogram
Binning observations by distance ranges, finding box plots and adding
the mean for each bin.
0.00.51.01.52.02.53.0
variogram
0 2 4 6 8 10 12 14
distance
●
39
●
●
●
●
●●●●●●● ●●●
●●
●●
●
●
●●
●
●
●●
●
●

Identifying the nugget σ2
Recall the additive model:
yi = g(xi) + Ei
Correlations among the observations due to the smooth field but the
measurement error is uncorrelated. Adding measurement error to the
example (σ = .4, σ2 = .16)
0 2 4 6 8 10
0246810
−2
−1
0
1
2
3
0.00.51.01.52.0
distance
variogram
0 2 4 6 8 10 12 14
●
●
●
●
●
●
●
● ● ● ● ●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
40

CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models and Analysis - Douglas Nychka, Oct 10, 2017

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

En vedette

En vedette (20)

Similaire à CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models and Analysis - Douglas Nychka, Oct 10, 2017

Similaire à CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models and Analysis - Douglas Nychka, Oct 10, 2017 (20)

Plus de The Statistical and Applied Mathematical Sciences Institute

Plus de The Statistical and Applied Mathematical Sciences Institute (20)

Dernier

Dernier (20)

CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models and Analysis - Douglas Nychka, Oct 10, 2017