Spatial autocorrelation

Peter Ralph

Advanced Biological Statistics

Spatial models

A simple scenario

Suppose we have estimates of abundance of a soil microbe from a number of samples across our study area:

plot of chunk r show_study

The data

(x,y) : spatial coords; z : abundance

xy

##            x         y        z
## 1  0.5766037 0.5863978 3.845720
## 2  0.2230729 0.2747410 3.571873
## 3  0.3318966 0.1476570 4.683332
## 4  0.7107246 0.8014103 3.410312
## 5  0.8194490 0.3864098 3.726504
## 6  0.4237206 0.8204507 3.356803
## 7  0.9635445 0.6849373 4.424564
## 8  0.9781304 0.8833893 5.795779
## 9  0.8405219 0.1119208 3.962872
## 10 0.9966112 0.7788340 5.155496
## 11 0.8659590 0.6210109 3.911921
## 12 0.7014217 0.2741515 4.235119
## 13 0.3904731 0.3014697 3.657133
## 14 0.3147697 0.3490438 3.342371
## 15 0.8459473 0.3291311 3.835436
## 16 0.1392785 0.8095039 4.035925
## 17 0.5181206 0.2744821 4.694328
## 18 0.5935508 0.9390949 3.054045
## 19 0.9424617 0.9022671 5.518488
## 20 0.6280196 0.1770531 4.617342

Goals:

(descriptive) What spatial scale does abundance vary over?
(predictive) What are the likely (range of) abundances at new locations?

Exercise:

Let’s brainstorm a few examples, listing

a spatially autocorrelated variable
the spatial scale of autocorrelation

Spatial covariance

Tobler’s First Law of Geography:

Everything is related to everything else, but near things are more related than distant things.

Modeler: Great, covariance is a decreasing function of distance.

A decreasing function of distance.

A convenient choice: the covariance between two points distance \(d\) apart is \[\begin{aligned} \alpha^2 \exp\left(- \frac{1}{2}\left(\frac{d}{\rho}\right)^2 \right) . \end{aligned}\]

\(\alpha\) controls the overall variance (amount of noise)
\(\rho\) is the spatial scale that covariance decays over

In Stan

In brms: `~ gp( )`

Here’s an R function that takes a set of locations (xy), a variance scaling alpha, and a spatial scale rho:

cov_exp_quad <- function (xy, alpha, rho) {
    # return the 'quadratic exponential' covariance matrix
    # for spatial positions xy
    dxy <- as.matrix(dist(xy))
    return( alpha^2 * exp( - (1/2) * dxy^2 / rho^2 ) )
}

Challenge: simulate spatially autocorrelated random Gaussian values, and plot them, in space. Pick parameters so you can tell they are autocorrelated.

Hint: library(mvtnorm) and

    # to color points by a continuous value:
    colorRampPalette(c('blue', 'red'))(24)[cut(xy$z, breaks=24)]

Fitting a model

Goals

(descriptive) What spatial scale does abundance vary over?

\(\Rightarrow\) What is \(\rho\)?
(predictive) What are the likely (range of) abundances at new locations?

\(\Rightarrow\) Predict at new locations.

Simulate data

N <- 20
xy <- data.frame(x=runif(N), y=runif(N))
dxy <- as.matrix(dist(xy))
ut <- upper.tri(dxy, diag=TRUE)
truth <- list(rho=.6,
              delta=.1,
              alpha=2.5,
              mu=5)
truth$covmat <- (truth$delta * diag(N) 
                 + truth$alpha^2 * exp(-(dxy/truth$rho)^2))
xy$z <- as.vector(rmvnorm(1, mean=rep(truth$mu,N), sigma=truth$covmat))

Pick priors?

Default looks okay:

get_prior(z ~ gp(x, y), data=xy)

##                          prior     class coef group resp dpar nlpar bound       source
##         student_t(3, 2.6, 2.5) Intercept                                       default
##                         (flat)    lscale                                       default
##  inv_gamma(3.118793, 0.477592)    lscale gpxy                                  default
##           student_t(3, 0, 2.5)      sdgp                                       default
##           student_t(3, 0, 2.5)      sdgp gpxy                             (vectorized)
##           student_t(3, 0, 2.5)     sigma                                       default

Fit the model

sp_fit <- brm(
    z ~ gp(x, y),
    data=xy,
    file="cache/sp_fit.rds"
)

Convergence issues

6: There were 114 divergent transitions after warmup. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.

mcmc_pairs(sp_fit, pars=c("lscale_gpxy", "sigma", "lp__"), np=nuts_params(sp_fit))

plot of chunk r plot_conv

Try again

sp_fit <- brm(
    z ~ gp(x, y),
    data=xy,
    file="cache/sp_fit2.rds",
    control=list(adapt_delta=0.95)
)

sp_fit

## Warning: There were 8 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: z ~ gp(x, y) 
##    Data: xy (Number of observations: 20) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Gaussian Process Terms: 
##              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sdgp(gpxy)       2.06      0.86     1.06     4.29 1.00     1317     2095
## lscale(gpxy)     0.29      0.08     0.16     0.49 1.00      724     1364
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     3.95      1.10     1.81     6.40 1.00     1232     1546
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.38      0.14     0.17     0.70 1.00      416      487
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Does it work?

##       truth  Estimate  l-95% CI  u-95% CI
## alpha   2.5 2.0578919 1.0641066 4.2856477
## rho     0.6 0.2900958 0.1569962 0.4918624
## delta   0.1 0.3764278 0.1673808 0.7031219
## mu        5 3.9465881 1.8106834 6.4013225

plot of chunk r plot_pts_interp

Conclusions

(descriptive) What spatial scale does abundance vary over?

Values are correlated over distances of order \(\rho=2.06\) units of distance (95% CI of 4.29 to 4.29).
(predictive) What are the likely (range of) abundances at new locations?

These are

##            x          y Estimate Est.Error     Q2.5    Q97.5
## 1  0.5043937 0.02017328 4.780138 1.1844345 2.485370 7.183125
## 2  0.4269982 0.59862324 2.059266 0.4867674 1.091640 3.029740
## 3  0.7352417 0.29087388 3.954424 0.5962007 2.776944 5.170327
## 4  0.3522130 0.72205000 2.194331 0.4675924 1.259577 3.148516
## 5  0.9759262 0.28120183 4.569030 0.7800587 2.988711 6.088379
## 6  0.8581954 0.03737987 6.786224 0.5491277 5.639697 7.806351
## 7  0.6778009 0.70130030 3.513884 0.6363478 2.194811 4.719990
## 8  0.8531732 0.27295228 4.550964 0.5707566 3.417724 5.716459
## 9  0.7005615 0.44442729 3.152045 0.6110838 1.956736 4.358987
## 10 0.4044683 0.75131957 2.490939 0.4752951 1.537279 3.457285

Interpolating to a grid

grid_xy <- expand.grid(x=seq(0,1,length.out=11), y=seq(0,1,length.out=11))
grid_pred <- predict(sp_fit, newdata=grid_xy)

plot of chunk r plot_interp