Peter Ralph
5 March 2020 – Advanced Biological Statistics
Suppose we have estimates of abundance of a soil microbe from a number of samples across our study area:
(x,y)
: spatial coords; z
: abundance
## x y z
## 1 0.5766037 0.5863978 3.845720
## 2 0.2230729 0.2747410 3.571873
## 3 0.3318966 0.1476570 4.683332
## 4 0.7107246 0.8014103 3.410312
## 5 0.8194490 0.3864098 3.726504
## 6 0.4237206 0.8204507 3.356803
## 7 0.9635445 0.6849373 4.424564
## 8 0.9781304 0.8833893 5.795779
## 9 0.8405219 0.1119208 3.962872
## 10 0.9966112 0.7788340 5.155496
## 11 0.8659590 0.6210109 3.911921
## 12 0.7014217 0.2741515 4.235119
## 13 0.3904731 0.3014697 3.657133
## 14 0.3147697 0.3490438 3.342371
## 15 0.8459473 0.3291311 3.835436
## 16 0.1392785 0.8095039 4.035925
## 17 0.5181206 0.2744821 4.694328
## 18 0.5935508 0.9390949 3.054045
## 19 0.9424617 0.9022671 5.518488
## 20 0.6280196 0.1770531 4.617342
Goals:
(descriptive) What spatial scale does abundance vary over?
(predictive) What are the likely (range of) abundances at new locations?
Everything is related to everything else, but near things are more related than distant things.
Modeler: Great, covariance is a decreasing function of distance.
A convenient choice: the covariance between two points distance \(d\) apart is \[\begin{aligned} \alpha^2 \exp\left(- \frac{1}{2}\left(\frac{d}{\rho}\right)^2 \right) . \end{aligned}\]
\(\alpha\) controls the overall variance (amount of noise)
\(\rho\) is the spatial scale that covariance decays over
Here’s an R
function that takes a set of locations (xy
), a variance scaling alpha
, and a spatial scale rho
:
cov_exp_quad <- function (xy, alpha, rho) {
# return the 'quadratic exponential' covariance matrix
# for spatial positions xy
dxy <- as.matrix(dist(xy))
return( alpha^2 * exp( - (1/2) * dxy^2 / rho^2 ) )
}
Challenge: simulate spatially autocorrelated random Gaussian values, and plot them, in space. Pick parameters so you can tell they are autocorrelated.
to color points by a continuous value:
colorRampPalette(c('blue', 'red'))(24)[cut(xy$z, breaks=24)]
(descriptive) What spatial scale does abundance vary over?
\(\Rightarrow\) What is \(\rho\)?
(predictive) What are the likely (range of) abundances at new locations?
\(\Rightarrow\) Add unobserved abundances as parameters.
sp_block <- "
data {
int N; // number of obs
vector[2] xy[N]; // spatial pos
vector[N] z;
}
parameters {
real<lower=0> alpha;
real<lower=0> rho;
}
model {
matrix[N, N] K;
K = cov_exp_quad(xy, alpha, rho);
z ~ multi_normal(rep_vector(0.0, N), K);
alpha ~ normal(0, 5);
rho ~ normal(0, 5);
}
"
# check this compiles
sp_model <- stan_model(model_code=sp_block)
Challenge: we would like to estimate the abundance at the k
locations new_xy
. Add this feature to the Stan block.
data {
int N; // number of obs
vector[2] xy[N]; // spatial pos
vector[N] z;
}
parameters {
real<lower=0> alpha;
real<lower=0> rho;
}
model {
matrix[N, N] K;
K = cov_exp_quad(xy, alpha, rho);
z ~ multi_normal(rep_vector(0.0, N), K);
alpha ~ normal(0, 5);
rho ~ normal(0, 5);
}
sp_block <- "
data {
int N; // number of obs
vector[2] old_xy[N]; // spatial pos
vector[N] old_z;
int n;
vector[2] new_xy[n]; // new locs
}
transformed data {
vector[2] xy[N+n];
xy[1:N] = old_xy;
xy[(N+1):(N+n)] = new_xy;
print(dims(old_z));
}
parameters {
real<lower=0> alpha;
real<lower=0> rho;
vector[n] new_z;
real<lower=0> delta;
real mu;
}
model {
matrix[N+n, N+n] K;
vector[N+n] z;
K = cov_exp_quad(xy, alpha, rho);
for (k in 1:(N+n)) {
K[k,k] += delta;
}
z[1:N] = old_z;
z[(N+1):(N+n)] = new_z;
z ~ multi_normal(rep_vector(mu, N+n), K);
alpha ~ normal(0, 5);
rho ~ normal(0, 5);
delta ~ exponential(50);
mu ~ normal(0, 5);
}
"
new_sp <- stan_model(model_code=sp_block)
library(mvtnorm)
N <- 20
xy <- data.frame(x=runif(N), y=runif(N))
dxy <- as.matrix(dist(xy))
ut <- upper.tri(dxy, diag=TRUE)
truth <- list(rho=.6,
delta=.1,
alpha=2.5,
mu=5)
truth$covmat <- (truth$delta * diag(N)
+ truth$alpha^2 * exp(-(dxy/truth$rho)^2))
xy$z <- as.vector(rmvnorm(1, mean=rep(truth$mu,N), sigma=truth$covmat))
new_xy <- cbind(x=runif(5), y=runif(5))
sp_data <- list(N=nrow(xy),
old_xy=cbind(xy$x, xy$y),
old_z=xy$z,
n=nrow(new_xy),
new_xy=as.matrix(new_xy))
(sp_time <- system.time(
sp_fit <- sampling(new_sp,
data=sp_data,
iter=1000,
chains=2,
control=list(adapt_delta=0.99,
max_treedepth=12))))
## [20,1]
## user system elapsed
## 3.029 0.463 2.027
cbind(truth=truth[c("alpha", "rho", "delta", "mu")],
rstan::summary(sp_fit, pars=c("alpha", "rho", "delta", "mu"))$summary)
## truth mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## alpha 2.5 3.001466 0.0546597 1.119758 1.619972 2.208622 2.711442 3.497171 6.008417 419.6756 1.001433
## rho 0.6 0.4176488 0.003090324 0.07535053 0.2868524 0.3675073 0.4086849 0.4644004 0.584858 594.5174 1.002526
## delta 0.1 0.08281431 0.000981758 0.02605716 0.04365525 0.06383473 0.07928945 0.09693593 0.1478157 704.4417 1.000333
## mu 5 5.292854 0.07023122 1.675622 2.044147 4.297472 5.235879 6.231071 8.671262 569.2354 0.9995643
(descriptive) What spatial scale does abundance vary over?
Values are correlated over distances of order \(\rho=0.4176488\) units of distance.
(predictive) What are the likely (range of) abundances at new locations?
These are
## x y mean sd
## new_z[1] 0.5092173 0.81697839 4.722624 0.5928222
## new_z[2] 0.4682500 0.69922907 4.671481 0.5459556
## new_z[3] 0.1264559 0.18936180 6.467328 0.5258127
## new_z[4] 0.4797258 0.25212101 3.272170 0.3233269
## new_z[5] 0.7559952 0.07650894 4.219259 0.3351367
grid_xy <- expand.grid(x=seq(0,1,length.out=11), y=seq(0,1,length.out=11))
grid_data <- list(N=nrow(xy),
old_xy=cbind(xy$x, xy$y),
old_z=xy$z,
n=nrow(grid_xy),
new_xy=as.matrix(grid_xy))
(grid_time <- system.time(
grid_fit <- sampling(new_sp, data=grid_data,
iter=1000, control=list(adapt_delta=0.95))))
## [20,1]
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## user system elapsed
## 1496.377 1.827 505.160