Peter Ralph
Advanced Biological Statistics
Suppose we have estimates of abundance of a soil microbe from a number of samples across our study area:
(x,y)
: spatial coords; z
: abundance
## x y z
## 1 0.5766037 0.5863978 3.845720
## 2 0.2230729 0.2747410 3.571873
## 3 0.3318966 0.1476570 4.683332
## 4 0.7107246 0.8014103 3.410312
## 5 0.8194490 0.3864098 3.726504
## 6 0.4237206 0.8204507 3.356803
## 7 0.9635445 0.6849373 4.424564
## 8 0.9781304 0.8833893 5.795779
## 9 0.8405219 0.1119208 3.962872
## 10 0.9966112 0.7788340 5.155496
## 11 0.8659590 0.6210109 3.911921
## 12 0.7014217 0.2741515 4.235119
## 13 0.3904731 0.3014697 3.657133
## 14 0.3147697 0.3490438 3.342371
## 15 0.8459473 0.3291311 3.835436
## 16 0.1392785 0.8095039 4.035925
## 17 0.5181206 0.2744821 4.694328
## 18 0.5935508 0.9390949 3.054045
## 19 0.9424617 0.9022671 5.518488
## 20 0.6280196 0.1770531 4.617342
Goals:
(descriptive) What spatial scale does abundance vary over?
(predictive) What are the likely (range of) abundances at new locations?
Let’s brainstorm a few examples, listing
Everything is related to everything else, but near things are more related than distant things.
Modeler: Great, covariance is a decreasing function of distance.
A convenient choice: the covariance between two points distance \(d\) apart is \[\begin{aligned} \alpha^2 \exp\left(- \frac{1}{2}\left(\frac{d}{\rho}\right)^2 \right) . \end{aligned}\]
\(\alpha\) controls the overall variance (amount of noise)
\(\rho\) is the spatial scale that covariance decays over
~ gp( )
Here’s an R
function that takes a set of locations (xy
), a variance scaling alpha
, and a spatial scale rho
:
cov_exp_quad <- function (xy, alpha, rho) {
# return the 'quadratic exponential' covariance matrix
# for spatial positions xy
dxy <- as.matrix(dist(xy))
return( alpha^2 * exp( - (1/2) * dxy^2 / rho^2 ) )
}
Challenge: simulate spatially autocorrelated random Gaussian values, and plot them, in space. Pick parameters so you can tell they are autocorrelated.
(descriptive) What spatial scale does abundance vary over?
\(\Rightarrow\) What is \(\rho\)?
(predictive) What are the likely (range of) abundances at new locations?
\(\Rightarrow\) Predict at new locations.
N <- 20
xy <- data.frame(x=runif(N), y=runif(N))
dxy <- as.matrix(dist(xy))
ut <- upper.tri(dxy, diag=TRUE)
truth <- list(rho=.6,
delta=.1,
alpha=2.5,
mu=5)
truth$covmat <- (truth$delta * diag(N)
+ truth$alpha^2 * exp(-(dxy/truth$rho)^2))
xy$z <- as.vector(rmvnorm(1, mean=rep(truth$mu,N), sigma=truth$covmat))
Default looks okay:
## prior class coef group resp dpar nlpar bound source
## student_t(3, 2.6, 2.5) Intercept default
## (flat) lscale default
## inv_gamma(3.118793, 0.477592) lscale gpxy default
## student_t(3, 0, 2.5) sdgp default
## student_t(3, 0, 2.5) sdgp gpxy (vectorized)
## student_t(3, 0, 2.5) sigma default
6: There were 114 divergent transitions after warmup. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.
## Warning: There were 8 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: z ~ gp(x, y)
## Data: xy (Number of observations: 20)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Gaussian Process Terms:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sdgp(gpxy) 2.06 0.86 1.06 4.29 1.00 1317 2095
## lscale(gpxy) 0.29 0.08 0.16 0.49 1.00 724 1364
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 3.95 1.10 1.81 6.40 1.00 1232 1546
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.38 0.14 0.17 0.70 1.00 416 487
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
## truth Estimate l-95% CI u-95% CI
## alpha 2.5 2.0578919 1.0641066 4.2856477
## rho 0.6 0.2900958 0.1569962 0.4918624
## delta 0.1 0.3764278 0.1673808 0.7031219
## mu 5 3.9465881 1.8106834 6.4013225
(descriptive) What spatial scale does abundance vary over?
Values are correlated over distances of order \(\rho=2.06\) units of distance (95% CI of 4.29 to 4.29).
(predictive) What are the likely (range of) abundances at new locations?
These are
## x y Estimate Est.Error Q2.5 Q97.5
## 1 0.5043937 0.02017328 4.780138 1.1844345 2.485370 7.183125
## 2 0.4269982 0.59862324 2.059266 0.4867674 1.091640 3.029740
## 3 0.7352417 0.29087388 3.954424 0.5962007 2.776944 5.170327
## 4 0.3522130 0.72205000 2.194331 0.4675924 1.259577 3.148516
## 5 0.9759262 0.28120183 4.569030 0.7800587 2.988711 6.088379
## 6 0.8581954 0.03737987 6.786224 0.5491277 5.639697 7.806351
## 7 0.6778009 0.70130030 3.513884 0.6363478 2.194811 4.719990
## 8 0.8531732 0.27295228 4.550964 0.5707566 3.417724 5.716459
## 9 0.7005615 0.44442729 3.152045 0.6110838 1.956736 4.358987
## 10 0.4044683 0.75131957 2.490939 0.4752951 1.537279 3.457285