Spatial smoothing

Peter Ralph

8 March 2021 – Advanced Biological Statistics

Biketown, again

layout(t(1:2))
setup_map(main='starts'); points(bt_start, pch=20, cex=0.5, col='red')
setup_map(main='ends'); points(bt_end, pch=20, cex=0.5, col='blue')

plot of chunk r plot_pdx

Goals:

Visualize:

~~Starting and ending locations of trips.~~
~~Net flux of bikes by area (e.g., places where more trips start than end).~~
Distance of trips depending on start and end location.

What is mean local trip length?

bt$Duration <- as.numeric(as.difftime(bt$Duration, format="%X"))
hist(as.numeric(bt$Duration), breaks=200, xlab='ride durations in minutes')

plot of chunk r duration

Smoothing

The general idea:

Response equals smooth plus noise: \[ y_i \approx f(x_i), \] where

\(y_i\): response of \(i^\text{th}\) observation
\(x_i\): location in space of \(i^\text{th}\) observation
\(f( )\): a “smooth” function

To be determined: What’s “smooth”? What’s “plus noise”?

Our old friend, loess

ytsm <- loess(y ~ t, data=yt, span=0.1)
yt$smoothed <- predict(ytsm, newdata=yt)

plot of chunk r loess

… applied to these data:

lo_dur <- loess(Duration ~ Start_Latitude * Start_Longitude, data=bt, span=0.3)
pred_lats <- seq(min(bt$Start_Latitude), max(bt$Start_Latitude), length.out=101)
pred_lons <- seq(min(bt$Start_Longitude), max(bt$Start_Longitude), length.out=101)
pred_locs <- expand.grid(Start_Latitude=pred_lats,
                         Start_Longitude=pred_lons)
pred_duration <- predict(lo_dur, newdata=pred_locs)

… loess!?!?!

plot of chunk r plot_loess

Gaussian processes for smoothing

Response equals smooth plus noise: \[ y_i = f(x_i) + \epsilon_i, \] where

\(y_i\): response of \(i^\text{th}\) observation
\(x_i\): location in space of \(i^\text{th}\) observation
\(\epsilon_i \sim \Normal(0, \sigma^2_\epsilon)\): observation noise

and \(f(x)\) is spatially autocorrelated:

\[\begin{aligned} f(x_i) &\sim \MVN(\mu, \Sigma) \\ \cov[f(x_i), f(x_j)] &= \rho(\|x_i - x_j\|) . \end{aligned}\]

Also known as: Kriging

> help(Krig, package="fields")
Krig                  package:fields                   R Documentation

Kriging surface estimate

Description:

     Fits a surface to irregularly spaced data. The Kriging model
     assumes that the unknown function is a realization of a Gaussian
     random spatial processes. The assumed model is additive Y = P(x) +
     Z(X) + e, where P is a low order polynomial and Z is a mean zero,
     Gaussian stochastic process with a covariance that is unknown up
     to a scale constant. The main advantages of this function are the
     flexibility in specifying the covariance as an R language function
     and also the supporting functions plot, predict, predictSE,
     surface for subsequent analysis. Krig also supports a correlation
     model where the mean and marginal variances are supplied.

Usage:

     Krig(x, Y, cov.function = "stationary.cov", lambda = NA, df
                      = NA, GCV = FALSE, Z = NULL, cost = 1, knots = NA,
                      weights = NULL, m = 2, nstep.cv = 200, scale.type =
                      "user", x.center = rep(0, ncol(x)), x.scale = rep(1,
                      ncol(x)), rho = NA, sigma2 = NA, method = "REML",
                      verbose = FALSE, mean.obj = NA, sd.obj = NA,
                      null.function = "Krig.null.function", wght.function =
                      NULL, offset = 0, na.rm = TRUE, cov.args = NULL,
                      chol.args = NULL, null.args = NULL, wght.args = NULL,
                      W = NULL, give.warnings = TRUE, ...)

krig_dur <- fields::Krig(x=cbind(bt$Start_Latitude, bt$Start_Longitude), Y=bt$Duration)
pred_krig <- matrix(predict(krig_dur, x=pred_locs), nrow=length(pred_lats))

plot of chunk r plot_krig

More familiar tools

`gp( )`

In brms, we can use gp( ) to add a “Gaussian process” predictor:

brsm <- brm(y ~ gp(t), data=subset(yt, t > 5 & t < 8), control=list(adapt_delta=0.9))

## Compiling Stan program...

## Start sampling

pred_df <- data.frame(t=seq(4, 9, length.out=101))
pred_df <- cbind(pred_df, predict(brsm, newdata=pred_df))

plot of chunk r gp_brms_plot

Note: more predictors

A bivariate predictor is specified by:

    y ~ gp(x1, x2)

Question: what’s the difference to

    y ~ gp(x1) + gp(x2)

However…

Gaussian processes don’t scale well. This would take forever:

brsm <- brm(y ~ gp(t), data=yt)
yt$brms_gp <- predict(ytsm, newdata=yt)

Why? With \(n\) observations, the covariance matrix has \(n^2\) entries. Fitting the MVN requires inverting it.

Splines

You might already know about splines:

Bézier curves are piecewise polynomial, i.e., splines: curves with handles

Smoothing with splines:

Response equals smooth plus noise: \[ y_i = f(x_i) + \epsilon_i, \] where

\(y_i\): response of \(i^\text{th}\) observation
\(x_i\): location in space of \(i^\text{th}\) observation
\(\epsilon_i \sim \Normal(0, \sigma^2_\epsilon)\): observation noise

and \(f(x)\) is a \(k\)-th order spline with knots \(u_1, \ldots, u_m\): \[\begin{aligned} f(x) &= a_{j0} + a_{j1} x + a_{j2} x^2 + \cdots + a_{jk} x^k \\ \text{for} &\qquad u_j \le x \le u_{j+1} . \end{aligned}\] and ‘smooth’ at the knots.

Fitting a spline model:

Using brms, this is as easy as using s( ) as a predictor:

brm_spline <- brm(y ~ s(t), data=yt,
                  prior=c(set_prior("exponential(1000)", class='sigma')))

## Compiling Stan program...

## Start sampling

s_pred_df <- data.frame(t=seq(0, max(yt$t), length.out=501))

plot of chunk r plot_fit_s

Applied to biketown

downtown <- subset(bt, Start_Latitude > 45.51 & Start_Latitude < 45.53
                       & Start_Longitude > -122.69 & Start_Longitude < -122.67
                       & (rbinom(nrow(bt), size=1, prob=0.1) == 1))
s_dur <- brm(Duration ~ t2(Start_Latitude, Start_Longitude),
             data=downtown)

## Warning: Rows containing NAs were excluded from the model.

## Compiling Stan program...

## Start sampling

plot of chunk r plot_bt_s_plot

Conclusion

Built-in smoothing with ~ gp( ) and ~ s( ) and ~ t2( ) in brms is very promising.

But, it’s not magical (yet).

More with splines:

Recall that a Generalized Linear Model (GLM) has 1. a family describing the distribution of the response, \(Y\) 2. a linear predictor \[\begin{aligned} X \beta = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n \end{aligned}\] 3. a link function connecting the two: \[\begin{aligned} h(\E[Y]) = X\beta . \end{aligned}\]

“Linear” \(\to\) “Additive”

A Generalized Additive Model (GAM) replaces the linear predictor with \[\begin{aligned} \beta_0 + f_1(X_1) + \cdots + f_n(X_n), \end{aligned}\] where each \(f_i( )\) is some smooth function, e.g., a spline.