\[
%%
% Add your macros here; they'll be included in pdf and html output.
%%
\newcommand{\R}{\mathbb{R}} % reals
\newcommand{\E}{\mathbb{E}} % expectation
\renewcommand{\P}{\mathbb{P}} % probability
\DeclareMathOperator{\logit}{logit}
\DeclareMathOperator{\logistic}{logistic}
\DeclareMathOperator{\SE}{SE}
\DeclareMathOperator{\sd}{sd}
\DeclareMathOperator{\var}{var}
\DeclareMathOperator{\cov}{cov}
\DeclareMathOperator{\cor}{cor}
\DeclareMathOperator{\Normal}{Normal}
\DeclareMathOperator{\MVN}{MVN}
\DeclareMathOperator{\LogNormal}{logNormal}
\DeclareMathOperator{\Poisson}{Poisson}
\DeclareMathOperator{\Beta}{Beta}
\DeclareMathOperator{\Binom}{Binomial}
\DeclareMathOperator{\Gam}{Gamma}
\DeclareMathOperator{\Exp}{Exponential}
\DeclareMathOperator{\Cauchy}{Cauchy}
\DeclareMathOperator{\Unif}{Unif}
\DeclareMathOperator{\Dirichlet}{Dirichlet}
\DeclareMathOperator{\Wishart}{Wishart}
\DeclareMathOperator{\StudentsT}{StudentsT}
\DeclareMathOperator{\Weibull}{Weibull}
\newcommand{\given}{\;\vert\;}
\]

Spatial smoothing

Peter Ralph

8 March 2021 – Advanced Biological Statistics

Biketown, again

layout(t(1:2))
setup_map(main='starts'); points(bt_start, pch=20, cex=0.5, col='red')
setup_map(main='ends'); points(bt_end, pch=20, cex=0.5, col='blue')

plot of chunk plot_pdx

Goals:

Visualize:

  1. Starting and ending locations of trips.
  2. Net flux of bikes by area (e.g., places where more trips start than end).
  3. Distance of trips depending on start and end location.

What is mean local trip length?

bt$Duration <- as.numeric(as.difftime(bt$Duration, format="%X"), units="mins")
hist(bt$Duration, breaks=200, xlab='ride durations in minutes')

plot of chunk duration
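As a quick check of the conversion above (a made-up example, not from the slides): `as.difftime( )` parses "HH:MM:SS" strings via the `"%X"` format, and passing `units` to `as.numeric( )` makes the result's units explicit rather than relying on `difftime`'s "auto" guess:

```r
# parse "HH:MM:SS" strings and convert to minutes explicitly,
# so the units don't depend on difftime's "auto" unit choice
x <- c("00:12:30", "01:00:00")
mins <- as.numeric(as.difftime(x, format="%X"), units="mins")
mins  # 12.5 60.0
```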

Smoothing

The general idea:

Response equals smooth plus noise: \[ y_i \approx f(x_i), \] where

  • \(y_i\): response of \(i^\text{th}\) observation
  • \(x_i\): location in space of \(i^\text{th}\) observation
  • \(f( )\): a “smooth” function

To be determined: What’s “smooth”? What’s “plus noise”?

Our old friend, loess

ytsm <- loess(y ~ t, data=yt, span=0.1)
yt$smoothed <- predict(ytsm, newdata=yt)

plot of chunk loess
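The `yt` data frame comes from earlier slides; to make the example self-contained, here is a sketch with simulated data (the signal and noise level are made up):

```r
# simulated noisy signal standing in for the course's `yt` data
set.seed(123)
yt <- data.frame(t = seq(0, 10, length.out = 200))
yt$y <- sin(yt$t) + rnorm(nrow(yt), sd = 0.3)
# small span = wigglier fit; larger span = smoother fit
ytsm <- loess(y ~ t, data = yt, span = 0.1)
yt$smoothed <- predict(ytsm, newdata = yt)
```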

… applied to these data:

lo_dur <- loess(Duration ~ Start_Latitude * Start_Longitude, data=bt, span=0.3)
pred_lats <- seq(min(bt$Start_Latitude), max(bt$Start_Latitude), length.out=101)
pred_lons <- seq(min(bt$Start_Longitude), max(bt$Start_Longitude), length.out=101)
pred_locs <- expand.grid(Start_Latitude=pred_lats,
                         Start_Longitude=pred_lons)
pred_duration <- predict(lo_dur, newdata=pred_locs)

… loess!?!?!

plot of chunk plot_loess

Gaussian processes for smoothing

Response equals smooth plus noise: \[ y_i = f(x_i) + \epsilon_i, \] where

  • \(y_i\): response of \(i^\text{th}\) observation
  • \(x_i\): location in space of \(i^\text{th}\) observation
  • \(\epsilon_i \sim \Normal(0, \sigma^2_\epsilon)\): observation noise

and \(f(x)\) is spatially autocorrelated:

\[\begin{aligned} (f(x_1), \ldots, f(x_n)) &\sim \MVN(\mu, \Sigma) \\ \Sigma_{ij} = \cov[f(x_i), f(x_j)] &= \rho(\|x_i - x_j\|) . \end{aligned}\]
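To see what such an \(f\) looks like, here is a minimal sketch (not from the slides) that builds a covariance matrix and simulates from the MVN; the squared-exponential form of \(\rho\) and the values of \(\sigma\) and \(\ell\) are illustrative choices:

```r
# squared-exponential covariance: rho(d) = sigma^2 * exp(-d^2 / (2 * ell^2))
set.seed(42)
x <- seq(0, 1, length.out = 50)
sigma <- 1; ell <- 0.1
D <- as.matrix(dist(x))                       # pairwise distances |x_i - x_j|
Sigma <- sigma^2 * exp(- D^2 / (2 * ell^2))
# draw f ~ MVN(0, Sigma) via the Cholesky factor,
# with a small diagonal jitter for numerical stability
L <- chol(Sigma + diag(1e-8, length(x)))
f <- as.vector(t(L) %*% rnorm(length(x)))
```

Nearby points get highly correlated values of \(f\), which is exactly what "spatially autocorrelated" means here.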

Also known as: Kriging

> help(Krig, package="fields")
Krig                  package:fields                   R Documentation

Kriging surface estimate

Description:

     Fits a surface to irregularly spaced data. The Kriging model
     assumes that the unknown function is a realization of a Gaussian
     random spatial processes. The assumed model is additive Y = P(x) +
     Z(X) + e, where P is a low order polynomial and Z is a mean zero,
     Gaussian stochastic process with a covariance that is unknown up
     to a scale constant. The main advantages of this function are the
     flexibility in specifying the covariance as an R language function
     and also the supporting functions plot, predict, predictSE,
     surface for subsequent analysis. Krig also supports a correlation
     model where the mean and marginal variances are supplied.

Usage:

     Krig(x, Y, cov.function = "stationary.cov", lambda = NA, df
                      = NA, GCV = FALSE, Z = NULL, cost = 1, knots = NA,
                      weights = NULL, m = 2, nstep.cv = 200, scale.type =
                      "user", x.center = rep(0, ncol(x)), x.scale = rep(1,
                      ncol(x)), rho = NA, sigma2 = NA, method = "REML",
                      verbose = FALSE, mean.obj = NA, sd.obj = NA,
                      null.function = "Krig.null.function", wght.function =
                      NULL, offset = 0, na.rm = TRUE, cov.args = NULL,
                      chol.args = NULL, null.args = NULL, wght.args = NULL,
                      W = NULL, give.warnings = TRUE, ...)
krig_dur <- fields::Krig(x=cbind(bt$Start_Latitude, bt$Start_Longitude), Y=bt$Duration)
pred_krig <- matrix(predict(krig_dur, x=pred_locs), nrow=length(pred_lats))

plot of chunk plot_krig

More familiar tools

gp( )

In brms, we can use gp( ) to add a “Gaussian process” predictor:

brsm <- brm(y ~ gp(t), data=subset(yt, t > 5 & t < 8), control=list(adapt_delta=0.9))
## Compiling Stan program...
## Start sampling
pred_df <- data.frame(t=seq(4, 9, length.out=101))
pred_df <- cbind(pred_df, predict(brsm, newdata=pred_df))

plot of chunk gp_brms_plot

Note: more predictors

A bivariate predictor is specified by:

    y ~ gp(x1, x2)

Question: what’s the difference to

    y ~ gp(x1) + gp(x2)

?

However…

Gaussian processes don’t scale well. This would take forever:

brsm <- brm(y ~ gp(t), data=yt)
yt$brms_gp <- predict(brsm, newdata=yt)

Why? With \(n\) observations, the covariance matrix has \(n^2\) entries, and fitting the MVN requires inverting (factorizing) it, which takes \(O(n^3)\) time.
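To get a feel for the scaling: just *storing* the \(n \times n\) covariance matrix (as 8-byte doubles) grows quadratically, before any of the \(O(n^3)\) factorization work:

```r
# memory needed just to hold the n x n covariance matrix (8-byte doubles)
n <- c(100, 1000, 10000, 100000)
matrix_GB <- 8 * n^2 / 1e9
data.frame(n, matrix_GB)  # 1e5 observations already need 80 GB
```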

Splines

You might already know about splines:

Bézier curves (the “curves with handles” in drawing programs) are piecewise polynomial, i.e., splines.

Smoothing with splines:

Response equals smooth plus noise: \[ y_i = f(x_i) + \epsilon_i, \] where

  • \(y_i\): response of \(i^\text{th}\) observation
  • \(x_i\): location in space of \(i^\text{th}\) observation
  • \(\epsilon_i \sim \Normal(0, \sigma^2_\epsilon)\): observation noise

and \(f(x)\) is a \(k\)-th order spline with knots \(u_1, \ldots, u_m\): \[\begin{aligned} f(x) &= a_{j0} + a_{j1} x + a_{j2} x^2 + \cdots + a_{jk} x^k \\ \text{for} &\qquad u_j \le x \le u_{j+1}, \end{aligned}\] with \(f\) required to be “smooth” at the knots (continuous derivatives up to order \(k-1\)).
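Outside of brms, the same idea can be fit by ordinary least squares once a spline basis is constructed; here is a sketch using `splines::bs( )` (the `splines` package ships with R; the data and knot locations are made up):

```r
library(splines)  # ships with base R
set.seed(7)
tt <- seq(0, 10, length.out = 200)
y <- sin(tt) + rnorm(length(tt), sd = 0.3)
# cubic (k = 3) piecewise polynomials with interior knots at 1, 2, ..., 9;
# bs() builds the basis matrix and lm() estimates the coefficients
fit <- lm(y ~ bs(tt, knots = 1:9, degree = 3))
smoothed <- fitted(fit)
```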

Fitting a spline model:

Using brms, this is as easy as using s( ) as a predictor:

brm_spline <- brm(y ~ s(t), data=yt,
                  prior=c(set_prior("exponential(1000)", class='sigma')))
## Compiling Stan program...
## Start sampling
s_pred_df <- data.frame(t=seq(0, max(yt$t), length.out=501))
s_pred_df <- cbind(s_pred_df, predict(brm_spline, newdata=s_pred_df))

plot of chunk plot_fit_s

Applied to biketown

downtown <- subset(bt, Start_Latitude > 45.51 & Start_Latitude < 45.53
                       & Start_Longitude > -122.69 & Start_Longitude < -122.67
                       & (rbinom(nrow(bt), size=1, prob=0.1) == 1))
s_dur <- brm(Duration ~ t2(Start_Latitude, Start_Longitude),
             data=downtown)
## Warning: Rows containing NAs were excluded from the model.
## Compiling Stan program...
## Start sampling

plot of chunk plot_bt_s_plot

Conclusion

Built-in smoothing with ~ gp( ), ~ s( ), and ~ t2( ) in brms is very promising.

But, it’s not magical (yet).

More with splines:

Recall that a Generalized Linear Model (GLM) has:

  1. a family describing the distribution of the response, \(Y\),
  2. a linear predictor \[\begin{aligned} X \beta = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n , \end{aligned}\] and
  3. a link function connecting the two: \[\begin{aligned} h(\E[Y]) = X\beta . \end{aligned}\]

“Linear” \(\to\) “Additive”

A Generalized Additive Model (GAM) replaces the linear predictor with \[\begin{aligned} \beta_0 + f_1(X_1) + \cdots + f_n(X_n), \end{aligned}\] where each \(f_i( )\) is some smooth function, e.g., a spline.
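The standard tool for GAMs in R is the `mgcv` package (which also ships with R, and is what brms uses under the hood for `s( )` terms); a sketch on simulated data, with made-up smooth functions:

```r
# each s() term is a smooth function f_i estimated by penalized splines
library(mgcv)
set.seed(1)
d <- data.frame(x1 = runif(300), x2 = runif(300))
d$y <- sin(2 * pi * d$x1) + d$x2^2 + rnorm(300, sd = 0.2)
fit <- gam(y ~ s(x1) + s(x2), data = d)   # beta_0 + f1(x1) + f2(x2)
```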
