\[ %% % Add your macros here; they'll be included in pdf and html output. %% \newcommand{\R}{\mathbb{R}} % reals \newcommand{\E}{\mathbb{E}} % expectation \renewcommand{\P}{\mathbb{P}} % probability \DeclareMathOperator{\logit}{logit} \DeclareMathOperator{\logistic}{logistic} \DeclareMathOperator{\SE}{SE} \DeclareMathOperator{\sd}{sd} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\cor}{cor} \DeclareMathOperator{\Normal}{Normal} \DeclareMathOperator{\LogNormal}{logNormal} \DeclareMathOperator{\Poisson}{Poisson} \DeclareMathOperator{\Beta}{Beta} \DeclareMathOperator{\Binom}{Binomial} \DeclareMathOperator{\Gam}{Gamma} \DeclareMathOperator{\Exp}{Exponential} \DeclareMathOperator{\Cauchy}{Cauchy} \DeclareMathOperator{\Unif}{Unif} \DeclareMathOperator{\Dirichlet}{Dirichlet} \DeclareMathOperator{\Wishart}{Wishart} \DeclareMathOperator{\StudentsT}{StudentsT} \DeclareMathOperator{\Weibull}{Weibull} \newcommand{\given}{\;\vert\;} \]

Working with spatial data

Peter Ralph

10 March 2019 – Advanced Biological Statistics

An example

Biketown

git clone git@github.com:UO-Biostats/datasets.git
Albert in a bike (catmapper:Wikimedia)
Albert in a bike (catmapper:Wikimedia)

catmapper: Wikimedia

biketown bikes (Steve Morgan: Wikimedia)
biketown bikes (Steve Morgan: Wikimedia)

Steve Morgan: Wikimedia

how biketown works: screenshot
how biketown works: screenshot

August, 2018

## [1] 38587

plot of chunk biketown_it

Goals:

Visualize:

  1. Starting and ending locations of trips.
  2. Net flux of bikes by area (e.g., places where more trips start than end).
  3. Distance of trips depending on start and end location.

Making nice maps

See this tutorial by Manual Gimond.

How I got the mapping data

## OGR data source with driver: ESRI Shapefile 
## Source: "/home/peter/teaching/uo-biostats/datasets/biketown/River_Overlay", layer: "River_Overlay_Boundary"
## with 7 features
## It has 6 fields
## OGR data source with driver: ESRI Shapefile 
## Source: "/home/peter/teaching/uo-biostats/datasets/biketown/bicycle_network", layer: "Bicycle_Network"
## with 39798 features
## It has 10 fields
## OGR data source with driver: ESRI Shapefile 
## Source: "/home/peter/teaching/uo-biostats/datasets/biketown/pdx_streets", layer: "Streets"
## with 108454 features
## It has 39 fields

How I got the mapping data, continued

How you can get the mapping data

plot of chunk plot_river

plot of chunk plot_pdx

Spatial density estimation

Kernel density estimation

Suppose we have a bunch of points

\[ (x_1, y_1), \ldots, (x_n, y_n) \]

with density \(f(x,y)\), i.e.,

\[ f(x,y) \approx \frac{\#(\text{of points within $r$ of $(x,y)$})}{\pi r^2} \]

Now using

\[ \rho_\sigma(x,y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2+y^2}{2\sigma^2}} \]

we can estimate \(f(x,y)\) by

\[ \hat f(x,y) = \sum_{i=1}^n \rho_\sigma(x_i-x, y_i-y) . \]

The two choices of kernel density estimation

  1. The kernel, \(\rho()\).

    (we used a Gaussian)

  2. The bandwidth, \(\sigma\).

Your turn

  1. Simulate 200 spatial locations with a density having two “bumps”. Plot these points. (rnorm(n, mean=-3), rnorm(n, mean=+3))

  2. Make a \(20 \times 20\) grid of “reference” locations. (expand.grid( ))

  3. Compute the kernel density estimate for each location, with \(\sigma\) chosen appropriately, and plot it.

Bike trip density estimation

bkde2D

## List of 3
##  $ x1  : num [1:301] -123 -123 -123 -123 -123 ...
##  $ x2  : num [1:301] 45.5 45.5 45.5 45.5 45.5 ...
##  $ fhat: num [1:301, 1:301] 0 0 0 0 0 ...

Plotting surfaces, method 1

image( ) and contour( )

plot of chunk plot_surface1x

Plotting surfaces, method 2

plot of chunk plot_surface2x

Choosing a bandwidth

The bandwidth is important

plot of chunk kernsmooth

How to pick a bandwidth?

There are various automatic methods.

But… crossvalidation!

For each bandwidth:

  1. Fit using most of the data,
  2. and predict the remaining data.
  3. Do this a bunch and return an estimate of goodness-of-fit.

… but wait, what’s this mean, here?

Revised:

For each bandwidth:

  1. Fit the smooth using most of the data,
  2. and predict the density at the locations of the remaining data,
  3. as well as the density at uniformly sampled “background” points.
  4. Return the mean density at “true” points minus the mean density at “background” points.

On example data

plot of chunk do_it

Prediction, by linear interpolation

The results

plot of chunk plot_xvalA

Prediction, on bicycle data

plot of chunk one_xval

plot of chunk do_xval

Bicycle migration

plot of chunk plot_surface2y

Start minus end

plot of chunk plot_diff

Start minus end

plot of chunk plot_diff2

portland map with terrain
portland map with terrain

Kernel smoothing

What is mean local trip length?

plot of chunk duration

Our old friend, loess

… loess!?!?!

plot of chunk plot_loess

  1. Why?

  2. Suggest another method.

Another method: Kriging

plot of chunk kriging