\[%%
% Add your macros here; they'll be included in pdf and html output.
%%
\newcommand{\R}{\mathbb{R}} % reals
\newcommand{\E}{\mathbb{E}} % expectation
\renewcommand{\P}{\mathbb{P}} % probability
\DeclareMathOperator{\logit}{logit}
\DeclareMathOperator{\logistic}{logistic}
\DeclareMathOperator{\SE}{SE}
\DeclareMathOperator{\sd}{sd}
\DeclareMathOperator{\var}{var}
\DeclareMathOperator{\cov}{cov}
\DeclareMathOperator{\cor}{cor}
\DeclareMathOperator{\Normal}{Normal}
\DeclareMathOperator{\LogNormal}{logNormal}
\DeclareMathOperator{\Poisson}{Poisson}
\DeclareMathOperator{\Beta}{Beta}
\DeclareMathOperator{\Binom}{Binomial}
\DeclareMathOperator{\Gam}{Gamma}
\DeclareMathOperator{\Exp}{Exponential}
\DeclareMathOperator{\Cauchy}{Cauchy}
\DeclareMathOperator{\Unif}{Unif}
\DeclareMathOperator{\Dirichlet}{Dirichlet}
\DeclareMathOperator{\Wishart}{Wishart}
\DeclareMathOperator{\StudentsT}{StudentsT}
\DeclareMathOperator{\Weibull}{Weibull}
\newcommand{\given}{\;\vert\;}
\]
fig.dim <- 4
knitr::opts_chunk$set(fig.width=2*fig.dim,
                      fig.height=fig.dim,
                      fig.align='center')

The Problem

In the file mice_and_foxes2.tsv we have (fake) data consisting of the total number of mice and foxes on a certain large island across 100 years. We're fitting a Lotka-Volterra-type model to these data, in which mice reproduce but are eaten by foxes, and the fox reproduction rate depends on how many mice they eat. Skipping over a lot of details, we'd like to fit the following model of how next year's numbers of mice (\(M_{t+1}\)) and foxes (\(F_{t+1}\)) depend on the current year's numbers (\(M_t\) and \(F_t\)):
\[\begin{aligned}
r_t &= \exp(- \epsilon F_t) \\
M_{t+1} &\sim \Poisson( (\lambda + r_t) M_t ) \\
F_{t+1} &\sim \Poisson( p F_t + \gamma (1 - r_t) M_t ) .
\end{aligned}\]
In this model, the parameters are: \(\lambda\), the per-capita yearly fecundity of the mice; \(\epsilon\), the mouse-fox encounter rate (so \(r_t\) is the chance that a given mouse escapes all the foxes in year \(t\)); \(p\), the yearly survival probability of a fox; and \(\gamma\), the rate at which eaten mice are converted into new foxes.


We'll test the model on some simulated data: it will be very helpful, when debugging what's going on, to know what the true set of parameters is!

sim_params <- list(
                   lambda = 0.85, # prey fecundity per year
                   lsurv = 0.4,   # predator survival prob
                   erate = .001,     # pred-prey encounter rate
                   conv = 0.8,    # prey->pred conversion rate
                   T = 100  # number of time steps
               )
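
The run_sim() function itself isn't shown in these notes. Here is a minimal sketch of a simulator consistent with the model above, purely for reference; the column names (year, mice, foxes) and argument order match how run_sim() is used below, but the actual course code may differ in its details.

run_sim <- function(init, params) {
    # init = c(initial mice, initial foxes); params as in sim_params above
    M <- F <- rep(NA, params$T)
    M[1] <- init[1]
    F[1] <- init[2]
    for (t in 1:(params$T - 1)) {
        r <- exp(- params$erate * F[t])   # chance a mouse escapes all the foxes
        M[t+1] <- rpois(1, (params$lambda + r) * M[t])
        F[t+1] <- rpois(1, params$lsurv * F[t] + params$conv * (1 - r) * M[t])
    }
    data.frame(year=1:params$T, mice=M, foxes=F)
}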

set.seed(123)
MF <- run_sim(c(1200, 1000), sim_params)

matplot(MF$year, MF[,2:3], type='l', xlab='time', ylab='population', lty=2)
legend("topright", lty=1, col=1:2, legend=c("mice", "foxes"))

(Figure: simulated mouse and fox population sizes over time.)

And here's a Stan model: this is just the model written out above, with some Normal priors added. (The reason for, e.g., vM is that M is an integer array, as it must be to have a Poisson distribution, so when we use M in the expression for the Poisson's mean we need to convert it to a vector.)

pp_block <- "
data {
    int N;
    int M[N];
    int F[N];
}
parameters {
    real<lower=0> lambda;
    real<lower=0, upper=1> lsurv;
    real<lower=0> erate;
    real<lower=0> conv;
}
model {
    vector[N-1] vM;
    vector[N-1] vF;
    vector[N-1] er;
    vM = to_vector(M[1:(N-1)]);
    vF = to_vector(F[1:(N-1)]);
    er = exp(-erate * vF);    // prob. a mouse escapes all the foxes this year
    M[2:N] ~ poisson((lambda + er) .* vM);
    F[2:N] ~ poisson(lsurv * vF + conv * (1-er) .* vM);
    lambda ~ normal(0, 1);
    erate ~ normal(0, 1);
    conv ~ normal(0, 1);
}
"
pp_model <- stan_model(model_code=pp_block)

Now, we’ll fit the model.

pp_fit <- sampling(pp_model,
                   data=list(N=nrow(MF),
                             M=MF$mice,
                             F=MF$foxes),
                   iter=1000, chains=4,
                   control=list(max_treedepth=12))
## Warning: There were 76 divergent transitions after warmup. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: The largest R-hat is 1.57, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#r-hat
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess

That's a bunch of warnings about convergence! What happened? Looking at the summary, we see that indeed we have a low effective sample size (n_eff) and a high Rhat for all the parameters.

rstan::summary(pp_fit)$summary
##                mean     se_mean          sd         2.5%          25%          50%          75%        97.5%    n_eff     Rhat
## lambda 9.620237e-01  0.04960043  0.07066795 8.181391e-01 9.676252e-01 1.001524e+00 1.003721e+00 1.006842e+00 2.029897 8.574519
## lsurv  5.556919e-01  0.06383016  0.09258394 3.607239e-01 5.240616e-01 6.005355e-01 6.150986e-01 6.414011e-01 2.103872 4.420204
## erate  5.521544e-01  0.22637180  0.62147338 8.906427e-04 1.957627e-03 3.591913e-01 9.304892e-01 2.111065e+00 7.537033 1.216657
## conv   5.355292e-01  0.11168853  0.16054895 4.068456e-01 4.372544e-01 4.516842e-01 5.632790e-01 8.683928e-01 2.066321 5.555813
## lp__   2.336783e+06 52.58528226 75.10352884 2.336734e+06 2.336736e+06 2.336738e+06 2.336817e+06 2.336913e+06 2.039822 9.098177

This suggests the chains aren't mixing, and, indeed, they aren't. Below, we see that three of the chains are merrily wandering around one chunk of parameter space… but chain 3 is somewhere else entirely. Chains 1, 2, and 4 are wandering around the top of a local posterior hill (you can tell it's a local maximum because they tend to stay there), but chain 3 is in a much nicer part of parameter space, where the log-posterior is about 150 units higher. You can almost hear chain 3 calling down to the other chains: "Hey, come up here! The view is pretty nice!"
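
One quick way to check this numerically (a sketch; this isn't part of the original output) is to average lp__ within each chain, keeping the chains separate:

# per-chain draws of the log-posterior; permuted=FALSE keeps chains separate
lp_by_chain <- rstan::extract(pp_fit, pars="lp__", permuted=FALSE)
apply(lp_by_chain, 2, mean)   # one average per chain; chain 3 should stand out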

stan_trace(pp_fit, pars=c("lambda", "lsurv", "erate", "conv", "lp__"))

(Figure: trace plots of lambda, lsurv, erate, conv, and lp__ for the four chains.)

Furthermore, we know the right answer, and comparing to the parameters we simulated under (lambda = 0.85, lsurv = 0.4, erate = 0.001, conv = 0.8), we see that chain 3 definitely has the right idea. How'd it get there? By including warmup in the trace plots, we see that it just got lucky: at around iteration 250, it happened upon the good part of parameter space.

stan_trace(pp_fit, pars=c("lambda", "lsurv", "erate", "conv", "lp__"), inc_warmup=TRUE)

(Figure: the same trace plots, now including warmup.)

But, how to fix it? Often, you want to look for a parameter that isn't moving. Here, that's erate, for chain 3. The true value of erate is very small (0.001), so it's unsurprising that chain 3 has gone right down next to erate = 0 and stayed there. But why haven't the other chains gotten there? It helps to know a bit about how Stan explores parameter space. Recall that erate is declared to be nonnegative:

    real<lower=0> erate;

It's awkward to keep a random walk from wandering over a hard boundary, so under the hood Stan does its exploration in an unconstrained space, which it gets to by transforming the parameters. For a parameter constrained to be nonnegative, it just takes a logarithm: so, it's really trying out different values of log(erate). By default, each chain starts at a location chosen uniformly between -2 and 2 in the unconstrained space, i.e., at values of erate between exp(-2) ≈ 0.135 and exp(2) ≈ 7.39. These are pretty far from the true value of 0.001.
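
One possible workaround, sketched here but not what we'll do below, is to hand sampling() explicit starting values on the original (constrained) scale via its init argument. (Of course, in real life we wouldn't know the true values; these particular numbers are for illustration only.)

# not run: start each chain near plausible parameter values
init_fun <- function() list(lambda=1, lsurv=0.5, erate=0.001, conv=0.5)
pp_fit_init <- sampling(pp_model,
                        data=list(N=nrow(MF), M=MF$mice, F=MF$foxes),
                        iter=1000, chains=4,
                        init=init_fun)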

So, we might be able to fix this by initializing the chains to start closer to zero, as sketched above. But it's a general rule of thumb in numerical work that Everything is Nicer when All the Numbers are of Order One. So, instead we'll rescale erate by a factor of 1,000: in other words, decide that its natural units are "per 1,000 foxes" instead of "per fox". This requires only a small change to the Stan code:

pp_block2 <- "
data {
    int N;
    int M[N];
    int F[N];
}
parameters {
    real<lower=0> lambda;
    real<lower=0, upper=1> lsurv;
    real<lower=0> erate;
    real<lower=0> conv;
}
model {
    vector[N-1] vM;
    vector[N-1] vF;
    vector[N-1] er;
    vM = to_vector(M[1:(N-1)]);
    vF = to_vector(F[1:(N-1)]);
    er = exp(-erate * vF / 1000);  // RESCALED HERE: erate is now per 1,000 foxes
    M[2:N] ~ poisson((lambda + er) .* vM);
    F[2:N] ~ poisson(lsurv * vF + conv * (1-er) .* vM);
    lambda ~ normal(0, 1);
    erate ~ normal(0, 1);
    conv ~ normal(0, 1);
}
"
pp_model2 <- stan_model(model_code=pp_block2)

Now, the fit goes much more smoothly:

pp_fit2 <- sampling(pp_model2,
                   data=list(N=nrow(MF),
                             M=MF$mice,
                             F=MF$foxes),
                   iter=1000, chains=4,
                   control=list(max_treedepth=12))
## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess

And we're getting good estimates now. In particular, notice that erate is estimated to be right around 1.0 (95% credible interval from 0.85 to 1.08), which is correct in the new units of "per 1,000 foxes".

rstan::summary(pp_fit2)$summary
##                mean      se_mean         sd         2.5%          25%          50%          75%        97.5%    n_eff     Rhat
## lambda 8.396603e-01 0.0008436349 0.01785099 8.022799e-01 8.284260e-01 8.400517e-01 8.524688e-01 8.721426e-01 447.7292 1.010887
## lsurv  4.007475e-01 0.0012636623 0.02822997 3.433818e-01 3.832642e-01 4.007466e-01 4.200594e-01 4.544374e-01 499.0668 1.011373
## erate  9.623039e-01 0.0027326715 0.05818134 8.482176e-01 9.230616e-01 9.601097e-01 1.001730e+00 1.077766e+00 453.3071 1.011022
## conv   8.082074e-01 0.0022687703 0.04605713 7.245417e-01 7.761239e-01 8.065259e-01 8.370600e-01 9.065631e-01 412.1095 1.014907
## lp__   2.336918e+06 0.0627101381 1.45230927 2.336914e+06 2.336918e+06 2.336919e+06 2.336919e+06 2.336920e+06 536.3431 1.006724
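
As a quick check (a sketch, not part of the original output), we can put the posterior means next to the values we simulated under, remembering that erate's true value of 0.001 per fox is 1.0 in the rescaled units of per 1,000 foxes:

# posterior means next to the simulation truth
post_means <- rstan::summary(pp_fit2)$summary[c("lambda", "lsurv", "erate", "conv"), "mean"]
truth <- c(lambda=0.85, lsurv=0.4, erate=1.0, conv=0.8)
rbind(estimate=post_means, truth=truth)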

If we were to dig into this more, it'd be nice to know why there are multiple likelihood maxima: what was it about that other bit of parameter space that looked good to chains 1, 2, and 4? Unfortunately, this sort of multimodality is a fairly general feature of dynamical systems like this one, especially if changing some parameters can make the predicted dynamics zip off to infinity. Also note that in practice we won't know the true values of the parameters, but we can often get an idea of the right order of magnitude, which is all we needed here.
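
If you do want to dig in, one place to start (not shown here) is the pairs() plot of the first fit, which Stan's warnings already pointed us towards; the two modes should show up as separate clusters of draws:

# pairwise scatterplots of the draws from the first, badly mixing fit
pairs(pp_fit, pars=c("lambda", "lsurv", "erate", "conv", "lp__"))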