Adding levels of randomness

Peter Ralph

17 November 2020 – Advanced Biological Statistics

Hierarchical coins

Motivating problem: more coins

Suppose now we have data from \(n\) different coins from the same source. We don’t assume they have the same \(\theta\), but don’t know what its distribution is, so try to learn it.

\[\begin{aligned} Z_i &\sim \Binom(N_i, \theta_i) \\ \theta_i &\sim \Beta(\alpha, \beta) \\ \alpha &\sim \Unif(0, 100) \\ \beta &\sim \Unif(0, 100) \end{aligned}\]

Goal: find the posterior distribution of \(\alpha\), \(\beta\).

Problem: we don’t have a nice mathematical expression for this posterior distribution.

MC Stan

“Quick and easy” MCMC: Stan

The skeletal Stan program

data {
    // stuff you input
}
transformed data {
    // stuff that's calculated from the data (just once, at the start)
}
parameters {
    // stuff you want to learn the posterior distribution of
}
transformed parameters {
    // stuff that's calculated from the parameters (at every step)
}
model {
    // the action!
}
generated quantities {
    // stuff you want computed also along the way
}

How to do everything: see the user’s manual.

Beta-Binomial with Stan

From before:

We’ve flipped a coin 10 times and got 6 Heads. We think the coin is close to fair, so put a \(\Beta(20,20)\) prior on it’s probability of heads, and want the posterior distribution.

\[\begin{aligned} Z &\sim \Binom(10, \theta) \\ \theta &\sim \Beta(20, 20) \end{aligned}\] Sample from \[\theta \given Z = 6\]

\[\begin{aligned} Z &\sim \Binom(10, \theta) \\ \theta &\sim \Beta(20, 20) \end{aligned}\]

data {
    // stuff you input
}
parameters {
    // stuff you want to learn 
    // the posterior distribution of
}
model {
    // the action!
}

\[\begin{aligned} Z &\sim \Binom(10, \theta) \\ \theta &\sim \Beta(20, 20) \end{aligned}\]

data {
    int N;   // number of flips
    int Z;   // number of heads
}
parameters {
    // stuff you want to learn 
    // the posterior distribution of
}
model {
    // the action!
}

\[\begin{aligned} Z &\sim \Binom(10, \theta) \\ \theta &\sim \Beta(20, 20) \end{aligned}\]

data {
    int N;   // number of flips
    int Z;   // number of heads
}
parameters {
    // probability of heads
    real<lower=0,upper=1> theta;  
}
model {
    // the action!
}

\[\begin{aligned} Z &\sim \Binom(10, \theta) \\ \theta &\sim \Beta(20, 20) \end{aligned}\]

data {
    int N;   // number of flips
    int Z;   // number of heads
}
parameters {
    // probability of heads
    real<lower=0,upper=1> theta;
}
model {
    Z ~ binomial(N, theta);
    theta ~ beta(20, 20);
}

Compiling Stan model, in R

library(rstan)
stan_block <- "
data {
    int N;   // number of flips
    int Z;   // number of heads
}
parameters {
    // probability of heads
    real<lower=0,upper=1> theta;
}
model {
    Z ~ binomial(N, theta);
    theta ~ beta(20, 20);
}
"
bb_model <- stan_model(
               model_code=stan_block)

Optimization: maximum likelihood

With a uniform prior, the “maximum posterior” parameter values are also the maximum likelihood values.

> help(optimizing)

optimizing                package:rstan                R Documentation

Obtain a point estimate by maximizing the joint posterior

Description:

     Obtain a point estimate by maximizing the joint posterior from the
     model defined by class 'stanmodel'.

Usage:

     ## S4 method for signature 'stanmodel'
     optimizing(object, data = list(),

Maximum posterior

library(rstan)
(fit <- optimizing(bb_model,  # stan model from above
                   data=list(N=10, Z=6)))

## $par
##     theta 
## 0.5208333 
## 
## $value
## [1] -33.22939
## 
## $return_code
## [1] 0
## 
## $theta_tilde
##          theta
## [1,] 0.5208333

The answer!

The maximum a posteriori probability estimate (MAP) is 0.5208333.

post_fun <- function (p) {
    n <- 10; z <- 6; a <- b <- 20
    lik <- dbinom(z, size=n, prob=p)
    prior <- dbeta(p, a, b)
    return( prior * lik )
}
curve(post_fun, 0, 1, xlab=expression(theta), ylab='posterior prob')
points(fit$par, post_fun(fit$par), col='red', pch=20)

Note: Since the prior was uniform, then this is the maximum likelihood estimate (MLE) also.

plot of chunk r stan_check

Sampling from the posterior distribution

library(rstan)
fit <- sampling(bb_model,  # compiled stan model from above
                data=list(N=10, Z=6),
                chains=3, iter=10000)

lp__ is the log posterior density. Note n_eff and Rhat.

print(fit)

## Inference for Stan model: 5f3115ac9593cded393e67d2d0e3ce84.
## 3 chains, each with iter=10000; warmup=5000; thin=1; 
## post-warmup draws per chain=5000, total post-warmup draws=15000.
## 
##         mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
## theta   0.52    0.00 0.07   0.38   0.47   0.52   0.57   0.66  6136    1
## lp__  -35.12    0.01 0.71 -37.17 -35.28 -34.85 -34.67 -34.62  5608    1
## 
## Samples were drawn using NUTS(diag_e) at Mon Nov 16 09:44:31 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

Fuzzy caterpillars are good.

stan_trace(fit)

plot of chunk r trace_rstan

Stan uses ggplot2.

stan_hist(fit, bins=20) + xlim(0,1)

## Warning: Removed 2 rows containing missing values (geom_bar).

plot of chunk r plot_rstan

The samples:

sampled_theta <- extract(fit, permuted=FALSE, inc_warmup=TRUE)

plot of chunk r the_samples

What’s the posterior probability that \(\theta < 0.5\)?

samples <- extract(fit)
mean(samples$theta < 0.5)

## [1] 0.384

# compare to analytic solution
pbeta(0.5, shape1=20+6, shape2=20+4)

## [1] 0.3877248

Hierarchical Coins

\[\begin{aligned} Z_i &\sim \Binom(N_i, \theta_i) \\ \theta_i &\sim \Beta(\alpha, \beta) \\ \alpha &\sim \Unif(0, 100) \\ \beta &\sim \Unif(0, 100) \end{aligned}\]

data {
    // the data (input)
}
parameters {
    // the parameters (output)
}
model {
    // how they are related
}

\[\begin{aligned} Z_i &\sim \Binom(N_i, \theta_i) \\ \theta_i &\sim \Beta(\alpha, \beta) \\ \alpha &\sim \Unif(0, 100) \\ \beta &\sim \Unif(0, 100) \end{aligned}\]

data {
    int n;      // number of coins
    int N[n];   // number of flips
    int Z[n];   // number of heads
}
parameters {
    // the parameters (output)
}
model {
    // how they are related
}

\[\begin{aligned} Z_i &\sim \Binom(N_i, \theta_i) \\ \theta_i &\sim \Beta(\alpha, \beta) \\ \alpha &\sim \Unif(0, 100) \\ \beta &\sim \Unif(0, 100) \end{aligned}\]

data {
    int n;      // number of coins
    int N[n];   // number of flips
    int Z[n];   // number of heads
}
parameters {
    // probability of heads
    real<lower=0,upper=1> theta[n];
    real<lower=0,upper=100> alpha;
    real<lower=0,upper=100> beta;
}
model {
    // how they are related
}

\[\begin{aligned} Z_i &\sim \Binom(N_i, \theta_i) \\ \theta_i &\sim \Beta(\alpha, \beta) \\ \alpha &\sim \Unif(0, 100) \\ \beta &\sim \Unif(0, 100) \end{aligned}\]

data {
    int n;      // number of coins
    int N[n];   // number of flips
    int Z[n];   // number of heads
}
parameters {
    // probability of heads
    real<lower=0,upper=1> theta[n];
    real<lower=0, upper=100> alpha;
    real<lower=0, upper=100> beta;
}
model {
    Z ~ binomial(N, theta);
    theta ~ beta(alpha, beta);
    // uniform priors "go without saying"
    // alpha ~ uniform(0, 100);
    // beta ~ uniform(0, 100);
}

Exercise

Data:

set.seed(23)
ncoins <- 100
true_theta <- rbeta(ncoins, 20, 50)
N <- rep(50, ncoins)
Z <- rbinom(ncoins, size=N, prob=true_theta)

Find the posterior distribution on alpha and beta: check convergence with print() and stan_trace(), then plot using stan_hist(), pairs(), and/or stan_scat() (with e.g., pars=c("alpha", "beta", "theta[1]", "theta[2]")).

data {
    int n;      // number of coins
    int N[n];   // number of flips
    int Z[n];   // number of heads
}
parameters {
    // probability of heads
    real<lower=0,upper=1> theta[n];
    real<lower=0,upper=100> alpha;
    real<lower=0,upper=100> beta;
}
model {
    Z ~ binomial(N, theta);
    theta ~ beta(alpha, beta);
    // uniform priors 'go without saying'
    // alpha ~ uniform(0, 100);
    // beta ~ uniform(0, 100);
}

In class

Write down and compile the model:

bag_block <- "
data {
    int n;      // number of coins
    int N[n];   // number of flips
    int Z[n];   // number of heads
}
parameters {
    // probability of heads
    real<lower=0,upper=1> theta[n];
    real<lower=0,upper=100> alpha;
    real<lower=0,upper=100> beta;
}
model {
    Z ~ binomial(N, theta);
    theta ~ beta(alpha, beta);
    // uniform priors 'go without saying'
    // alpha ~ uniform(0, 100);
    // beta ~ uniform(0, 100);
}
"
bag_model <- stan_model(model_code=bag_block)

Sample from the model:

bag_fit <- sampling(bag_model,
                    data=list(n=ncoins,
                              N=N,
                              Z=Z),
                    chains=3, iter=100)

## Error in .local(object, ...): object 'ncoins' not found

Sample more from the model:

bag_fit <- sampling(bag_model,
                    data=list(n=ncoins,
                              N=N,
                              Z=Z),
                    chains=3, iter=10000)

## Error in .local(object, ...): object 'ncoins' not found

Check for convergence: n_eff are all pretty big, and Rhat are all equal to 1!

print(bag_fit)

## Error in print(bag_fit): object 'bag_fit' not found

Check diagnostic plots: – whoops, the upper bound of \(\beta \le 100\) is constraining things; we should probably increase that.

stan_trace(bag_fit, pars=c("theta[1]", "theta[2]", "alpha", "beta"))

## Error in is.stanreg(object): object 'bag_fit' not found

More plots: Hm, the posteriors of \(\alpha\) and \(\beta\) are correlated.

pairs(bag_fit, pars=c("theta[1]", "theta[2]", "alpha", "beta"))

## Error in pairs(bag_fit, pars = c("theta[1]", "theta[2]", "alpha", "beta")): object 'bag_fit' not found

Results:

stan_hist(bag_fit, pars=c("alpha", "beta"))

## Error in is.stanreg(object): object 'bag_fit' not found

What about the mean \(\theta\) in the bag? The mean of a Beta distribution is \(\alpha / (\alpha + \beta)\), so let’s get the posterior distribution of that:

samples <- extract(bag_fit)

## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'extract': object 'bag_fit' not found

post_mean <- samples$alpha / (samples$alpha + samples$beta)
hist(post_mean, main='posterior distribution of mu', xlab=expression(alpha / (alpha + beta)))

## Error in hist.default(post_mean, main = "posterior distribution of mu", : invalid number of 'breaks'

The posterior mean of \(\mu = \alpha / (\alpha + \beta)\) is NaN, and a 95% credible interval is from NA to NA.

Some notes on Stan

Variable types in Stan:

int x;       // an integer
int y[10];   // ten integers
real z;      // a number
real z[2,5]; // a 2x5 array of numbers

vector[10] u;      // length 10 vector
matrix[10,10] v;   // 10x10 matrix
vector[10] w[10];  // ten length 10 vectors

don’t forget the ;
make sure R types match
read the error messages

Steps in fitting a model

Write model as a text block.
Compile the model with stan_model( ).
Sample from the model with sampling( ).
Read warnings and errors.
Check convergence diagnostics (Rhat, stan_trace( ), pairs( ), etc).
Summarize result (credible intervals, stan_plot( ), stan_hist( ), etc).