Peter Ralph
13 November – Advanced Biological Statistics
Suppose now we have data from \(n\) different coins from the same source. We don’t assume they all have the same \(\theta\), but we also don’t know the distribution of \(\theta\) across coins, so we will try to learn it.
\[\begin{aligned} Z_i &\sim \Binom(N_i, \theta_i) \\ \theta_i &\sim \Beta(\alpha, \beta) \\ \alpha &\sim \Unif(0, 100) \\ \beta &\sim \Unif(0, 100) \end{aligned}\]
Goal: find the posterior distribution of \(\alpha\), \(\beta\).
What is different between these two situations?

Situation 1: Pick a single value of \(\theta\) at random from \(\Beta(3,1)\). Flip one thousand \(\theta\)-coins, 500 times each.

Situation 2: Pick one thousand random values \(\theta_i \sim \Beta(3,1)\). Flip one thousand coins, one for each \(\theta_i\), 500 times each.
For instance: if you were given a dataset from one of these situations, how would you tell which one it came from? (One numerical check is sketched after the code below.)
ncoins <- 1000
nflips <- 500
# situation 1: one shared theta for all coins
theta1 <- rbeta(1, 3, 1)
binom_Z <- rbinom(ncoins, size=nflips, prob=theta1)
# situation 2: each coin gets its own theta
theta2 <- rbeta(ncoins, 3, 1)
bb_Z <- rbinom(ncoins, size=nflips, prob=theta2)
# overlay histograms of the heads counts under the two situations
hist(binom_Z, breaks=30, col=adjustcolor("blue", 0.5), main='', xlim=c(0,nflips), freq=FALSE, xlab='number of Heads')
hist(bb_Z, breaks=30, col=adjustcolor("red", 0.5), add=TRUE, freq=FALSE)
legend("topleft", fill=adjustcolor(c('blue', 'red'), 0.5), legend=c('one theta', 'many thetas'))
Problem: Find the posterior distribution of \(\alpha\) and \(\beta\), given some data \((Z_1, N_1), \ldots, (Z_n, N_n)\), under the model: \[\begin{aligned} Z_i &\sim \Binom(N_i, \theta_i) \\ \theta_i &\sim \Beta(\alpha, \beta) \\ \alpha &\sim \Unif(0, 100) \\ \beta &\sim \Unif(0, 100) . \end{aligned}\]
Problem: we don’t have a nice mathematical expression for the posterior distribution. Instead, we’ll write the model in Stan; a Stan program is organized into blocks:
data {
// stuff you input
}
transformed data {
// stuff that's calculated from the data (just once, at the start)
}
parameters {
// stuff you want to learn about
}
transformed parameters {
// stuff that's calculated from the parameters (at every step)
}
model {
// the action!
}
generated quantities {
// stuff you want computed also along the way
}
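For concreteness, here is one way the beta-binomial model above might be written into this skeleton and fit from R. This is a sketch assuming the rstan package; the block contents and variable names are our own choices, not fixed by the model.

library(rstan)

bb_model <- "
data {
    int n;      // number of coins
    int N[n];   // number of flips per coin
    int Z[n];   // number of heads per coin
}
parameters {
    real<lower=0, upper=100> alpha;
    real<lower=0, upper=100> beta;
    vector<lower=0, upper=1>[n] theta;  // per-coin heads probability
}
model {
    Z ~ binomial(N, theta);
    theta ~ beta(alpha, beta);
    alpha ~ uniform(0, 100);
    beta ~ uniform(0, 100);
}
"

# fit to the simulated many-thetas data from above
fit <- stan(model_code=bb_model, chains=3, iter=1000,
            data=list(n=ncoins, N=rep(nflips, ncoins), Z=bb_Z))
print(fit, pars=c("alpha", "beta"))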
Goal: Given data \(D\), a likelihood \(p(D \given \theta)\), and a prior \(p(\theta)\), “find”/ask questions of the posterior distribution on \(\theta\),
\[\begin{aligned} p(\theta \given D) = \frac{ p(D \given \theta) p(\theta) }{ p(D) } . \end{aligned}\]
Problem: we usually can’t write down an expression for this, because the denominator \(p(D) = \int p(D \given \theta) p(\theta) \, d\theta\) is generally an intractable integral.
Solution: we’ll make up a way to draw random samples from it.
Toy example (from the Beta-Binomial coin example): do we think that \(\theta < 0.5\)?

Before seeing data, we would answer using the prior distribution of \(\theta\); now, we answer using the posterior, as in the sketch below.
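A minimal sketch in R, assuming a single coin with a Beta(1,1) prior and hypothetical data of 35 heads in 100 flips; conjugacy makes the posterior exactly Beta, so we can check the sample-based answer.

a <- 1; b <- 1      # assumed prior: theta ~ Beta(1, 1)
Z <- 35; N <- 100   # hypothetical data: 35 heads in 100 flips
pbeta(0.5, a, b)    # before: P(theta < 0.5) under the prior
# the conjugate posterior is Beta(a + Z, b + N - Z); sample from it:
samples <- rbeta(1e5, a + Z, b + N - Z)
mean(samples < 0.5)            # now: P(theta < 0.5) estimated from samples
pbeta(0.5, a + Z, b + N - Z)   # exact posterior probability, for comparison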
Markov chain Monte Carlo, i.e., “random-walk-based stochastic integration”
Produces a random sequence of samples \(\theta_1, \theta_2, \ldots, \theta_N\).
At each step, starting at \(\theta_k\):

Propose a new location (nearby?): \(\theta_k'\).

Decide whether to accept it: if yes, set \(\theta_{k+1} = \theta_k'\); if no, set \(\theta_{k+1} = \theta_k\).

Set \(k \leftarrow k+1\); if \(k=N\) then stop.
The magic comes from doing proposals and acceptance so that the \(\theta\)’s are samples from the distribution we want.
Rules are chosen so that \(p(\theta \given D)\) is the stationary distribution (long-run average!) of the random walk (the “Markov chain”).
The chain must mix fast enough that the distribution of visited states converges to \(p(\theta \given D)\).
Because of autocorrelation, \((\theta_1, \theta_2, \ldots, \theta_N)\) are not \(N\) independent samples: they are roughly equivalent to \(N_\text{eff} < N\) independent samples.
For better mixing, acceptance probabilities should not be too high or too low.
Starting several chains far apart can help diagnose failure to mix: the Gelman–Rubin statistic \(\hat R\) quantifies how different they are.
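To make \(N_\text{eff}\) and \(\hat R\) concrete, here is a sketch using the coda package on two simulated autocorrelated chains; the AR(1) stand-ins and all of the numbers are our own assumptions, not part of the lecture.

library(coda)
# two AR(1) chains as stand-ins for MCMC output from two starting points
ar1 <- function(n, rho) {
    x <- numeric(n)
    for (k in 2:n) x[k] <- rho * x[k-1] + rnorm(1)
    x
}
chains <- mcmc.list(mcmc(ar1(2000, 0.9)), mcmc(ar1(2000, 0.9)))
effectiveSize(chains)  # N_eff: far fewer than the 4000 correlated samples
gelman.diag(chains)    # potential scale reduction; values near 1 are good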
In each group, have one walker directed by others:
Start somewhere at “random”, then repeat:
Pick a random \(\{N,S,E,W\}\).
Take a step in that direction.
Take a screenshot of your path after:
Imagine the heatmap of how much time your “walker” has spent in each place in the library.
Question: What distribution does this sample from?
Now:
Pick a random \(\{N,S,E,W\}\).
Take \(1+\Poisson(5)\) steps in that direction.
Again, take a screenshot of your path after:
Does it mix faster?
Would \(1 + \Poisson(50)\) steps be better?
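As a one-dimensional stand-in for this exercise (the interval \(1, \ldots, 50\), the walk lengths, and the lags are our own choices), the sketch below compares how quickly the two walks forget where they started:

walk <- function(nsteps, stepsize) {
    x <- numeric(nsteps)
    x[1] <- 25
    for (k in 2:nsteps) {
        prop <- x[k-1] + sample(c(-1, 1), 1) * stepsize()
        # stay put if the proposed step leaves the room
        x[k] <- if (prop >= 1 && prop <= 50) prop else x[k-1]
    }
    x
}
small <- walk(1e4, function() 1)              # single steps
big <- walk(1e4, function() 1 + rpois(1, 5))  # 1 + Poisson(5) steps
# autocorrelation at lags 1, 10, and 50: faster mixing decays sooner
acf(small, lag.max=50, plot=FALSE)$acf[c(2, 11, 51)]
acf(big, lag.max=50, plot=FALSE)$acf[c(2, 11, 51)]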
Imagine the walkers are on a hill, and:

Pick a random \(\{N,S,E,W\}\).

If a step in that direction is uphill, take it; if it is downhill, take it only with probability \(p\) (otherwise, stay put).

What would this do?
Thanks to Metropolis-Hastings, if “elevation” is \(p(\theta \given D)\), then setting \(p = p(\theta' \given D) / p(\theta \given D)\) makes the stationary distribution \(p(\theta \given D)\).
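Here is a minimal random-walk Metropolis sampler in R for the single-coin toy posterior; the data values, the flat prior, and the step size 0.05 are our assumptions for illustration.

set.seed(1)
Z <- 35; N <- 100   # hypothetical data: heads out of flips
# log posterior up to a constant: flat prior, so proportional to the likelihood
log_post <- function(theta) {
    if (theta <= 0 || theta >= 1) return(-Inf)
    dbinom(Z, N, theta, log=TRUE)
}
nsteps <- 1e4
theta <- numeric(nsteps)
theta[1] <- 0.5
for (k in 2:nsteps) {
    prop <- theta[k-1] + rnorm(1, 0, 0.05)   # propose a nearby location
    # accept uphill moves always, downhill moves with probability p'/p
    if (log(runif(1)) < log_post(prop) - log_post(theta[k-1])) {
        theta[k] <- prop
    } else {
        theta[k] <- theta[k-1]
    }
}
hist(theta[-(1:1000)], breaks=50, freq=FALSE, main='', xlab='theta')  # drop burn-in
curve(dbeta(x, 1 + Z, 1 + N - Z), add=TRUE, col='red')  # exact posterior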