Prior distributions and uncertainty

Peter Ralph

9 November – Advanced Biological Statistics

Biased coins

a motivating example

Suppose I have two trick coins:

one (coin A) comes up heads 75% of the time, and
the other (coin B) only 25% of the time.

But, I lost one and I don’t know which! So, I flip it 10 times and get 6 heads. Which is it, and how sure are you?

Possible answers:

Er, probably coin (A)?
Well, \[\begin{aligned} \P\{ \text{6 H in 10 flips} \given \text{coin A} \} &= \binom{10}{6} (.75)^6 (.25)^4 \\ &= 0.146 \end{aligned}\] and \[\begin{aligned} \P\{ \text{6 H in 10 flips} \given \text{coin B} \} &= \binom{10}{6} (.25)^6 (.75)^4 \\ &= 0.016 \end{aligned}\] … so, probably coin (A)?

For a precise answer…

Before flipping, each coin seems equally likely. Then

\[\begin{aligned} \P\{ \text{coin A} \given \text{6 H in 10 flips} \} &= \frac{ \frac{1}{2} \times 0.146 }{ \frac{1}{2} \times 0.146 + \frac{1}{2} \times 0.016 } \\ &= 0.9 \end{aligned}\]

Probability

Bayes’ rule

\[\begin{aligned} \P\{B \given A\} = \frac{\P\{B\} \P\{A \given B\}}{ \P\{A\} } , \end{aligned}\]

where

$B$: possible model
$A$: data
$\P\{B\}$: prior weight on model $B$
$\P\{A \given B\}$: likelihood of data under $B$
$\P\{B\} \P\{A \given B\}$: posterior weight on $B$
$\P\{A\}$: total sum of posterior weights

Breaking it down with more coins

Suppose instead I had 9 coins, with probabilities 10%, 20%, …, 90%; as before I flipped one 10 times and got 6 heads. For each $\theta$ in $0.1, 0.2, \ldots, 0.8, 0.9,$ find \[\begin{aligned} \P\{\text{ coin has prob $\theta$ } \given \text{ 6 H in 10 flips } \} . \end{aligned}\]

Question: which coin(s) is it, and how sure are we? (And, what does it mean when we say how sure we are?)

Uniform prior

prior

$\times$

likelihood

$\propto$

posterior

plot of chunk r the_prior

Weak prior

prior

$\times$

likelihood

$\propto$

posterior

plot of chunk r weak_prior

Strong prior

prior

$\times$

likelihood

$\propto$

posterior

plot of chunk r strong_prior

The likelihood: 6 H in 10 flips

prior

$\times$

likelihood

$\propto$

posterior

plot of chunk r ten_flips

The likelihood: 30 H in 50 flips

prior

$\times$

likelihood

$\propto$

posterior

plot of chunk r fifty_flips

The likelihood: 60 H in 100 flips

prior

$\times$

likelihood

$\propto$

posterior

plot of chunk r 100_flips

The likelihood: 6,000 H in 10,000 flips

prior

$\times$

likelihood

$\propto$

posterior

plot of chunk r ten_thou_flips

A question

What is the right answer to the “coin question”?

Recall: there were nine possible values of $\theta$.

Which coin is it, and how sure are you?

Possible types of answer:

“best guess”
“range of values”
“don’t know”

theta <- (1:9)/10
prior <- rep(1/9, 9)
likelihood <- dbinom(6, size=10, prob=theta)
posterior <- prior*likelihood/sum(prior*likelihood)
names(posterior) <- theta
barplot(posterior, xlab='true prob of heads', main='posterior probability')

plot of chunk r plot_pos

Unknown coins

Motivating example

Now suppose we want to estimate the probability of heads for a coin without knowing the possible values. (or, a disease incidence, or error rate in an experiment, …)

We flip it $n$ times and get $z$ Heads.

The likelihood of this, given the prob-of-heads $\theta$, is: \[p(z \given \theta) = \binom{n}{z}\theta^z (1-\theta)^{n-z} . \]

How to weight the possible $\theta$? Need a flexible set of weighting functions, i.e., prior distributions on $[0,1]$.

Beta distributions.

What $\alpha$ and $\beta$ would we use for a $\Beta(\alpha, \beta)$ prior if:

the coin is probably close to fair.
the disease is probably quite rare.
no idea whatsoever.

plot of chunk r beta_stuff

Beta-Binomial Bayesian analysis

If \[\begin{aligned} P &\sim \text{Beta}(a,b) \\ Z &\sim \text{Binom}(n,P) , \end{aligned}\]

then “miraculously”,

\[\begin{aligned} (P \given Z = z) \sim \text{Beta}(a+z, b+n-z) . \end{aligned}\]

Discuss:

We flip an odd-looking coin 100 times, and get 65 heads. What is it’s true* probability of heads?

What prior to use?
Plot the prior and the posterior.
Is it reasonable that $\theta = 1/2$?
Best guess at $\theta$?
How far off are we, probably?

Tools include: rbeta( )

IN CLASS:

# prior: theta could be anything,
# but a little less likely to be 0% or 100%
alpha <- 1.1
beta <- 1.1

prior_samples <- rbeta(1e6, shape1=alpha, shape2=beta)
hist(prior_samples, breaks=100)

plot of chunk r inclass

# Plot the posterior
z <- 65
n <- 100
post_alpha <- alpha + z
post_beta <- beta + n - z

posterior_samples <- rbeta(1e6, shape1=post_alpha, shape2=post_beta)
hist(posterior_samples, breaks=100)
abline(v=0.5, col='red')
abline(v=post_alpha / (post_alpha + post_beta), col='blue')

plot of chunk r inclass2

It’s not totally improbable that theta is 0.5, but it doesn’t look likely.
Around 64%.
We’re off by around 5-10%, probably.

Reporting uncertainty

How do we communicate results?

If we want a point estimate:

posterior mean,
posterior median, or
maximum a posteriori estimate (“MAP”: highest posterior density).

These all convey “where the posterior distribution is”, more or less.

What about uncertainty?

Credible region

Definition: A 95% credible region is a portion of parameter space having a total of 95% of the posterior probability.

(same with other numbers for “95%”)

Interpretation #1

If we construct a 95% credible interval for $\theta$ for each of many datasets; and the coin in each dataset has $\theta$ drawn independently from the prior, then the true $\theta$ will fall in its credible interval for 95% of the datasets.

Interpretation #2

If we construct a 95% credible interval for $\theta$ with a dataset, and the distribution of the coin’s true $\theta$ across many parallel universes is given by the prior, then the true $\theta$ will be in the credible interval in 95% of those universes.

Interpretation #3

Given my prior beliefs (prior distribution), the posterior distribution is the most rational${}^*$ way to update my beliefs to account for the data.

${}^*$ if you do this many times you will be wrong least often

${}^*$ or you will be wrong in the fewest possible universes

But which credible interval?

Definition: The “95% highest density interval” is the 95% credible interval with the highest posterior probability density at each point.

((back to the coins))