Peter Ralph
9 November – Advanced Biological Statistics
Suppose I have two trick coins:
But, I lost one and I don’t know which! So, I flip it 10 times and get 6 heads. Which is it, and how sure are you?
Er, probably coin (A)?
Well, \[\begin{aligned} \P\{ \text{6 H in 10 flips} \given \text{coin A} \} &= \binom{10}{6} (.75)^6 (.25)^4 \\ &= 0.146 \end{aligned}\] and \[\begin{aligned} \P\{ \text{6 H in 10 flips} \given \text{coin B} \} &= \binom{10}{6} (.25)^6 (.75)^4 \\ &= 0.016 \end{aligned}\] … so, probably coin (A)?
For a precise answer…
Before flipping, each coin seems equally likely. Then
\[\begin{aligned} \P\{ \text{coin A} \given \text{6 H in 10 flips} \} &= \frac{ \frac{1}{2} \times 0.146 }{ \frac{1}{2} \times 0.146 + \frac{1}{2} \times 0.016 } \\ &= 0.9 \end{aligned}\]
\[\begin{aligned} \P\{B \given A\} = \frac{\P\{B\} \P\{A \given B\}}{ \P\{A\} } , \end{aligned}\]
where
Suppose instead I had 9 coins, with probabilities 10%, 20%, …, 90%; as before I flipped one 10 times and got 6 heads. For each \(\theta\) in \(0.1, 0.2, \ldots, 0.8, 0.9,\) find \[\begin{aligned} \P\{\text{ coin has prob $\theta$ } \given \text{ 6 H in 10 flips } \} . \end{aligned}\]
Question: which coin(s) is it, and how sure are we? (And, what does it mean when we say how sure we are?)
prior
\(\times\)
likelihood
\(\propto\)
posterior
prior
\(\times\)
likelihood
\(\propto\)
posterior
prior
\(\times\)
likelihood
\(\propto\)
posterior
prior
\(\times\)
likelihood
\(\propto\)
posterior
prior
\(\times\)
likelihood
\(\propto\)
posterior
prior
\(\times\)
likelihood
\(\propto\)
posterior
prior
\(\times\)
likelihood
\(\propto\)
posterior
Recall: there were nine possible values of \(\theta\).
Which coin is it, and how sure are you?
Possible types of answer:
Now suppose we want to estimate the probability of heads for a coin without knowing the possible values. (or, a disease incidence, or error rate in an experiment, …)
We flip it \(n\) times and get \(z\) Heads.
The likelihood of this, given the prob-of-heads \(\theta\), is: \[p(z \given \theta) = \binom{n}{z}\theta^z (1-\theta)^{n-z} . \]
How to weight the possible \(\theta\)? Need a flexible set of weighting functions, i.e., prior distributions on \([0,1]\).
What \(\alpha\) and \(\beta\) would we use for a \(\Beta(\alpha, \beta)\) prior if:
the coin is probably close to fair.
the disease is probably quite rare.
no idea whatsoever.
If \[\begin{aligned} P &\sim \text{Beta}(a,b) \\ Z &\sim \text{Binom}(n,P) , \end{aligned}\]
then “miraculously”,
\[\begin{aligned} (P \given Z = z) \sim \text{Beta}(a+z, b+n-z) . \end{aligned}\]
We flip an odd-looking coin 100 times, and get 65 heads. What is it’s true* probability of heads?
What prior to use?
Plot the prior and the posterior.
Is it reasonable that \(\theta = 1/2\)?
Best guess at \(\theta\)?
How far off are we, probably?
Tools include: rbeta( )
It’s not totally improbable that theta is 0.5, but it doesn’t look likely.
Around 64%.
We’re off by around 5-10%, probably.
If we want a point estimate:
These all convey “where the posterior distribution is”, more or less.
What about uncertainty?
Definition: A 95% credible region is a portion of parameter space having a total of 95% of the posterior probability.
(same with other numbers for “95%”)
If we construct a 95% credible interval for \(\theta\) for each of many datasets; and the coin in each dataset has \(\theta\) drawn independently from the prior, then the true \(\theta\) will fall in its credible interval for 95% of the datasets.
If we construct a 95% credible interval for \(\theta\) with a dataset, and the distribution of the coin’s true \(\theta\) across many parallel universes is given by the prior, then the true \(\theta\) will be in the credible interval in 95% of those universes.
Given my prior beliefs (prior distribution), the posterior distribution is the most rational\({}^*\) way to update my beliefs to account for the data.
\({}^*\) if you do this many times you will be wrong least often
\({}^*\) or you will be wrong in the fewest possible universes
Definition: The “95% highest density interval” is the 95% credible interval with the highest posterior probability density at each point.
((back to the coins))