Peter Ralph
23 November 2020 – Advanced Biological Statistics
Let’s write down a procedure to simulate data that looks like the baseball data.
batting$post_mean <- colMeans(param_samples$theta)
batting$fake_hits <- rbinom(nrow(batting),
size=batting$AtBats,
prob=batting$post_mean)
Option 1: By simulating draws from the posterior mean on \(\theta\), we can check that our model is accurately describing the data. Here is the real data against data simulated under the posterior mean \(\theta\) values. They look similar, which is good.
But if we want to know if we’re accurately estimating \(\mu\) and \(\kappa\), then we have to start with them, and simulate \(\theta\).
post_mean_mu <- colMeans(param_samples$mu)
post_mean_kappa <- colMeans(param_samples$kappa)
names(post_mean_mu) <- names(post_mean_kappa) <- levels(batting$PriPos)
batting$sim_theta <- rbeta(nrow(batting),
shape1=post_mean_mu[batting$PriPos] *
post_mean_kappa[batting$PriPos],
shape2=(1-post_mean_mu[batting$PriPos]) *
post_mean_kappa[batting$PriPos])
batting$sim_theta_hits <- rbinom(nrow(batting),
size=batting$AtBats,
prob=batting$sim_theta)
Fit the model to the simulated data:
sim_fit <- sampling(pos_model, chains=3, iter=1000, control=list(max_treedepth=13),
data=list(N=nrow(batting),
hits=batting$sim_theta_hits,
at_bats=batting$AtBats,
npos=nlevels(batting$PriPos),
position=as.numeric(batting$PriPos)))
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
Can we estimate \(\mu\) and \(\kappa\)?
Do the credible intervals contain the true value?
(i.e., Is the method “well-calibrated”?)
They usually should.
How wide are credible intervals, typically?
This is (one kind of) statistical power.
Suppose that I have a large pot of coins that are all similar to each other. I flip each one ten times, and record the number of Heads. What is each coin’s probability of Heads?
Treated separately, we would be very uncertain about each coin.
Together, they should tell us very accurately what are likely values of \(\theta\).
This information can improve the estimate of each separate \(\theta\).
The more similar the coins are, the more information we gain.
By shrinking each estimate towards their common mean, we hope to gain power.
Some players were at bat very few times. How does the information about their position affect our inference about their batting averages?
batting$post_med <- colMedians(param_samples$theta)
batting$post_Q1 <- colQuantiles(param_samples$theta, probs=0.25)
batting$post_Q3 <- colQuantiles(param_samples$theta, probs=0.75)
pos_means <- colMeans(param_samples$mu)
names(pos_means) <- levels(batting$PriPos)
pos_means
## 1st Base 2nd Base 3rd Base Catcher Center Field Left Field Pitcher Right Field Shortstop
## 0.2526893 0.2489027 0.2553305 0.2388856 0.2579088 0.2495208 0.1289218 0.2586869 0.2478177
Pitchers had posterior mean \(\mu\) of 0.1289218
with(subset(batting[order(batting$post_med),], PriPos=="Pitcher"), {
plot(Hits / AtBats, main="Pitchers", xaxt='n', ylim=c(0, 0.4),
xlab='player', ylab="posterior median theta");
segments(x0=seq_along(Hits), y0=post_Q1, y1=post_Q3,
col=adjustcolor('red',0.5));
points(post_med, pch=20) })
abline(h=pos_means["Pitcher"], lwd=2, col=adjustcolor("blue", 0.5))
legend("topleft", pch=c(1,20), legend=c("observed", "estimate"))
Catchers had posterior mean \(\mu\) of 0.2388856:
with(subset(batting[order(batting$post_med),], PriPos=="Catcher"), {
plot(Hits / AtBats, main="Catchers", xaxt='n', ylim=c(0, 0.4),
xlab='player', ylab="posterior median theta");
segments(x0=seq_along(Hits), y0=post_Q1, y1=post_Q3,
col=adjustcolor('red',0.5));
points(post_med, pch=20) })
abline(h=pos_means["Catcher"], lwd=2, col=adjustcolor("blue", 0.5))
legend("topleft", pch=c(1,20), legend=c("observed", "estimate"))
With simulated data, compare median absolute error for
posterior mean \(\theta_i\)
empirical batting average
How’d we do? Let’s look at the true values of \(\theta\) (we know because we generated them) against the posterior means. Red lines are 95% credible intervals.
Did we do better?
The mean absolute difference between the truth and
Using information about groups reduced our error by a factor of 4!