Peter Ralph
1 October 2020 – Advanced Biological Statistics
The Central Limit Theorem says, roughly, that the net effect of the sum of a bunch of small, independent random things can be well-approximated by a Gaussian distribution, almost regardless of the details.
For instance: say \(X_1, X_2, \ldots, X_n\) are independent, random draws with mean \(\mu\) and standard deviation \(\sigma\).
Then, for large \(n\), the sample mean is approximately Gaussian, centered on the “true” mean, \(\mu\), with standard deviation \(\sigma/\sqrt{n}\): \[\begin{aligned} \bar x = \frac{1}{n}\sum_{i=1}^n X_i \approx \Normal\left(\mu, \frac{\sigma}{\sqrt{n}}\right) . \end{aligned}\]
Also called the Normal distribution: see previous slide.
Saying that a random number \(Z\) “is Normal”: \[\begin{equation} Z \sim \Normal(\mu, \sigma) \end{equation}\] means that \[\begin{equation} \P\left\{\frac{Z - \mu}{\sigma} \ge x\right\} = \int_x^\infty \frac{1}{\sqrt{2 \pi}} e^{-u^2/2} du . \end{equation}\]
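A quick numerical check of the standardization idea: if \(Z \sim \Normal(\mu, \sigma)\) then \((Z - \mu)/\sigma\) is standard Normal, so its upper tail probability should match the integral of the standard Normal density. The sketch below computes that probability three ways; the values of `mu`, `sigma`, and `x` are arbitrary choices for illustration, not from the slides.

```r
mu <- 3; sigma <- 2   # arbitrary illustrative values
x <- 1                # threshold on the standardized scale

# P{(Z - mu)/sigma >= x} is the same event as P{Z >= mu + sigma * x}
p_direct <- pnorm(mu + sigma * x, mean=mu, sd=sigma, lower.tail=FALSE)
# the same upper tail probability for a standard Normal
p_standard <- pnorm(x, lower.tail=FALSE)
# numerical integral of the standard Normal density from x to infinity
p_integral <- integrate(function(u) exp(-u^2/2) / sqrt(2*pi),
                        lower=x, upper=Inf)$value

c(direct=p_direct, standard=p_standard, integral=p_integral)
```

All three numbers should agree up to the numerical error of `integrate()`.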
What to remember:
rnorm(10, mean=3, sd=2) # random simulations
pnorm(5, mean=3, sd=2) # probabilities
qnorm(0.975, mean=3, sd=2) # quantiles
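To see how these three functions relate to each other, here is a minimal sketch: the proportion of simulated values below a threshold should be close to `pnorm()` of that threshold, and `qnorm()` undoes `pnorm()`. The seed and the number of simulations are arbitrary choices made only for illustration.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
z <- rnorm(1e5, mean=3, sd=2)    # many random simulations

# pnorm gives P{Z <= 5}; the simulated proportion should be close to it
p <- pnorm(5, mean=3, sd=2)
c(theory=p, simulated=mean(z <= 5))

# qnorm inverts pnorm: the p-th quantile is 5 again
qnorm(p, mean=3, sd=2)

# the 97.5% quantile sits about 1.96 standard deviations above the mean
(qnorm(0.975, mean=3, sd=2) - 3) / 2
```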
Let’s check this by finding the sample mean of 100 random draws from some distribution, doing that lots of times, and looking at the distribution of those sample means.
Claim: no matter the distribution we sample from, it should look close to Normal.
n <- 100
x <- runif(n)
hist(x, xlab='value', main='sample', col=grey(0.5))
abline(v=mean(x), col='red', lwd=2)
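# repeat the experiment 1000 times, recording the sample mean each time
xm <- replicate(1000, mean(runif(n)))
xh <- hist(xm, breaks=40, main=sprintf('mean of %d samples', n), col='red')
# redraw the histogram, then overlay the counts expected under the Normal approximation
plot(xh, main=sprintf('mean of %d samples', n), col='red')
xx <- xh$breaks
# Uniform(0,1) has mean 1/2 and variance 1/12, so the sample mean has sd 1/sqrt(12 n)
polygon(c(xx[-1] - diff(xx)/2, xx[1]),
        c(length(xm) * diff(pnorm(xx, mean=0.5, sd=1/sqrt(n*12))), 0),
        col=adjustcolor("blue", 0.4))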
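The claim is that the starting distribution should not matter much, so it may help to repeat the check with something skewed. The sketch below swaps in an Exponential distribution with rate 1 (an assumption chosen for illustration; it has mean 1 and standard deviation 1), so the CLT predicts sample means that are roughly Normal with mean 1 and standard deviation \(1/\sqrt{n}\).

```r
set.seed(2)    # arbitrary seed, only for reproducibility
n <- 100
# sample means of n Exponential(rate=1) draws, repeated 1000 times
xm_exp <- replicate(1000, mean(rexp(n, rate=1)))
hist(xm_exp, breaks=40, freq=FALSE, col='red',
     main=sprintf('mean of %d Exponential samples', n))
# overlay the Normal density predicted by the CLT: mean 1, sd 1/sqrt(n)
curve(dnorm(x, mean=1, sd=1/sqrt(n)), add=TRUE, col='blue', lwd=2)
# compare to the predicted mean (1) and sd (0.1)
c(mean=mean(xm_exp), sd=sd(xm_exp))
```

With a strongly skewed distribution and a small `n` the fit would be visibly worse, which is what "approximately" means here.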
If \(Y\) and \(Z_1, \ldots, Z_n\) are independent \(\Normal(0, \sigma)\), and \[\begin{equation} X = \frac{Y}{ \sqrt{\frac{1}{n}\sum_{j=1}^n Z_j^2} } \end{equation}\] then \[\begin{equation} X \sim \StudentsT(n) . \end{equation}\]
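This construction can be checked by simulation. The sketch below builds \(X\) directly from the definition and compares its quantiles to those from `qt()`; the degrees of freedom, the value of `sigma`, the seed, and the number of simulations are all arbitrary choices for illustration. Note that `sigma` cancels in the ratio, so any positive value should give the same answer.

```r
set.seed(3)     # arbitrary seed, only for reproducibility
n <- 5          # degrees of freedom (arbitrary choice)
sigma <- 2      # arbitrary; it cancels in the ratio
nsim <- 1e5
y <- rnorm(nsim, mean=0, sd=sigma)
# each row holds Z_1, ..., Z_n for one simulated value of X
z <- matrix(rnorm(nsim * n, mean=0, sd=sigma), nrow=nsim)
x <- y / sqrt(rowMeans(z^2))

# simulated quantiles should be close to Student's t quantiles with n df
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
rbind(simulated=quantile(x, probs), theory=qt(probs, df=n))
```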
More usefully, a sample mean divided by its standard error is\(^*\) \(t\) distributed.
This is thanks to the Central Limit Theorem. (\(^*\) usually, approximately)
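As an illustration, the sketch below simulates many small samples, computes (sample mean minus true mean) divided by the estimated standard error for each, and compares the quantiles to both the \(t\) and the Normal distribution. It assumes Normal data, in which case the statistic is exactly \(t\) distributed with \(n - 1\) degrees of freedom; the sample size, true mean, standard deviation, and seed are arbitrary choices.

```r
set.seed(4)   # arbitrary seed, only for reproducibility
n <- 10       # sample size (arbitrary)
mu <- 3       # true mean (arbitrary)
tstat <- replicate(1e5, {
    x <- rnorm(n, mean=mu, sd=2)
    # (sample mean - true mean) / estimated standard error
    (mean(x) - mu) / (sd(x) / sqrt(n))
})

probs <- c(0.025, 0.5, 0.975)
rbind(simulated=quantile(tstat, probs),
      t=qt(probs, df=n-1),
      Normal=qnorm(probs))
```

The simulated tail quantiles should sit closer to the \(t\) row than to the Normal row, since the \(t\) distribution accounts for the extra uncertainty from estimating the standard deviation.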