Peter Ralph
28 September 2021 – Advanced Biological Statistics
The Central Limit Theorem says, roughly, that the net effect of the sum of a bunch of small, independent random things can be well-approximated by a Gaussian distribution, almost regardless of the details.
For instance: say \(X_1, X_2, \ldots, X_n\) are independent random draws from some distribution with mean \(\mu\) and standard deviation \(\sigma\).
Then the sample mean is approximately Gaussian, centered on the true mean: \[\begin{aligned} \bar x = \frac{1}{n}\sum_{i=1}^n X_i \approx \Normal\left(\mu, \frac{\sigma}{\sqrt{n}}\right) . \end{aligned}\]
Also called the Normal distribution: see previous slide.
Saying that a random number \(Z\) “is Normal”: \[\begin{equation} Z \sim \Normal(\mu, \sigma) \end{equation}\] means that \[\begin{equation} \P\left\{Z \ge \mu + x \sigma \right\} = \int_x^\infty \frac{1}{\sqrt{2 \pi}} e^{-u^2/2} du . \end{equation}\]
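For instance, this definition can be checked numerically with pnorm (the values \(\mu = 3\), \(\sigma = 2\), and \(x = 1.5\) below are arbitrary choices for illustration):

mu <- 3; sigma <- 2; x <- 1.5
pnorm(mu + x * sigma, mean=mu, sd=sigma, lower.tail=FALSE)  # P(Z >= mu + x*sigma)
pnorm(x, lower.tail=FALSE)  # the same number: the standard Normal upper tail at x
integrate(function(u) exp(-u^2/2) / sqrt(2*pi), lower=x, upper=Inf)  # the integral itself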
What to remember:
rnorm(10, mean=3, sd=2) # random simulations: ten draws
pnorm(5, mean=3, sd=2) # probabilities: P(X <= 5)
qnorm(0.975, mean=3, sd=2) # quantiles: the x with P(X <= x) = 0.975
Let’s check this, by:
finding the sample mean of 100 random draws from some distribution,
doing this lots of times, and looking at the distribution of those sample means.
Claim: no matter the distribution we sample from, it should look close to Normal.
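Here is a minimal sketch of that check, using the Exponential(1) distribution (mean 1, SD 1) as an arbitrary example; any other choice of distribution should work as well:

n <- 100
sample_means <- replicate(10000, mean(rexp(n, rate=1)))  # 10,000 sample means
hist(sample_means, breaks=50, freq=FALSE, main="means of 100 Exponential(1) draws")
curve(dnorm(x, mean=1, sd=1/sqrt(n)), add=TRUE, col="red")  # the CLT prediction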
If \(Y\) and \(Z_1, \ldots, Z_n\) are independent \(\Normal(0, \sigma)\), and \[\begin{equation} X = \frac{Y}{ \sqrt{\frac{1}{n}\sum_{j=1}^n Z_j^2} } \end{equation}\] then \[\begin{equation} X \sim \StudentsT(n) . \end{equation}\]
More usefully, the difference between a sample mean and the true mean, divided by the standard error, is\(^*\) \(t\) distributed.
This is thanks to the Central Limit Theorem. (\(^*\) usually, approximately)
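As a sanity check, here is a sketch that simulates \(X\) as defined above (with \(n = 5\) and \(\sigma = 2\) chosen arbitrarily; note that \(\sigma\) cancels out) and compares the results to the \(t\) density:

n <- 5; sigma <- 2
sims <- replicate(10000, {
    y <- rnorm(1, mean=0, sd=sigma)
    z <- rnorm(n, mean=0, sd=sigma)
    y / sqrt(mean(z^2))  # X = Y / sqrt((1/n) * sum(Z^2))
})
hist(sims, breaks=200, freq=FALSE, xlim=c(-6, 6), main="simulated X vs t with 5 df")
curve(dt(x, df=n), add=TRUE, col="red")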
Simulate at least 1,000 random draws from each of these distributions, and make histograms:
Which ones give you integers? positive numbers? numbers in a bounded region?
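For example, the pattern for any one distribution looks like this (Poisson(3) is just a stand-in here, not necessarily one of the distributions on the list):

x <- rpois(1000, lambda=3)  # at least 1,000 draws
hist(x, main="1,000 draws from Poisson(3)")
range(x)  # quick check: integers? positive? bounded?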