Peter Ralph
Advanced Biological Statistics
The \(t\) statistic computed from a collection of \(n\) numbers is the sample mean divided by the estimated standard error of the mean, which is the sample SD divided by \(\sqrt{n}\).
If \(x_1, \ldots, x_n\) are numbers, then \[\begin{aligned} \text{(sample mean)} \qquad \bar x &= \frac{1}{n}\sum_{i=1}^n x_i \\ \text{(sample SD)} \qquad s &= \sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2} \end{aligned}\] so \[\begin{equation} t(x) = \frac{\bar x}{s / \sqrt{n}} . \end{equation}\]
## t
## 1.318919 1.318919
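A minimal sketch in R of the calculation defined above, computing the \(t\) statistic by hand and comparing it with the value reported by `t.test()`. The data here are made up for illustration (an assumption, not the data behind the output shown above), so the numbers will differ.

```r
# Hypothetical example data; any numeric vector would do.
set.seed(1)
x <- rnorm(20, mean = 0.5)

n <- length(x)
xbar <- mean(x)        # sample mean
s <- sd(x)             # sample SD; sd() uses the n - 1 denominator

# t statistic: sample mean divided by its estimated standard error
t_manual <- xbar / (s / sqrt(n))

# t.test() with its default mu = 0 reports the same statistic
t_builtin <- unname(t.test(x)$statistic)

c(manual = t_manual, builtin = t_builtin)
```

The two entries should agree up to floating-point error, since `t.test()` tests against a mean of zero by default.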
Fact: If \(X_1, \ldots, X_n\) are independent random samples from a distribution with mean \(\mu\), then \[\begin{equation} t(X - \mu) = \frac{\bar X - \mu}{s/\sqrt{n}} \approx \StudentsT(n-1) , \end{equation}\] as long as \(n\) is not too small and the distribution isn’t too weird.
Let’s check this by finding the sample \(t\) score of 100 random draws from some distribution, doing this lots of times, and looking at the distribution of those \(t\) scores.
Claim: no matter\({}^*\) the distribution we sample from, the sampling distribution of the \(t\) statistics should look close to the \(t\) distribution.
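One way the check described above might be carried out is sketched below. This is not necessarily the code used in class: the helper `tstat()`, the seed, and the number of replicates `nreps` are all choices made here for illustration. It draws 100 values from Uniform(-1, 1), which has mean zero, computes the \(t\) score, repeats this many times, and compares the results with the \(t\) distribution with \(n - 1\) degrees of freedom.

```r
set.seed(42)
n <- 100          # draws per sample
nreps <- 10000    # number of replicate samples (an arbitrary choice)

# t statistic of a sample from a distribution with mean zero
tstat <- function(x) mean(x) / (sd(x) / sqrt(length(x)))

# 2 * runif(n) - 1 is Uniform(-1, 1), which has mean zero
tvals <- replicate(nreps, tstat(2 * runif(n) - 1))

# Histogram of simulated t scores with the t density (n - 1 df) overlaid
hist(tvals, breaks = 50, freq = FALSE,
     main = "Simulated t scores", xlab = "t")
curve(dt(x, df = n - 1), add = TRUE, col = "red", lwd = 2)

# Fraction of simulated |t| beyond the nominal 95% cutoff
tail_frac <- mean(abs(tvals) > qt(0.975, df = n - 1))
tail_frac
```

If the claim holds, the histogram should track the red curve and `tail_frac` should come out close to 0.05.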
Do this again (use my code), except using x <- rexp(n) - 1 instead of x <- 2 * runif(n) - 1.
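A possible starting point for this exercise is sketched here, again with a hypothetical helper `tstat()` and arbitrary choices of seed and replicate count. Since `rexp(n)` has mean 1, `rexp(n) - 1` has mean zero but is strongly right-skewed, so it is a harder test of the claim. A quantile-quantile plot and separate lower and upper tail fractions make any asymmetry easier to see than a histogram would.

```r
set.seed(7)
n <- 100          # draws per sample
nreps <- 10000    # number of replicate samples (an arbitrary choice)

# t statistic of a sample from a distribution with mean zero
tstat <- function(x) mean(x) / (sd(x) / sqrt(length(x)))

# Shifted exponential: mean zero, right-skewed
tvals_exp <- replicate(nreps, tstat(rexp(n) - 1))

# Quantile-quantile plot against the t distribution with n - 1 df
probs <- ppoints(nreps)
plot(qt(probs, df = n - 1), sort(tvals_exp),
     xlab = "theoretical t quantiles", ylab = "simulated t scores")
abline(0, 1, col = "red")

# Tail fractions at the nominal 2.5% cutoffs on each side
lower_tail <- mean(tvals_exp < qt(0.025, df = n - 1))
upper_tail <- mean(tvals_exp > qt(0.975, df = n - 1))
c(lower = lower_tail, upper = upper_tail)
```

Points lying along the red line indicate agreement with the \(t\) distribution; a difference between the two tail fractions would reflect the skew of the exponential distribution.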