Peter Ralph
Advanced Biological Statistics
Suppose that \(Z_1, \ldots, Z_k\) are independent \(\Normal(0, 1)\). Then
\[ \chi^2 = Z_1^2 + \cdots + Z_k^2 \]
has the chi squared distribution with \(k\) degrees of freedom.
Notes:
\(\chi^2\) is a unitless nonnegative number.
\(\E[\chi^2] = k\).
If instead \(Z_i \sim \Normal(\mu_i, \sigma_i)\), then \(\chi^2 = \sum_{i=1}^k (Z_i - \mu_i)^2 / \sigma_i\).
\(\chi^2 \sim \Gam(k/2, 1/2)\).
If the number of observations in a contingency table with \(r\) rows and \(c\) columns is large, then the chi-squared statistic has, approximately, the chi-squared distribution with \((r-1)\times(c-1)\) degrees of freedom under the hypothesis of independence of rows and columns.
(Asymptotically, i.e., as the number of observations goes to infinity.)
Exercise: Test this by making a plot of (probability the chi-squared statistic is less than 5) against sample size (from 20 to 200) for simulated data in which rows and columns are independent. (*Useful: rmultinom( )
, pchisq( )
.)