\[ %% % Add your macros here; they'll be included in pdf and html output. %% \newcommand{\R}{\mathbb{R}} % reals \newcommand{\E}{\mathbb{E}} % expectation \renewcommand{\P}{\mathbb{P}} % probability \DeclareMathOperator{\logit}{logit} \DeclareMathOperator{\logistic}{logistic} \DeclareMathOperator{\SE}{SE} \DeclareMathOperator{\sd}{sd} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\cor}{cor} \DeclareMathOperator{\Normal}{Normal} \DeclareMathOperator{\MVN}{MVN} \DeclareMathOperator{\LogNormal}{logNormal} \DeclareMathOperator{\Poisson}{Poisson} \DeclareMathOperator{\Beta}{Beta} \DeclareMathOperator{\Binom}{Binomial} \DeclareMathOperator{\Gam}{Gamma} \DeclareMathOperator{\Exp}{Exponential} \DeclareMathOperator{\Cauchy}{Cauchy} \DeclareMathOperator{\Unif}{Unif} \DeclareMathOperator{\Dirichlet}{Dirichlet} \DeclareMathOperator{\Wishart}{Wishart} \DeclareMathOperator{\StudentsT}{StudentsT} \DeclareMathOperator{\Weibull}{Weibull} \newcommand{\given}{\;\vert\;} \]

The chi-squared distribution

Peter Ralph

Advanced Biological Statistics

Stochastic minute

The chi-squared distribution

Suppose that \(Z_1, \ldots, Z_k\) are independent \(\Normal(0, 1)\). Then

\[ \chi^2 = Z_1^2 + \cdots + Z_k^2 \]

has the chi squared distribution with \(k\) degrees of freedom.

Notes:

  1. \(\chi^2\) is a unitless nonnegative number.

  2. \(\E[\chi^2] = k\).

  3. If instead \(Z_i \sim \Normal(\mu_i, \sigma_i)\), then \(\chi^2 = \sum_{i=1}^k (Z_i - \mu_i)^2 / \sigma_i\).

  4. \(\chi^2 \sim \Gam(k/2, 1/2)\).

Asymptotics

If the number of observations in a contingency table with \(r\) rows and \(c\) columns is large, then the chi-squared statistic has, approximately, the chi-squared distribution with \((r-1)\times(c-1)\) degrees of freedom under the hypothesis of independence of rows and columns.

(Asymptotically, i.e., as the number of observations goes to infinity.)

Exercise: Test this by making a plot of (probability the chi-squared statistic is less than 5) against sample size (from 20 to 200) for simulated data in which rows and columns are independent. (*Useful: rmultinom( ), pchisq( ).)

// reveal.js plugins