\[ %% % Add your macros here; they'll be included in pdf and html output. %% \newcommand{\R}{\mathbb{R}} % reals \newcommand{\E}{\mathbb{E}} % expectation \renewcommand{\P}{\mathbb{P}} % probability \DeclareMathOperator{\logit}{logit} \DeclareMathOperator{\logistic}{logistic} \DeclareMathOperator{\SE}{SE} \DeclareMathOperator{\sd}{sd} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\cor}{cor} \DeclareMathOperator{\Normal}{Normal} \DeclareMathOperator{\MVN}{MVN} \DeclareMathOperator{\LogNormal}{logNormal} \DeclareMathOperator{\Poisson}{Poisson} \DeclareMathOperator{\Beta}{Beta} \DeclareMathOperator{\Binom}{Binomial} \DeclareMathOperator{\Gam}{Gamma} \DeclareMathOperator{\Exp}{Exponential} \DeclareMathOperator{\Cauchy}{Cauchy} \DeclareMathOperator{\Unif}{Unif} \DeclareMathOperator{\Dirichlet}{Dirichlet} \DeclareMathOperator{\Wishart}{Wishart} \DeclareMathOperator{\StudentsT}{StudentsT} \DeclareMathOperator{\Weibull}{Weibull} \newcommand{\given}{\;\vert\;} \]

The Cauchy distribution

Peter Ralph

Advanced Biological Statistics

Stochastic minute

If \(X \sim \Cauchy(\text{center}=\mu, \text{scale}=\sigma)\), then \(X\) has probability density \[\begin{aligned} f(x \given \mu, \sigma) = \frac{1}{\pi\sigma\left( 1 + \left( \frac{x - \mu}{\sigma} \right)^2 \right)} . \end{aligned}\]
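As a quick sanity check, this formula matches R's built-in `dcauchy()` (in R, `location` and `scale` correspond to \(\mu\) and \(\sigma\)):

```r
# compare the density formula by hand to R's built-in dcauchy()
mu <- 2; sigma <- 0.5
x <- seq(-10, 10, length.out=101)
f <- 1 / (pi * sigma * (1 + ((x - mu) / sigma)^2))
range(f - dcauchy(x, location=mu, scale=sigma))  # essentially zero
```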

  1. The Cauchy is a good example of a distribution with “heavy tails”: rare, very large values.

  2. A \(\Cauchy(0,1)\) random variable has a Student’s \(t\) distribution with \(\text{df}=1\).

  3. If \(Z \sim \Normal(0, 1)\) and, given \(Z\), \(X \sim \Normal(0, 1/|Z|)\) (the second parameter being the standard deviation), then \(X \sim \Cauchy(0,1)\); equivalently, the ratio of two independent standard Normals is \(\Cauchy(0,1)\).

  4. If \(X_1, X_2, \ldots, X_n\) are independent \(\Cauchy(0,1)\) then \(\max(X_1, \ldots, X_n)\) grows linearly with \(n\): since \(\P(X_i > x) \approx 1/(\pi x)\) for large \(x\), the maximum is of order \(n\).

  5. If \(X_1, X_2, \ldots, X_n\) are independent \(\Cauchy(0,1)\) then \[\begin{aligned} \frac{1}{n} \left(X_1 + \cdots + X_n\right) \sim \Cauchy(0,1) . \end{aligned}\]
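Fact 3 is easy to check by simulation: the quartiles of a ratio of two independent standard Normals should match those of a \(\Cauchy(0,1)\), which are exactly \(\pm 1\). (A quick sketch:)

```r
set.seed(123)
# fact 3: the ratio of two independent standard Normals is Cauchy(0,1)
x <- rnorm(1e5) / rnorm(1e5)
quantile(x, c(0.25, 0.75))   # close to -1 and 1
qcauchy(c(0.25, 0.75))       # exactly -1 and 1
```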

Wait, what?!?

A single value has the same distribution as the mean of 1,000 of them?
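A quick numeric version of this claim (a sketch): the interquartile range of sample means, each averaging 1,000 Cauchy draws, is about the same as that of single draws; both are near \(2\), since the \(\Cauchy(0,1)\) quartiles are \(\pm 1\).

```r
set.seed(42)
n <- 1000; m <- 1000
single <- rcauchy(m)                               # m single draws
means <- colMeans(matrix(rcauchy(n * m), nrow=n))  # m means of n draws each
IQR(single)   # roughly 2
IQR(means)    # also roughly 2: averaging did not reduce the spread
```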

Let’s look:

meanplot <- function (rf, n=1e3, m=100) {
    # draw n*m samples from rf, arranged as n rows by m columns
    x <- matrix(rf(n*m), ncol=m)
    layout(t(1:2))
    # left: histogram of m single samples (the first row), truncated to (-5, 5)
    hist(x[1,][abs(x[1,])<5], breaks=20, freq=FALSE,
         main=sprintf("%d samples", m), xlab='value',
         xlim=c(-5,5))
    # right: histogram of the m column means (each a mean of n samples)
    hist(colMeans(x)[abs(colMeans(x))<5], breaks=20, freq=FALSE,
         main=sprintf("%d means of %d each", m, n), xlab='value',
         xlim=c(-5,5))
}

\(X \sim \Normal(0,1)\)

meanplot(rnorm)

(plot: 100 Normal samples vs. 100 means of 1,000 Normal draws each)

\(X \sim \Cauchy(0,1)\)

meanplot(rcauchy)

(plot: 100 Cauchy samples vs. 100 means of 1,000 Cauchy draws each)

Another way to look at it: extreme values

n <- 100
plot(c(cummax(rcauchy(n))), type='l', ylab='max value so far', xlab='number of samples', col='red')
lines(c(cummax(rnorm(n))), col='black')
legend("bottomright", lty=1, col=c('black', 'red'), legend=c('normal', 'cauchy'))

(plot: running maximum of 100 Cauchy (red) vs. Normal (black) samples)

Another way to look at it: extreme values

n <- 1000
plot(c(cummax(rcauchy(n))), type='l', ylab='max value so far', xlab='number of samples', col='red')
lines(c(cummax(rnorm(n))), col='black')
legend("bottomright", lty=1, col=c('black', 'red'), legend=c('normal', 'cauchy'))

(plot: running maximum of 1,000 Cauchy (red) vs. Normal (black) samples)

Another way to look at it: extreme values

n <- 1e6
plot(c(cummax(rcauchy(n))), type='l', ylab='max value so far', xlab='number of samples', col='red')
lines(c(cummax(rnorm(n))), col='black')
legend("bottomright", lty=1, col=c('black', 'red'), legend=c('normal', 'cauchy'))

(plot: running maximum of one million Cauchy (red) vs. Normal (black) samples)
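The plots above reflect the tail probabilities: for a \(\Cauchy(0,1)\), \(\P(X > x) \approx 1/(\pi x)\) for large \(x\), so values in the hundreds or thousands are routine among a million draws, while the Normal upper tail is astronomically smaller. A quick check with the built-in CDFs:

```r
x <- c(10, 100, 1000)
pcauchy(x, lower.tail=FALSE)   # Cauchy upper tail
1 / (pi * x)                   # nearly the same
pnorm(x, lower.tail=FALSE)     # Normal upper tail: essentially zero
```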

Exercise: mixture of error rates.

Suppose you are measuring relative metabolic rates of mice in the wild. Because life is complicated, the accuracy of your measurements varies widely. A model of the measured rate, \(R_i\), for a mouse at temperature \(T_i\) is \[\begin{aligned} R_i &\sim \Normal(120 + 0.7 (T_i - 37), 1/|E_i|) \\ E_i &\sim \Normal(0, 1) , \end{aligned}\] where the Normal's second parameter is the standard deviation.

Simulate 200 measurements from this model, for temperatures between 36 and 38, and try to infer the true slope (0.7).
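One way to set up the simulation (a sketch, with my own variable names; note that by the ratio-of-Normals fact above, the measurement errors \(Z_i/|E_i|\) are exactly \(\Cauchy\)-distributed):

```r
set.seed(1)
nobs <- 200
temp <- runif(nobs, min=36, max=38)   # temperatures between 36 and 38
E <- rnorm(nobs)                      # accuracy varies widely
rate <- rnorm(nobs, mean=120 + 0.7 * (temp - 37), sd=1/abs(E))
# ordinary least squares is thrown off badly by the few huge errors:
coef(lm(rate ~ temp))["temp"]
# a heavy-tailed error model (e.g. Cauchy or Student's t errors)
# recovers the slope much more reliably
```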