The Cauchy distribution

Peter Ralph

30 November 2020 – Advanced Biological Statistics

Stochastic minute

If \(X \sim \Cauchy(\text{center}=\mu, \text{scale}=\sigma)\), then \(X\) has probability density \[\begin{aligned} f(x \given \mu, \sigma) = \frac{1}{\pi\left( 1 + \left( \frac{x - \mu}{\sigma} \right)^2 \right)} . \end{aligned}\]

The Cauchy is a good example of a distribution with “heavy tails”: rare, very large values.
\(X\) has a Student’s \(t\) distribution with \(\text{df}=1\).
If \(Z \sim \Normal(0, 1)\) and \(X \sim \Normal(0,1/Z)\) then \(X \sim \Cauchy(0,1)\).
If \(X_1, X_2, \ldots, X_n\) are independent \(\Cauchy(0,1)\) then \(\max(X_1, \ldots, X_n)\) is of size \(n\).

If \(X_1, X_2, \ldots, X_n\) are independent \(\Cauchy(0,1)\) then \[\begin{aligned} \frac{1}{n} \left(X_1 + \cdots + X_n\right) \sim \Cauchy(0,1) . \end{aligned}\]

Wait, what?!?

A single value has the same distribution as the mean of 1,000 of them?

Let’s look:

meanplot <- function (rf, n=1e3, m=100) {
    x <- matrix(rf(n*m), ncol=m)
    layout(t(1:2))
    hist(x[1,][abs(x[1,])<5], breaks=20, freq=FALSE,
         main=sprintf("%d samples", m), xlab='value',
         xlim=c(-5,5))
    hist(colMeans(x)[abs(colMeans(x))<5], breaks=20, freq=FALSE,
         main=sprintf("%d means of %d each", m, n), xlab='value',
         xlim=c(-5,5))
}

\(X \sim \Normal(0,1)\)

meanplot(rnorm)

plot of chunk r normmeans

\(X \sim \Cauchy(0,1)\)

meanplot(rcauchy)

plot of chunk r cauchymeans

Another way to look at it: extreme values

n <- 100
plot(c(cummax(rcauchy(n))), type='l', ylab='max value so far', xlab='number of samples', col='red')
lines(c(cummax(rnorm(n))), col='black')
legend("bottomright", lty=1, col=c('black', 'red'), legend=c('normal', 'cauchy'))

plot of chunk r max_values

Another way to look at it: extreme values

n <- 1000
plot(c(cummax(rcauchy(n))), type='l', ylab='max value so far', xlab='number of samples', col='red')
lines(c(cummax(rnorm(n))), col='black')
legend("bottomright", lty=1, col=c('black', 'red'), legend=c('normal', 'cauchy'))

plot of chunk r max_values2

Another way to look at it: extreme values

n <- 1e6
plot(c(cummax(rcauchy(n))), type='l', ylab='max value so far', xlab='number of samples', col='red')
lines(c(cummax(rnorm(n))), col='black')
legend("bottomright", lty=1, col=c('black', 'red'), legend=c('normal', 'cauchy'))

plot of chunk r max_values3

Exercise: mixture of error rates.

Suppose you are measuring relative metabolic rates of mice in the wild. Because life is complicated, the accuracy of your measurements varies widely. A model of the measured rate, \(R_i\), for a mouse at temperature \(T_i\) is \[\begin{aligned} R_i &\sim \Normal(120 + 0.7 * (T_i - 37), 1/|E_i|) \\ E_i &\sim \Normal(0, 1) . \end{aligned}\]

Simulate 200 measurements from this model, for temperatures between 36 and 38, and try to infer the true slope (0.7).

IN CLASS

n <- 200
mice <- data.frame(
       T = runif(n, 36, 38) )
mice$E <- rnorm(n)
mice$R <- rnorm(n,
                mean=120 + 0.7 * (mice$T - 37),
                sd=1/abs(mice$E))
plot(R ~ T, data=mice, xlab='temperature', ylab='metabolic rate')
abline(120 - 0.7 * 37, 0.7, col='red')

plot of chunk r sim_class

# zoom in
plot(R ~ T, data=mice, xlab='temperature',
     ylab='metabolic rate',
     ylim=c(110, 130))
abline(120 - 0.7 * 37, 0.7, col='red')

plot of chunk r sim_class

summary(lm(R ~ T, data=mice))

## 
## Call:
## lm(formula = R ~ T, data = mice)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -534.73   -0.16    1.62    3.10  214.98 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.7282   188.4806  -0.004    0.997
## T             3.2152     5.0926   0.631    0.529
## 
## Residual standard error: 41.3 on 198 degrees of freedom
## Multiple R-squared:  0.002009,   Adjusted R-squared:  -0.003031 
## F-statistic: 0.3986 on 1 and 198 DF,  p-value: 0.5285