Peter Ralph
30 November 2020 – Advanced Biological Statistics
If \(X \sim \Cauchy(\text{center}=\mu, \text{scale}=\sigma)\), then \(X\) has probability density \[\begin{aligned} f(x \given \mu, \sigma) = \frac{1}{\pi\sigma\left( 1 + \left( \frac{x - \mu}{\sigma} \right)^2 \right)} . \end{aligned}\]
The Cauchy is a good example of a distribution with “heavy tails”: rare, very large values.
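As a quick check of the density and of what "heavy tails" means in numbers, the sketch below compares a hand-written Cauchy density (normalized to integrate to one) with R's built-in `dcauchy()`, and then compares tail probabilities of the standard Cauchy and the standard Normal. The helper name `cauchy_density` and the values of `mu` and `sigma` are made up for illustration.

```r
# hand-written Cauchy density with center mu and scale sigma
# (cauchy_density is an illustrative name, not a base R function)
cauchy_density <- function (x, mu, sigma) {
    1 / (pi * sigma * (1 + ((x - mu) / sigma)^2))
}

x <- seq(-10, 10, length.out=201)
mu <- 2       # arbitrary center
sigma <- 3    # arbitrary scale
# largest disagreement with the built-in density (should be essentially zero)
max_diff <- max(abs(cauchy_density(x, mu, sigma)
                    - dcauchy(x, location=mu, scale=sigma)))

# heavy tails: probability of landing more than 5 units from the center
tail_cauchy <- 2 * pcauchy(-5)   # standard Cauchy
tail_normal <- 2 * pnorm(-5)     # standard Normal
c(max_diff=max_diff, cauchy=tail_cauchy, normal=tail_normal)
```

The Cauchy tail probability comes out many orders of magnitude larger than the Normal one, which is the sense in which very large values are rare but not negligible.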
If \(X \sim \Cauchy(0,1)\), then \(X\) has a Student’s \(t\) distribution with \(\text{df}=1\).
If \(Z \sim \Normal(0, 1)\) and, given \(Z\), \(X \sim \Normal(0,1/|Z|)\) (that is, with standard deviation \(1/|Z|\)), then \(X \sim \Cauchy(0,1)\).
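If \(X_1, X_2, \ldots, X_n\) are independent \(\Cauchy(0,1)\) then \(\max(X_1, \ldots, X_n)\) is typically of order \(n\).
If \(X_1, X_2, \ldots, X_n\) are independent \(\Cauchy(0,1)\) then their mean \(\frac{1}{n}(X_1 + \cdots + X_n)\) is also \(\Cauchy(0,1)\).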
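The facts above about the \(t\) distribution, the Normal with random standard deviation, and the maximum can be checked numerically. This is only a rough simulation sketch: the seed, the number of simulations, and the variable names are arbitrary choices.

```r
set.seed(1)   # arbitrary seed, only for reproducibility

# (1) Cauchy(0,1) and Student's t with df=1: same distribution function
x <- seq(-20, 20, length.out=401)
t_diff <- max(abs(pcauchy(x) - pt(x, df=1)))

# (2) a Normal whose standard deviation is 1/|Z|, with Z standard Normal
nsim <- 1e5
Z <- rnorm(nsim)
X <- rnorm(nsim, mean=0, sd=1/abs(Z))
probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
mixture_check <- rbind(simulated=quantile(X, probs),
                       cauchy=qcauchy(probs))

# (3) the maximum of n independent Cauchy(0,1) values grows like n:
# the median of max/n should stay roughly constant as n increases
max_over_n <- sapply(c(10, 100, 1000), function (n) {
    median(replicate(2000, max(rcauchy(n)) / n))
})

t_diff
mixture_check
max_over_n
```

If the claims hold, `t_diff` is essentially zero, the two rows of `mixture_check` nearly agree, and the three entries of `max_over_n` are close to each other.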
Wait, what?!?
A single value has the same distribution as the mean of 1,000 of them?
Let’s look:
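# Compare the distribution of single draws to that of means of n draws,
# using the random number generator rf (e.g., rnorm or rcauchy).
meanplot <- function (rf, n=1e3, m=100) {
    # each of the m columns holds n independent draws
    x <- matrix(rf(n*m), ncol=m)
    layout(t(1:2))
    # left: m single draws (only those in (-5, 5) are shown)
    hist(x[1,][abs(x[1,])<5], breaks=20, freq=FALSE,
         main=sprintf("%d samples", m), xlab='value',
         xlim=c(-5,5))
    # right: m means of n draws each (only those in (-5, 5) are shown)
    hist(colMeans(x)[abs(colMeans(x))<5], breaks=20, freq=FALSE,
         main=sprintf("%d means of %d each", m, n), xlab='value',
         xlim=c(-5,5))
}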
\(X \sim \Normal(0,1)\)
\(X \sim \Cauchy(0,1)\)
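For the Normal and Cauchy cases named above, the same comparison can be made with numbers rather than histograms. The sketch below measures spread by the interquartile range, since the Cauchy has no finite variance. The helper name `spread_compare`, the seed, and the choice of 1,000 means are assumptions made for illustration.

```r
set.seed(2)   # arbitrary seed, only for reproducibility
n <- 1e3   # number of values averaged in each mean
m <- 1e3   # number of means (more than meanplot's default, to steady the estimate)

# interquartile range of single draws versus that of means of n draws
# (spread_compare is an illustrative helper name)
spread_compare <- function (rf) {
    x <- matrix(rf(n * m), ncol=m)
    c(single=IQR(x[1, ]), mean_of_n=IQR(colMeans(x)))
}

spread <- rbind(normal=spread_compare(rnorm),
                cauchy=spread_compare(rcauchy))
spread
```

For the Normal, averaging shrinks the spread by roughly a factor of \(\sqrt{n}\); for the Cauchy, the means are about as spread out as single values.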
Suppose you are measuring relative metabolic rates of mice in the wild. Because life is complicated, the accuracy of your measurements varies widely. In the following model of the measured rate, \(R_i\), for a mouse at temperature \(T_i\), the second parameter of each Normal is its standard deviation: \[\begin{aligned} R_i &\sim \Normal(120 + 0.7 \times (T_i - 37), 1/|E_i|) \\ E_i &\sim \Normal(0, 1) . \end{aligned}\]
Simulate 200 measurements from this model, for temperatures between 36 and 38, and try to infer the true slope (0.7).
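n <- 200
# temperatures, uniform between 36 and 38
mice <- data.frame(
    T = runif(n, 36, 38) )
# E sets the accuracy of each measurement: noise SD is 1/|E|
mice$E <- rnorm(n)
mice$R <- rnorm(n,
    mean=120 + 0.7 * (mice$T - 37),
    sd=1/abs(mice$E))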
# all the data, with the true line in red
plot(R ~ T, data=mice, xlab='temperature', ylab='metabolic rate')
abline(120 - 0.7 * 37, 0.7, col='red')
# zoom in
plot(R ~ T, data=mice, xlab='temperature',
     ylab='metabolic rate',
     ylim=c(110, 130))
abline(120 - 0.7 * 37, 0.7, col='red')
# ordinary least squares fit
summary(lm(R ~ T, data=mice))
##
## Call:
## lm(formula = R ~ T, data = mice)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -534.73   -0.16    1.62    3.10  214.98
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.7282   188.4806  -0.004    0.997
## T             3.2152     5.0926   0.631    0.529
##
## Residual standard error: 41.3 on 198 degrees of freedom
## Multiple R-squared:  0.002009,  Adjusted R-squared:  -0.003031
## F-statistic: 0.3986 on 1 and 198 DF,  p-value: 0.5285
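Least squares does poorly here because the noise in this model, a Normal with standard deviation \(1/|E_i|\), is itself \(\Cauchy(0,1)\). One possible alternative, offered only as a sketch and not taken from the analysis above, is to maximize a Cauchy likelihood directly with `optim()`. The function name `negloglik`, the starting values, the seed, and the choice of the BFGS method are all assumptions of this example. The data are re-simulated, so the numbers will not match the output above.

```r
set.seed(3)   # arbitrary seed; data are re-simulated from the same model
n <- 200
mice <- data.frame(T = runif(n, 36, 38))
mice$E <- rnorm(n)
mice$R <- rnorm(n, mean=120 + 0.7 * (mice$T - 37), sd=1/abs(mice$E))

# negative log-likelihood for R = a + b * (T - 37) + Cauchy(0, scale) noise;
# par = c(a, b, log(scale)); working with log(scale) keeps the scale positive
# (negloglik is an illustrative name)
negloglik <- function (par, data) {
    mu <- par[1] + par[2] * (data$T - 37)
    -sum(dcauchy(data$R, location=mu, scale=exp(par[3]), log=TRUE))
}

# start from the median response, zero slope, and scale 1
fit <- optim(c(median(mice$R), 0, 0), negloglik, data=mice, method="BFGS")
cauchy_est <- c(intercept_at_37=fit$par[1], slope=fit$par[2],
                scale=exp(fit$par[3]))
cauchy_est
```

Under these assumptions the estimated slope should land much closer to 0.7 than the least squares estimate does, although a Cauchy likelihood can have several local optima, so the starting values matter.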