\[ %% % Add your macros here; they'll be included in pdf and html output. %% \newcommand{\R}{\mathbb{R}} % reals \newcommand{\E}{\mathbb{E}} % expectation \renewcommand{\P}{\mathbb{P}} % probability \DeclareMathOperator{\logit}{logit} \DeclareMathOperator{\logistic}{logistic} \DeclareMathOperator{\SE}{SE} \DeclareMathOperator{\sd}{sd} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\cor}{cor} \DeclareMathOperator{\Normal}{Normal} \DeclareMathOperator{\MVN}{MVN} \DeclareMathOperator{\LogNormal}{logNormal} \DeclareMathOperator{\Poisson}{Poisson} \DeclareMathOperator{\Beta}{Beta} \DeclareMathOperator{\Binom}{Binomial} \DeclareMathOperator{\Gam}{Gamma} \DeclareMathOperator{\Exp}{Exponential} \DeclareMathOperator{\Cauchy}{Cauchy} \DeclareMathOperator{\Unif}{Unif} \DeclareMathOperator{\Dirichlet}{Dirichlet} \DeclareMathOperator{\Wishart}{Wishart} \DeclareMathOperator{\StudentsT}{StudentsT} \DeclareMathOperator{\Weibull}{Weibull} \newcommand{\given}{\;\vert\;} \]

Permutation tests

Peter Ralph

Advanced Biological Statistics

Permutation tests

## 
##  Welch Two Sample t-test
## 
## data:  airbnb$price[airbnb$instant_bookable] and airbnb$price[!airbnb$instant_bookable]
## t = 3.6482, df = 5039.8, p-value = 0.0002667
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   4.475555 14.872518
## sample estimates:
## mean of x mean of y 
##  124.6409  114.9668

But, the \(t\) test relies on Normality (a little). Is the distribution of AirBnB prices too “weird”? How can we be sure?

Methods:

  1. Remove the big values and try again.

  2. Use a nonparametric test.

The permutation test

Observation: If there was no meaningful difference in prices between “instant bookable” and not, then randomly shuffling that label won’t change anything.

Strategy:

  1. Shuffle the instant_bookable column.
  2. Compute the difference in means.
  3. Repeat, many times.
  4. Compare: the \(p\)-value is the proportion of “shuffled” values more extreme than observed.

Rightarrow Why is this a \(p\)-value? For what hypothesis?

Shuffle once

fake_is_instant <- sample(airbnb$instant_bookable)
(mean(airbnb$price[fake_is_instant], na.rm=TRUE) - 
                mean(airbnb$price[!fake_is_instant], na.rm=TRUE))
## [1] 2.837541

Many times

real_diff <- (mean(airbnb$price[airbnb$instant_bookable], na.rm=TRUE)
               - mean(airbnb$price[!airbnb$instant_bookable], na.rm=TRUE))
permuted_diffs <- replicate(10000, {
        fake_is_instant <- sample(airbnb$instant_bookable)
        (mean(airbnb$price[fake_is_instant], na.rm=TRUE)
             - mean(airbnb$price[!fake_is_instant], na.rm=TRUE))
   } )
hist(permuted_diffs, xlab="shuffled differences in mean", xlim=range(c(permuted_diffs, real_diff)))
abline(v=real_diff, col='red', lwd=3)

plot of chunk r many_shuf

How surprising was the real value?

mean(abs(permuted_diffs) > abs(real_diff))
## [1] 3e-04

The difference in price between instant bookable and not instant bookable is highly statistically significant (\(p \approx 0.0003\), permutation test).

Our turn

Here’s a nonparametric method to see if omnivores tend to be bigger than herbivores in the PanTHERIA data:

  1. Find pairs of species in the same family for which one is a omnivore and the other an herbivore.

  2. Compute within each pair (some statistic) comparing omnivore to herbivore size, and average these (the test statistic).

  3. Randomly reassign “trophic level” within families many times, recomputing this statistic.

  4. The \(p\)-value is the proportion of shuffled statistics greater than the observed statistic.

In class

source("../Datasets/PanTHERIA/read_pantheria.R")
pantheria <- read_pantheria("../Datasets/PanTHERIA")

First, let’s subset down to only the herbivores and omnivores, with non-missing data.

pantheria <- subset(pantheria,
                    TrophicLevel %in% c("herbivore", "omnivore")
                    &
                    !is.na(AdultBodyMass_g)
) |> droplevels()

We still have 1430 species remaining.

For each family, pick a pair of species:

ratios <- rep(NA, nlevels(pantheria$Family))
names(ratios) <- levels(pantheria$Family)
for (fam in levels(pantheria$Family)) {
    this_fam <- subset(pantheria, Family == fam)
    omnis <- which(this_fam$TrophicLevel == "omnivore")
    herbs <- which(this_fam$TrophicLevel == "herbivore")
    if (length(omnis) > 0 && length(herbs) > 0) {
        herb <- herbs[sample.int(length(herbs), 1)]
        omni <- omnis[sample.int(length(omnis), 1)]
        out <- this_fam$AdultBodyMass_g[herb] / this_fam$AdultBodyMass_g[omni]
    } else {
        out <- NA
    }
    ratios[fam] <- out
}

Do it again, after shuffling

shuffled_mean_ratios <- replicate(100, {
    shuffled_ratios <- rep(NA, nlevels(pantheria$Family))
    names(ratios) <- levels(pantheria$Family)
    for (fam in levels(pantheria$Family)) {
        this_fam <- subset(pantheria, Family == fam)
        # shuffle up trophic level column
        this_fam$TrophicLevel <- sample(this_fam$TrophicLevel)
        omnis <- which(this_fam$TrophicLevel == "omnivore")
        herbs <- which(this_fam$TrophicLevel == "herbivore")
        if (length(omnis) > 0 && length(herbs) > 0) {
            herb <- herbs[sample.int(length(herbs), 1)]
            omni <- omnis[sample.int(length(omnis), 1)]
            out <- this_fam$AdultBodyMass_g[herb] / this_fam$AdultBodyMass_g[omni]
        } else {
            out <- NA
        }
        shuffled_ratios[fam] <- out
    }
    mean(shuffled_ratios, na.rm=TRUE)
})

Look at the results

observed_mean_ratio <- mean(ratios, na.rm=TRUE)
p_val <- mean(shuffled_mean_ratios > observed_mean_ratio)
hist(shuffled_mean_ratios)
abline(v=observed_mean_ratio, col='red')

plot of chunk r hist

Conclusion

On average the herbivore of the pair was 3.2959199 times bigger than the omnivore; a mean ratio this big occurred 0.08 of the time in the permuted datasets. So, this is not strong evidence that herbivores tend to be bigger than omnivores (p = 0.08, permutation test).

However, we only did one random draw of the pairs to get the observed values.

Lots of draws of pairs, unshuffled

observed_mean_ratio_vector <- replicate(100, {
    ratios <- rep(NA, nlevels(pantheria$Family))
    names(ratios) <- levels(pantheria$Family)
    for (fam in levels(pantheria$Family)) {
        this_fam <- subset(pantheria, Family == fam)
        omnis <- which(this_fam$TrophicLevel == "omnivore")
        herbs <- which(this_fam$TrophicLevel == "herbivore")
        if (length(omnis) > 0 && length(herbs) > 0) {
            herb <- herbs[sample.int(length(herbs), 1)]
            omni <- omnis[sample.int(length(omnis), 1)]
            out <- this_fam$AdultBodyMass_g[herb] / this_fam$AdultBodyMass_g[omni]
        } else {
            out <- NA
        }
        ratios[fam] <- out
    }
    mean(ratios, na.rm=TRUE)
})

Look at results again

xh <- hist(c(observed_mean_ratio_vector, shuffled_mean_ratios), plot=FALSE)
hist(shuffled_mean_ratios, breaks=xh$breaks,  col=adjustcolor('red', 0.5))
hist(observed_mean_ratio_vector, breaks=xh$breaks, add=TRUE, col=adjustcolor("grey", 0.5))
legend("topright", col=c("white", "red"),
       legend=c("observed", "shuffled")
)

plot of chunk r plot2

Conclusion? Well, a t-test says:

t.test(observed_mean_ratio_vector, shuffled_mean_ratios)
## 
##  Welch Two Sample t-test
## 
## data:  observed_mean_ratio_vector and shuffled_mean_ratios
## t = 4.4179, df = 194.6, p-value = 1.653e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.3398443 0.8879671
## sample estimates:
## mean of x mean of y 
##  2.649445  2.035539

… but we need to think about this more.