Peter Ralph
Advanced Biological Statistics
Data from: Donihue, C.M., Herrel, A., Fabre, AC. et al. Hurricane-induced selection on the morphology of an island lizard. Nature 560, 88–91 (2018).
Anolis scriptus morphology and performance from before and after hurricanes
Lizard morphology data collected from two visits to Pine Cay and Water Cay in
the Turks and Caicos Islands. Anolis scriptus lizards were surveyed on both
islands four days before hurricane Irma and again, six weeks later, after the
islands had been hit by Hurricanes Irma and Maria. We measured morphology and
performance and found significant differences in the "before" and "after"
hurricane lizard population morphology. All linear measurements are in MM,
surface area measurements are in MM2 and force in N. Counts correspond to the
number of lamellar scales on the forelimb and hind limb toepads.
Here’s the data: dataset; README. First we’ll read it in, reorder the levels of the Hurricane factor (so that “before” comes before “after”), change units on SVL to centimeters so it’s on the same scale as the other variables (useful below), and drop unused variables.
lizards <- read.csv(
"../Datasets/Hurricane_lizards/lizards.csv",
header=TRUE, stringsAsFactors=TRUE)
lizards$Hurricane <- factor(lizards$Hurricane, levels=c("Before", "After"))
lizards$SVL_cm <- lizards$SVL / 10
varnames <- c("SVL_cm", "Femur", "Tibia", "Metatarsal", "LongestToe", "Humerus",
"Radius", "Metacarpal", "LongestFinger", "FingerCount", "ToeCount",
"MeanFingerArea", "MeanToeArea")
lizards <- lizards[, c(c("Hurricane", "Origin", "Sex"), varnames)]
head(lizards)
## Hurricane Origin Sex SVL_cm Femur Tibia Metatarsal LongestToe Humerus Radius Metacarpal LongestFinger FingerCount ToeCount MeanFingerArea MeanToeArea
## 1 After Pine Cay Male 4.869 10.39 11.87 7.52 7.43 8.66 7.99 2.22 3.19 10 12 1.3327 2.433
## 2 After Pine Cay Female 4.031 8.66 9.79 6.18 6.20 8.01 6.51 2.38 3.55 10 13 0.9613 1.518
## 3 After Pine Cay Male 5.830 12.87 14.76 9.45 9.58 11.72 9.54 3.54 5.09 14 15 2.6313 4.098
## 4 After Pine Cay Female 4.315 8.55 10.29 6.60 6.26 7.43 6.60 2.79 3.55 11 12 1.1777 1.879
## 5 After Pine Cay Female 4.551 10.26 11.02 6.89 7.02 7.71 7.25 2.52 3.37 11 13 1.3843 2.530
## 6 After Pine Cay Female 4.697 10.02 10.78 6.85 7.18 8.45 7.15 2.39 3.26 12 14 1.4260 2.036
Here’s the sample sizes:
## , , Hurricane = Before
##
## Origin
## Sex Pine Cay Water Cay
## Female 15 18
## Male 18 20
##
## , , Hurricane = After
##
## Origin
## Sex Pine Cay Water Cay
## Female 17 22
## Male 29 25
## Hurricane Origin Sex SVL_cm Femur Tibia Metatarsal LongestToe Humerus Radius Metacarpal LongestFinger FingerCount ToeCount MeanFingerArea MeanToeArea
## 1 After Pine Cay Male 4.869 10.39 11.87 7.52 7.43 8.66 7.99 2.22 3.19 10 12 1.3327 2.433
## 2 After Pine Cay Female 4.031 8.66 9.79 6.18 6.20 8.01 6.51 2.38 3.55 10 13 0.9613 1.518
## 3 After Pine Cay Male 5.830 12.87 14.76 9.45 9.58 11.72 9.54 3.54 5.09 14 15 2.6313 4.098
## 4 After Pine Cay Female 4.315 8.55 10.29 6.60 6.26 7.43 6.60 2.79 3.55 11 12 1.1777 1.879
## 5 After Pine Cay Female 4.551 10.26 11.02 6.89 7.02 7.71 7.25 2.52 3.37 11 13 1.3843 2.530
## 6 After Pine Cay Female 4.697 10.02 10.78 6.85 7.18 8.45 7.15 2.39 3.26 12 14 1.4260 2.036
Care, or at least think, about the data.
Look at the data.
Query the data.
Check the results.
Communicate.
measurement ~ SVL + hurricane
, which allows more
flexibility in how overall length is incorporatedExercise: Write a possible conclusion sentence or two (imagining what might happen). Make sure to communicate (a) the size of the effect we’re looking for, in real-world terms; (b) the strength of the statistical evidence; and (c) context (e.g., how’s the size of the hurricane-induced change compare to other differences).
We found that the typical ratio of hand size to body size was 10% larger after than before (t-test, p-value 0.01). This is roughly 2 standard deviations of the ratios observed within each sex; ratios do not seem to differ between islands.
The body radius decreased by 20% after the hurricane compared to before.
Metatarsal length increased by 5% after the hurricane (possibly due to incrased grip).
We see that lizards with the longest fingers and toes (5% longer on average) survived better on average (t-test of p=0.01); they seem to have been able to hold on better during the hurricane.
lizards_long <- pivot_longer(lizards, col=all_of(varnames), names_to="measurement")
ggplot(lizards_long) + geom_histogram(aes(x=value)) + facet_wrap(~measurement)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
There is not a huge, obvious shift.
Let’s do some t-tests to see what happens:
results <- expand.grid(
variable=varnames,
sex=levels(lizards$Sex),
stringsAsFactors=FALSE
)
results$p <- NA
results$lower_CI <- NA
results$upper_CI <- NA
for (j in 1:nrow(results)) {
v <- results$variable[j]
s <- results$sex[j]
x <- subset(lizards, Sex==s)
t <- t.test(x[[v]][x$Hurricane == "Before"],
x[[v]][x$Hurricane == "After"]
)
results$p[j] <- t$p.value
results$lower_CI[j] <- t$conf.int[1]
results$upper_CI[j] <- t$conf.int[2]
}
This is preliminary, but it seems that maybe males need shorter (?) legs, and females need larger (?) fingers.
## variable sex p lower_CI upper_CI
## 15 Femur Male 3.030e-07 0.70632 1.49627
## 18 LongestToe Male 5.133e-06 0.41631 0.98772
## 12 MeanFingerArea Female 1.204e-05 -0.25470 -0.10373
## 16 Tibia Male 6.230e-05 0.36557 1.02030
## 13 MeanToeArea Female 4.725e-04 -0.36000 -0.10662
## 10 FingerCount Female 1.172e-03 -1.04091 -0.26911
## 17 Metatarsal Male 1.410e-03 0.14257 0.57219
## 14 SVL_cm Male 1.917e-03 0.09161 0.39137
## 2 Femur Female 1.054e-02 0.10359 0.75492
## 20 Radius Male 1.402e-02 0.06389 0.55246
## 6 Humerus Female 3.436e-02 -0.47849 -0.01885
## 22 LongestFinger Male 1.227e-01 -0.03672 0.30369
## 8 Metacarpal Female 1.354e-01 -0.21897 0.03025
## 21 Metacarpal Male 2.715e-01 -0.05845 0.20522
## 1 SVL_cm Female 4.324e-01 -0.13187 0.05706
## 11 ToeCount Female 4.970e-01 -0.65824 0.32257
## 5 LongestToe Female 4.980e-01 -0.13888 0.28289
## 19 Humerus Male 5.556e-01 -0.25189 0.46536
## 25 MeanFingerArea Male 6.349e-01 -0.18544 0.11373
## 23 FingerCount Male 6.674e-01 -0.44854 0.28866
## 3 Tibia Female 6.923e-01 -0.33560 0.22413
## 4 Metatarsal Female 7.478e-01 -0.22756 0.16415
## 26 MeanToeArea Male 8.006e-01 -0.21041 0.27190
## 24 ToeCount Male 8.869e-01 -0.35325 0.40787
## 9 LongestFinger Female 9.214e-01 -0.15176 0.13740
## 7 Radius Female 9.294e-01 -0.17156 0.18756
Let’s look at the Femur ~ SVL plot:
simple_lm <- lm(Femur ~ SVL_cm + Hurricane, data=lizards)
plot(Femur ~ SVL_cm, data=lizards, col=ifelse(Hurricane=="Before", "black", "red"), pch=20)
coefs <- coef(simple_lm)
abline(coefs["(Intercept)"], coefs["SVL_cm"])
abline(coefs["(Intercept)"] + coefs["HurricaneAfter"], coefs["SVL_cm"], col='red', pch=20)
legend("bottomright", lty=1, col=c("black", "red"), legend=c("Before", "After"))
Steps:
Goal:
Remaining female lizards of average size on Pine Cay had an average finger pad area that was X% larger (95% CI of L-U%) than before the hurricane. This difference of X square centimeters represents roughly Y standard deviations of finger pad area. Similarly, males of average length …
Pick a family: recall that the “family” describes the distribution of the residuals, so let’s check a histogram of those:
Priors: differences between groups will be no bigger than 1mm or so (for Femur length); slopes are no bigger than 10. So, we’ll set an uninformative Normal prior, with mean 0 and sd 10; it’s “uninformative” because it’s just saying that the parameters are no bigger than \(\pm\) 20 or so.
bformula <- brmsformula(Femur ~ SVL_cm * Sex + Hurricane + Origin)
get_prior(bformula, data=lizards)
## prior class coef group resp dpar nlpar bound source
## (flat) b default
## (flat) b HurricaneAfter (vectorized)
## (flat) b OriginWaterCay (vectorized)
## (flat) b SexMale (vectorized)
## (flat) b SVL_cm (vectorized)
## (flat) b SVL_cm:SexMale (vectorized)
## student_t(3, 10.7, 2.5) Intercept default
## student_t(3, 0, 2.5) sigma default
Rhat and ESS look good: it seems like we have convergence.
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: Femur ~ SVL_cm * Sex + Hurricane + Origin
## Data: lizards (Number of observations: 164)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 1.49 1.59 -1.65 4.63 1.00 1796 2265
## SVL_cm 1.88 0.36 1.17 2.61 1.00 1775 2231
## SexMale -0.19 1.82 -3.81 3.44 1.00 1636 1701
## HurricaneAfter -0.56 0.10 -0.76 -0.37 1.00 2744 2552
## OriginWaterCay -0.14 0.10 -0.33 0.06 1.00 3751 2390
## SVL_cm:SexMale 0.19 0.40 -0.60 0.99 1.00 1610 1664
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.61 0.03 0.55 0.68 1.00 3719 2291
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Fuzzy caterpillars are good, also:
This does not look like there’s a difference before/after?
Here are posterior distributions for before the hurricane: notice that males/females aren’t overlapping at intermediate lengths.
Note that after the hurricane, they found smaller male lizards; perhaps the bigger ones got blown away, so smaller ones got to come out into their territories.
In each case, the 95% credible intervals for the “before” and “after” lizards are separated, in the regions where we have data.
Here, let’s get the posterior distributions of femur length for lizards of average length in the eight combinations of hurricane, sex, and origin, as well as the posterior of the average difference (after minus before):
mean_lengths <- tapply(lizards$SVL_cm, lizards$Sex, mean)
pred_data <- expand.grid(
Sex=levels(lizards$Sex),
Origin=levels(lizards$Origin),
Hurricane=levels(lizards$Hurricane)
)
pred_data$SVL_cm <- mean_lengths[as.character(pred_data$Sex)]
posts <- posterior_epred(bfit,
newdata=pred_data
)
post_results <- pred_data[1:4, c("Sex", "Origin", "SVL_cm")]
post_results$mean_diff <- colMeans(
posts[,pred_data$Hurricane == "After"]
- posts[,pred_data$Hurricane == "Before"]
)
post_results$lower_ci_diff <- colQuantiles(
posts[,pred_data$Hurricane == "After"]
- posts[,pred_data$Hurricane == "Before"],
probs=0.025
)
post_results$upper_ci_diff <- colQuantiles(
posts[,pred_data$Hurricane == "After"]
- posts[,pred_data$Hurricane == "Before"],
probs=0.975
)
Here are the results. Notice that all the mean differences are identical; that’s because under our model the effect of Hurricane is additive, independent of anything else. So, the posterior distribution of the difference in mean femur length, after minus before, is the same for any conditions.
## Sex Origin SVL_cm mean_diff lower_ci_diff upper_ci_diff
## 1 Female Pine Cay 4.344 -0.5611 -0.7611 -0.3683
## 2 Male Pine Cay 5.434 -0.5611 -0.7611 -0.3683
## 3 Female Water Cay 4.344 -0.5611 -0.7611 -0.3683
## 4 Male Water Cay 5.434 -0.5611 -0.7611 -0.3683
So, we could just look at the posterior distribution for that parameter, e.g.
## Warning: The following arguments were unrecognized and ignored: bindiwdth
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
However, it’s a better idea in general (easier to figure out, more generalizable, less confusing) to look at posterior distributions of predicted differences themselves, rather than the underlying parameters. So, we’ll do that. Here’s the posterior distribution for the difference in mean femur length of females on Pine Cay of average length:
We’d also like to know how this difference compares to typical
variation in femur length; so, we’d like to know how much lizards that
are otherwise the same differ in femur length. This is given by the
sigma
parameter in this (Gaussian) model, which we can get
out the summary table:
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: Femur ~ SVL_cm * Sex + Hurricane + Origin
## Data: lizards (Number of observations: 164)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 1.49 1.59 -1.65 4.63 1.00 1796 2265
## SVL_cm 1.88 0.36 1.17 2.61 1.00 1775 2231
## SexMale -0.19 1.82 -3.81 3.44 1.00 1636 1701
## HurricaneAfter -0.56 0.10 -0.76 -0.37 1.00 2744 2552
## OriginWaterCay -0.14 0.10 -0.33 0.06 1.00 3751 2390
## SVL_cm:SexMale 0.19 0.40 -0.60 0.99 1.00 1610 1664
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.61 0.03 0.55 0.68 1.00 3719 2291
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Remaining lizards had an average femur length that was 0.56mm shorter (95% CI of 0.76-0.37mm) than before the hurricane. This is about a 5% reduction in femur length. The estimated standard deviation of femur length is 0.61mm among lizards in the same condition (95% CI of 0.55 to 0.68mm), so this change represents slightly less than one standard deviation.
The posterior predictive checks look good - no obvious issues with model fit.
## Using 10 posterior draws for ppc type 'scatter' by default.
## Using all posterior draws for ppc type 'loo_intervals' by default.
## Using 10 posterior draws for ppc type 'hist' by default.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.