\[ %% % Add your macros here; they'll be included in pdf and html output. %% \newcommand{\R}{\mathbb{R}} % reals \newcommand{\E}{\mathbb{E}} % expectation \renewcommand{\P}{\mathbb{P}} % probability \DeclareMathOperator{\logit}{logit} \DeclareMathOperator{\logistic}{logistic} \DeclareMathOperator{\SE}{SE} \DeclareMathOperator{\sd}{sd} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\cor}{cor} \DeclareMathOperator{\Normal}{Normal} \DeclareMathOperator{\MVN}{MVN} \DeclareMathOperator{\LogNormal}{logNormal} \DeclareMathOperator{\Poisson}{Poisson} \DeclareMathOperator{\Beta}{Beta} \DeclareMathOperator{\Binom}{Binomial} \DeclareMathOperator{\Gam}{Gamma} \DeclareMathOperator{\Exp}{Exponential} \DeclareMathOperator{\Cauchy}{Cauchy} \DeclareMathOperator{\Unif}{Unif} \DeclareMathOperator{\Dirichlet}{Dirichlet} \DeclareMathOperator{\Wishart}{Wishart} \DeclareMathOperator{\StudentsT}{StudentsT} \DeclareMathOperator{\Weibull}{Weibull} \newcommand{\given}{\;\vert\;} \]

Data analysis example: Hurricane lizards

Peter Ralph

Advanced Biological Statistics

Hurricane lizards

Data from: Donihue, C.M., Herrel, A., Fabre, AC. et al. Hurricane-induced selection on the morphology of an island lizard. Nature 560, 88–91 (2018).

Anolis scriptus morphology and performance from before and after hurricanes

Lizard morphology data collected from two visits to Pine Cay and Water Cay in
the Turks and Caicos Islands. Anolis scriptus lizards were surveyed on both
islands four days before hurricane Irma and again, six weeks later, after the
islands had been hit by Hurricanes Irma and Maria. We measured morphology and
performance and found significant differences in the "before" and "after"
hurricane lizard population morphology. All linear measurements are in MM,
surface area measurements are in MM2 and force in N. Counts correspond to the
number of lamellar scales on the forelimb and hind limb toepads.

Figure S1

Here’s the data: ../Datasets/Hurricane_lizards/. First we’ll read it in, reorder the levels of the Hurricane factor (so that “before” comes before “after”), change units on SVL to centimeters so it’s on the same scale as the other variables (useful below), and drop unused variables.

lizards <- read.csv(
            "../Datasets/Hurricane_lizards/lizards.csv",
            header=TRUE, stringsAsFactors=TRUE)
lizards$Hurricane <- factor(lizards$Hurricane, levels=c("Before", "After"))
lizards$SVL_cm <- lizards$SVL / 10
varnames <- c("SVL_cm", "Femur", "Tibia", "Metatarsal", "LongestToe", "Humerus", 
    "Radius", "Metacarpal", "LongestFinger", "FingerCount", "ToeCount", 
    "MeanFingerArea", "MeanToeArea")
lizards <- lizards[, c(c("Hurricane", "Origin", "Sex"), varnames)]
head(lizards)
##   Hurricane   Origin    Sex SVL_cm Femur Tibia Metatarsal LongestToe Humerus Radius Metacarpal LongestFinger FingerCount ToeCount MeanFingerArea MeanToeArea
## 1     After Pine Cay   Male  4.869 10.39 11.87       7.52       7.43    8.66   7.99       2.22          3.19          10       12         1.3327       2.433
## 2     After Pine Cay Female  4.031  8.66  9.79       6.18       6.20    8.01   6.51       2.38          3.55          10       13         0.9613       1.518
## 3     After Pine Cay   Male  5.830 12.87 14.76       9.45       9.58   11.72   9.54       3.54          5.09          14       15         2.6313       4.098
## 4     After Pine Cay Female  4.315  8.55 10.29       6.60       6.26    7.43   6.60       2.79          3.55          11       12         1.1777       1.879
## 5     After Pine Cay Female  4.551 10.26 11.02       6.89       7.02    7.71   7.25       2.52          3.37          11       13         1.3843       2.530
## 6     After Pine Cay Female  4.697 10.02 10.78       6.85       7.18    8.45   7.15       2.39          3.26          12       14         1.4260       2.036

Here’s the sample sizes:

with(lizards, table(Sex, Origin, Hurricane))
## , , Hurricane = Before
## 
##         Origin
## Sex      Pine Cay Water Cay
##   Female       15        18
##   Male         18        20
## 
## , , Hurricane = After
## 
##         Origin
## Sex      Pine Cay Water Cay
##   Female       17        22
##   Male         29        25

Brainstorming

  • What morphological differences do we expect betweeen the “before” and “after” groups of lizards?
  • What visualizations can we use to look for these differences? (Sketch them.)
  • What statistical analysis can we use?
##   Hurricane   Origin    Sex SVL_cm Femur Tibia Metatarsal LongestToe Humerus Radius Metacarpal LongestFinger FingerCount ToeCount MeanFingerArea MeanToeArea
## 1     After Pine Cay   Male  4.869 10.39 11.87       7.52       7.43    8.66   7.99       2.22          3.19          10       12         1.3327       2.433
## 2     After Pine Cay Female  4.031  8.66  9.79       6.18       6.20    8.01   6.51       2.38          3.55          10       13         0.9613       1.518
## 3     After Pine Cay   Male  5.830 12.87 14.76       9.45       9.58   11.72   9.54       3.54          5.09          14       15         2.6313       4.098
## 4     After Pine Cay Female  4.315  8.55 10.29       6.60       6.26    7.43   6.60       2.79          3.55          11       12         1.1777       1.879
## 5     After Pine Cay Female  4.551 10.26 11.02       6.89       7.02    7.71   7.25       2.52          3.37          11       13         1.3843       2.530
## 6     After Pine Cay Female  4.697 10.02 10.78       6.85       7.18    8.45   7.15       2.39          3.26          12       14         1.4260       2.036

Steps in data analysis

  1. Care, or at least think, about the data.

  2. Look at the data.

  3. Query the data.

  4. Check the results.

  5. Communicate.

Ideas

Which variables do we expect? Anything that lowers the cetner of gravity might be associated. Also, big feet might be important.

  1. Look at the distributions, with the pre- and post-hurricane histograms overlaid. Or, boxplots.

  2. Look at measurements as a ratio of body length, since bigger bodies = bigger sails. (eg make histograms of this)

  3. With boxplots, we might do t-tests to compare before/after.

Goals

  • describe how traits differ before/after hurricanes
  • account for confounding factor: overall size
  • also possibly account for differences between islands and sexes

Exercise: Write a possible conclusion sentence or two (imagining what might happen). Make sure to communicate (a) the size of the effect we’re looking for, in real-world terms; (b) the strength of the statistical evidence; and (c) context (e.g., how’s the size of the hurricane-induced change compare to other differences).

Examples

We found that the typical ratio of hand size to body size was 10% larger after than before (t-test, p-value 0.01). This is roughly 2 standard deviations of the ratios observed within each sex; ratios do not seem to differ between islands.

The body radius decreased by 20% after the hurricane compared to before.

Metatarsal length increased by 5% after the hurricane (possibly due to incrased grip).

We see that lizards with the longest fingers and toes (5% longer on average) survived better on average (t-test of p=0.01); they seem to have been able to hold on better during the hurricane.

Look at the data

Histograms

lizards_long <- pivot_longer(lizards, col=all_of(varnames), names_to="measurement")
ggplot(lizards_long) + geom_histogram(aes(x=value)) + facet_wrap(~measurement)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

plot of chunk r hists

There is not a huge, obvious shift.

ggplot(lizards_long) + geom_boxplot(aes(x=measurement, y=value, col=Hurricane)) +  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) 

plot of chunk r next

ggplot(lizards_long) + geom_boxplot(aes(x=measurement, y=value, fill=Hurricane)) +  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + facet_wrap(~Sex) 

plot of chunk r next2

Let’s do some t-tests to see what happens:

results <- expand.grid(
    variable=varnames,
    sex=levels(lizards$Sex),
    stringsAsFactors=FALSE
)
results$p <- NA
results$lower_CI <- NA
results$upper_CI <- NA

for (j in 1:nrow(results)) {
    v <- results$variable[j]
    s <- results$sex[j]
    x <- subset(lizards, Sex==s)
     t <- t.test(x[[v]][x$Hurricane == "Before"],
            x[[v]][x$Hurricane == "After"]
     )
     results$p[j] <- t$p.value
     results$lower_CI[j] <- t$conf.int[1]
     results$upper_CI[j] <- t$conf.int[2]
}

This is preliminary, but it seems that maybe males need shorter (?) legs, and females need larger (?) fingers.

results[order(results$p),]
##          variable    sex         p lower_CI upper_CI
## 15          Femur   Male 3.030e-07  0.70632  1.49627
## 18     LongestToe   Male 5.133e-06  0.41631  0.98772
## 12 MeanFingerArea Female 1.204e-05 -0.25470 -0.10373
## 16          Tibia   Male 6.230e-05  0.36557  1.02030
## 13    MeanToeArea Female 4.725e-04 -0.36000 -0.10662
## 10    FingerCount Female 1.172e-03 -1.04091 -0.26911
## 17     Metatarsal   Male 1.410e-03  0.14257  0.57219
## 14         SVL_cm   Male 1.917e-03  0.09161  0.39137
## 2           Femur Female 1.054e-02  0.10359  0.75492
## 20         Radius   Male 1.402e-02  0.06389  0.55246
## 6         Humerus Female 3.436e-02 -0.47849 -0.01885
## 22  LongestFinger   Male 1.227e-01 -0.03672  0.30369
## 8      Metacarpal Female 1.354e-01 -0.21897  0.03025
## 21     Metacarpal   Male 2.715e-01 -0.05845  0.20522
## 1          SVL_cm Female 4.324e-01 -0.13187  0.05706
## 11       ToeCount Female 4.970e-01 -0.65824  0.32257
## 5      LongestToe Female 4.980e-01 -0.13888  0.28289
## 19        Humerus   Male 5.556e-01 -0.25189  0.46536
## 25 MeanFingerArea   Male 6.349e-01 -0.18544  0.11373
## 23    FingerCount   Male 6.674e-01 -0.44854  0.28866
## 3           Tibia Female 6.923e-01 -0.33560  0.22413
## 4      Metatarsal Female 7.478e-01 -0.22756  0.16415
## 26    MeanToeArea   Male 8.006e-01 -0.21041  0.27190
## 24       ToeCount   Male 8.869e-01 -0.35325  0.40787
## 9   LongestFinger Female 9.214e-01 -0.15176  0.13740
## 7          Radius Female 9.294e-01 -0.17156  0.18756

Everything is correlated with everything else:

plot of chunk r pairs

sex and size are almost totally confounded

plot of chunk r pairs_sex

plot of chunk r pairs_origin

Let’s look at the Femur ~ SVL plot:

simple_lm <- lm(Femur ~ SVL_cm + Hurricane, data=lizards)
plot(Femur ~ SVL_cm, data=lizards, col=ifelse(Hurricane=="Before", "black", "red"), pch=20)
coefs <- coef(simple_lm)
abline(coefs["(Intercept)"], coefs["SVL_cm"])
abline(coefs["(Intercept)"] + coefs["HurricaneAfter"], coefs["SVL_cm"], col='red', pch=20)
legend("bottomright", lty=1, col=c("black", "red"), legend=c("Before", "After"))

plot of chunk r plitit

Fit some models

Today’s plan

  • Fit a model to only Femur and MeanFingerArea,
  • check model fit, and
  • summarize results.
Femur, MeanFingerArea ~ SVL_cm * (Hurricane + Origin)

Later: add I(SVL_cm^2), consider other variables.

Goal:

Remaining female lizards of average size on Pine Cay had an average finger pad area that was X% larger (95% CI of L-U%) than before the hurricane. This difference of X square centimeters represents roughly Y standard deviations of finger pad area. Similarly, males of average length …

(same for femur length)

IN CLASS

What family should we use?

layout(1:2)
hist( resid(lm(Femur ~ SVL_cm, data=lizards)), breaks=20 )
hist( resid(lm(MeanFingerArea ~ SVL_cm, data=lizards)), breaks=20 )

plot of chunk r resids Note: there are some outliers, maybe? We can check that later, and if so, change the family to “student-t”, maybe.

Check scale for priors

The slope of these lines is about 4 for Femur and less than that for finger area, and the difference between them is about 0.5. plot of chunk r plitit2

Fit a model

lfit <- 
    brm(
        brmsformula(
     mvbind(Femur, MeanFingerArea) 
       ~ SVL_cm * (Hurricane + Origin) + I(SVL_cm^2)
    ) + set_rescor(TRUE),
    data=lizards,
    family=gaussian,
    prior=set_prior("normal(0, 10)", class="b"),
    file="femur_finger_fit.rds"
)

Check for convergence

summary(lfit)
##  Family: MV(gaussian, gaussian) 
##   Links: mu = identity; sigma = identity
##          mu = identity; sigma = identity 
## Formula: Femur ~ SVL_cm * (Hurricane + Origin) + I(SVL_cm^2) 
##          MeanFingerArea ~ SVL_cm * (Hurricane + Origin) + I(SVL_cm^2) 
##    Data: lizards (Number of observations: 163) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##                                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Femur_Intercept                         -2.30      4.32   -10.85     6.07 1.00     1945     2475
## MeanFingerArea_Intercept                 0.12      1.18    -2.22     2.41 1.00     1863     2330
## Femur_SVL_cm                             3.24      1.71    -0.07     6.61 1.00     2009     2665
## Femur_HurricaneAfter                    -1.15      0.78    -2.73     0.38 1.00     2306     2283
## Femur_OriginWaterCay                    -2.22      0.84    -3.87    -0.58 1.00     2088     2375
## Femur_ISVL_cmE2                         -0.10      0.17    -0.43     0.23 1.00     2093     2638
## Femur_SVL_cm:HurricaneAfter              0.13      0.16    -0.18     0.44 1.00     2302     2252
## Femur_SVL_cm:OriginWaterCay              0.43      0.17     0.10     0.76 1.00     2014     2447
## MeanFingerArea_SVL_cm                   -0.21      0.47    -1.11     0.71 1.00     1929     2435
## MeanFingerArea_HurricaneAfter           -0.30      0.22    -0.73     0.13 1.00     2469     2569
## MeanFingerArea_OriginWaterCay           -0.42      0.23    -0.87     0.02 1.00     2250     2576
## MeanFingerArea_ISVL_cmE2                 0.09      0.05     0.00     0.18 1.00     2002     2436
## MeanFingerArea_SVL_cm:HurricaneAfter     0.10      0.04     0.01     0.19 1.00     2459     2439
## MeanFingerArea_SVL_cm:OriginWaterCay     0.10      0.05     0.01     0.19 1.00     2238     2497
## 
## Family Specific Parameters: 
##                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma_Femur              0.62      0.03     0.55     0.69 1.00     5309     3583
## sigma_MeanFingerArea     0.17      0.01     0.15     0.19 1.00     4649     3225
## 
## Residual Correlations: 
##                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## rescor(Femur,MeanFingerArea)     0.22      0.08     0.07     0.37 1.00     4769     2834
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

fuzzy caterpillars good!

mcmc_trace(lfit)

plot of chunk r trace

Hm - negative posterior correlation between slope and intercept

mcmc_pairs(lfit, pars=c("b_Femur_Intercept", "b_Femur_SVL_cm", "b_Femur_HurricaneAfter", "b_Femur_SVL_cm:HurricaneAfter"))

plot of chunk r pairs2

Look at results

# conditional_effects(lfit)
plot(
  conditional_effects(lfit,
     effects="SVL_cm:Hurricane",
     resp="MeanFingerArea",
     conditions=data.frame(
         Origin=levels(lizards$Origin),
         cond__=levels(lizards$Origin)
     ),
     plot=FALSE),
  points=TRUE
)

plot of chunk r ceffs

# conditional_effects(lfit)
plot(
  conditional_effects(lfit,
     effects="SVL_cm:Hurricane",
     resp="Femur",
     conditions=data.frame(
         Origin=levels(lizards$Origin),
         cond__=levels(lizards$Origin)
     ),
     plot=FALSE),
  points=TRUE
)

plot of chunk r ceffs2

We’d like to know the propoportional increase in finger pad area (to say “average finger pad area was X% larger after the hurricane”), and to get a credible interval for this (i.e., for the “X% larger” part). To do this, we’ll use posterior_epred(), since this gets posterior distributions of expected values (given the predictors).

mean_female_length <- with(subset(lizards, Sex=="Female"),
                           mean(SVL_cm))
newdata <- data.frame(
    Origin="Pine Cay",
    SVL_cm=mean_female_length,
    Hurricane=levels(lizards$Hurricane)
)

pp <- posterior_epred(lfit, newdata=newdata, resp="MeanFingerArea")
# posterior distribution of the relative change
# (ie the change divided by the mean beforehand)
pp_ratio <- (pp[,2] - pp[,1])/pp[,1]

We also want to know how the mean increase in finger pad area compares to between-individual variation, i.e., the standard deviation of finger pad area for lizards of the same length and in the same situation. We get this using posterior_predict( ), since this obtains the posterior distribution of the response, i.e., of actual toe pad areas (not their mean values).

pp_values <- posterior_predict(lfit, newdata=newdata, resp="MeanFingerArea")
sd_response <- sd(pp_values[,1])

Remaining female lizards of average size on Pine Cay had an average finger pad area that was 15% larger (95% CI of 7 - 23%) than before the hurricane. This difference of 0.1414 square centimeters represents roughly three-quarters of a standard deviation of finger pad area. Similarly, males of average length …

Model fit

No outliers in the residuals: we’re probably OK with the Normal family.

pp_check(lfit, resp="MeanFingerArea", type='scatter')
## Using 10 posterior draws for ppc type 'scatter' by default.

plot of chunk r pp_check

pp_check(lfit, resp="MeanFingerArea", type='error_hist')
## Using 10 posterior draws for ppc type 'error_hist' by default.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

plot of chunk r pp_check

// reveal.js plugins