Assignment: Your task is to use Rmarkdown to write a short report, readable by a technically literate person. The code you used should not be visible in the final report (unless you have a good reason to show it).
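One way to keep code out of the knitted report is to switch off code echoing for all chunks at once. The sketch below assumes the knitr package that R Markdown uses to run chunks; the particular set of options is only one reasonable default, not a requirement of the assignment.

```r
# Place this inside a chunk declared with include=FALSE near the top of the Rmd file.
# It hides code, messages, and warnings for every later chunk by default.
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
```

An individual chunk can still set echo=TRUE when there is a good reason to show its code.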
Due: Submit your work via Canvas by the end of the day (midnight) on Friday, November 12th. Please submit both the Rmd file and the resulting html file. You can work with other members of class, but I expect each of you to construct and run all of the scripts yourself.
For this assignment, you’ll be using a dataset from Hannah Tavaliere’s MSc research. In this project Hannah sampled 21 lakes across New England to evaluate the growth characteristics of three genetic lineages of an invasive aquatic plant (variable-leaf watermilfoil). The main objective of this study was to investigate whether the hybrid lineage (HYB) grows more aggressively than the parental lineages (CON and ACP). For each lake, you have plant growth and environmental data obtained from 2-3 100 m transects randomly placed in each lake. For each transect you have data on three metrics of growth: average dry mass per plant (Dry_Mass_per_Plant_g), average total branch count (Total_Branches_per_Plant), and density of individual plants per square meter (Density_per_m2). These three measures capture different aspects of aggressive growth by describing individual plant mass, spread in the water column, and density of plant beds. You can find the data here.
Ignore for now that some of our dependent variables are counts… we will talk about what to do with those later in the term.
Your report should contain (and describe) the results of the following. Make sure to report conclusions in real terms, referring to the biological quantities of interest.
Create a multi-panel plot for each growth metric by lineage and describe what you see.
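A minimal sketch of one possible multi-panel layout, using base R graphics with one boxplot panel per growth metric. The data frame here is simulated placeholder data, not the real dataset, and the column name Lineage is an assumption (the handout names the growth metrics but not the lineage column).

```r
set.seed(1)
n <- 63

# Simulated placeholder data (NOT the real measurements).
# The column name "Lineage" is an assumption.
milfoil <- data.frame(
  Lineage = factor(rep(c("ACP", "CON", "HYB"), times = 21)),
  Dry_Mass_per_Plant_g = rlnorm(n, meanlog = 0, sdlog = 0.5),
  Total_Branches_per_Plant = rpois(n, lambda = 8),
  Density_per_m2 = rpois(n, lambda = 40)
)

metrics <- c("Dry_Mass_per_Plant_g", "Total_Branches_per_Plant", "Density_per_m2")

# One row, three panels: one boxplot of each metric by lineage
old_par <- par(mfrow = c(1, 3))
for (m in metrics) {
  boxplot(reformulate("Lineage", response = m), data = milfoil,
          xlab = "Lineage", ylab = m, main = m)
}
par(old_par)  # restore the previous graphics settings
```

Boxplots are only one choice; with 2-3 transects per lake, plotting the individual points may show the spread more honestly.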
Now focus only on total branches per plant. Fit linear models to describe whether total branches per plant differs between lineages (and by how much) and determine whether any of the environmental parameters (Dissolved_O2, Temperature, Conductivity, Alkalinity, or pH) are important covariates. Instead of running a lot of anova model comparisons, try selecting your model based on Akaike’s Information Criterion (AIC). AIC compares models by their log-likelihoods while penalizing each model for the number of parameters it has (so that more complex models are slightly disfavored); the model with the lowest AIC is preferred. You can do this by hand, or use the function stepAIC in the library MASS. Use help to guide you. (Note: do not include the whole stepAIC output in your homework; it will be long.) Provide output from your final models based on AIC. Describe how your analysis supports or refutes the prediction that the ‘HYB’ lineage of variable-leaf watermilfoil grows more aggressively than the parental lineages.
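The AIC-based selection described above can be sketched as follows. For a model with k estimated parameters and maximized likelihood L, AIC = 2k - 2 log(L). The data are simulated placeholders rather than the real measurements, the column name Lineage is an assumption, and forcing Lineage to stay in every candidate model (through the lower scope) is a design choice made here because lineage is the effect of interest, not something the assignment requires.

```r
library(MASS)  # provides stepAIC()

set.seed(2)
n <- 63

# Simulated placeholder data (NOT the real measurements); "Lineage" is an assumed name.
milfoil <- data.frame(
  Lineage = factor(rep(c("ACP", "CON", "HYB"), each = 21)),
  Dissolved_O2 = rnorm(n, 8, 1),
  Temperature = rnorm(n, 22, 2),
  Conductivity = rnorm(n, 60, 15),
  Alkalinity = rnorm(n, 10, 3),
  pH = rnorm(n, 7, 0.4)
)
# Simulated response: HYB gets extra branches, temperature has a small effect
milfoil$Total_Branches_per_Plant <-
  5 + 3 * (milfoil$Lineage == "HYB") + 0.3 * milfoil$Temperature + rnorm(n)

# Start from the model containing lineage and all environmental covariates
full_model <- lm(Total_Branches_per_Plant ~ Lineage + Dissolved_O2 + Temperature +
                   Conductivity + Alkalinity + pH, data = milfoil)

# Backward elimination by AIC; trace = FALSE suppresses the long step-by-step output
best_model <- stepAIC(full_model, scope = list(lower = ~ Lineage),
                      direction = "backward", trace = FALSE)

summary(best_model)  # coefficients give the lineage differences in branches per plant

# The "by hand" route: fit candidate models and compare their AIC values directly
lineage_only <- lm(Total_Branches_per_Plant ~ Lineage, data = milfoil)
AIC(lineage_only, best_model, full_model)
```

In the summary output, the LineageCON and LineageHYB coefficients are differences from the reference level (ACP, the first level alphabetically), in units of branches per plant.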
You have not so far accounted for variation between lakes. Add a random effect for Lake in the final model you selected for each growth metric above. Did your results change?
Test whether the random effect improved the fit of the model using an ANOVA comparison. What can you conclude about this random effect as a source of variation, based on these results? Which model is a better fit: the linear model or the mixed-effects model?
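A minimal sketch of adding a random intercept for Lake and comparing the fit with and without it. It uses the nlme package, which is an assumption (the assignment does not name a package; lme4 is a common alternative). The data are simulated placeholders, and the column name Lineage and the one-lineage-per-lake layout are also assumptions. The fixed-effects-only model is fit with gls() rather than lm() so that both fits use the same estimation method and can be passed to anova() together.

```r
library(nlme)  # provides gls() and lme()

set.seed(3)

# Simulated placeholder data (NOT the real measurements):
# 21 lakes, 3 transects each, one lineage per lake (an assumption).
lake_id <- rep(paste0("Lake", 1:21), each = 3)
lineage <- rep(rep(c("ACP", "CON", "HYB"), each = 7), each = 3)
lake_effect <- rep(rnorm(21, sd = 2), each = 3)  # shared shift within each lake
milfoil <- data.frame(
  Lake = factor(lake_id),
  Lineage = factor(lineage),
  Total_Branches_per_Plant = 8 + 3 * (lineage == "HYB") + lake_effect + rnorm(63)
)

# Same fixed effects in both models; both use the default REML estimation
fixed_only <- gls(Total_Branches_per_Plant ~ Lineage, data = milfoil)
with_lake <- lme(Total_Branches_per_Plant ~ Lineage, random = ~ 1 | Lake,
                 data = milfoil)

# Did the lineage estimates change once lakes are accounted for?
cbind(fixed_only = coef(fixed_only), with_lake = fixef(with_lake))

# Likelihood ratio comparison of the two fits (also reports AIC and BIC)
comparison <- anova(fixed_only, with_lake)
comparison

# Between-lake and residual (within-lake) variance components
VarCorr(with_lake)
```

Because the null hypothesis puts the lake variance at the edge of its allowed range (zero), the p-value from this comparison tends to be conservative, so a borderline result deserves a cautious reading.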
(optional): Run the same analyses on the other two measures of plant growth: dry mass per plant and density per square meter. Incorporate these results into your conclusions.