\[ %% % Add your macros here; they'll be included in pdf and html output. %% \newcommand{\R}{\mathbb{R}} % reals \newcommand{\E}{\mathbb{E}} % expectation \renewcommand{\P}{\mathbb{P}} % probability \DeclareMathOperator{\logit}{logit} \DeclareMathOperator{\logistic}{logistic} \DeclareMathOperator{\sd}{sd} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\cor}{cor} \DeclareMathOperator{\Normal}{Normal} \DeclareMathOperator{\LogNormal}{logNormal} \DeclareMathOperator{\Poisson}{Poisson} \DeclareMathOperator{\Beta}{Beta} \DeclareMathOperator{\Binom}{Binomial} \DeclareMathOperator{\Gam}{Gamma} \DeclareMathOperator{\Exp}{Exponential} \DeclareMathOperator{\Cauchy}{Cauchy} \DeclareMathOperator{\Unif}{Unif} \DeclareMathOperator{\Dirichlet}{Dirichlet} \DeclareMathOperator{\Wishart}{Wishart} \DeclareMathOperator{\StudentsT}{StudentsT} \DeclareMathOperator{\Weibull}{Weibull} \newcommand{\given}{\;\vert\;} \]

Homework number 14: Drosophila viruses

Assignment: Your task is to use Rmarkdown to write a short report, readable by a technically literate person. The code you used should not be visible in the final report (unless you have a good reason to show it).

Due: Submit your work via Canvas by the end of the day (midnight) on Friday, February 11th. Please submit both the Rmd file and the resulting html or pdf file. You can work with other members of class, but I expect each of you to construct and run all of the scripts yourself.

The Problem

In this project you will analyze data from the paper Host-pathogen coevolution increases genetic variation in susceptibility to infection (Duxbury et al 2019, eLife), which studies whether species (of Drosophila fruit fly) have more genetic variation for resistance to viruses that naturally infect that species than to viruses from other species. You will analyze their data from Drosophila melanogaster: the experimenters kept 230 inbred lines (called “families” in the data) of fruit fly, and measured viral loads in individual flies after infecting them with one of three types of sigma virus: one found in Drosophila melanogaster (“M” in the data), one found in Drosophila obscura (“O” in the data), and one found in Drosophila affinis (“A” in the data).

The data are found in Dmel_genotype_data.csv, and columns in the data file are:

You should describe the data, and then fit a multivariate mixed-effects model using the following brms code:

brm(
    bf(mvbind(A, M, O) | mi() ~ chkov + ref2p + (1|n|family) + (1|day))
    + set_rescor(FALSE),
    data=geno,
    chains=3
)

(Notice that there’s a fair amount of missing data, as not every virus was measured on every day. The | mi() term asks brms to impute these missing values, effectively treating the missing values as parameters to be estimated.)

Interpret the results to answer the following questions (including appropriate measures of uncertainty):

  1. How much do resistance loci at the CHKov1 and Ref(2)p loci affect viral loads for the three viruses? The experimenter’s hypothesis is that they would affect the coevolved “M” virus more than the others. (Note: here conditional_effects() will lead you astray; instead, use the posterior distribution of the corresponding parameter in the model.)

  2. How much do different families differ in their average viral load after infection with each of the three viruses, after accounting for the resistance loci, and is this family-specific response correlated between the three viruses? (This measures remaining genetic variation for resistance, and asks whether the genetic variation for resistance to one virus is effective against the others.)

  3. For future experiments, what would we predict would be the range of viral loads for flies in the A13_17 family in response infection with the “O” virus? (Hint: use posterior_predict( ); this is possible because of the missing data imputation with |mi().)