Advanced Biological Statistics Week 7

11/06/2018 and 11/08/2018

Goals for this week

Experimental Design
Power analyses
Multi-factor ANOVA
Nested ANOVA
Factorial ANOVA
Analysis of CoVariance (ANCOVA)

Design principles for planning a good experiment

What is an experimental study?

In an experimental study the researcher assigns treatments to units
In an observational study nature does the assigning of treatments to units
The crucial advantage of experiments derives from the random assignment of treatments to units
Random assignment, or randomization, minimizes the influence of confounding variables

Mount Everest example

Survival of climbers of Mount Everest is higher for individuals taking supplemental oxygen than those who don’t.

Why?

Mount Everest example

One possibility is that supplemental oxygen (explanatory variable) really does cause higher survival (response variable).
The other is that the two variables are associated because other variables affect both supplemental oxygen and survival.
Use of supplemental oxygen might be a benign indicator of a greater overall preparedness of the climbers that use it.
Variables (like preparedness) that distort the causal relationship between the measured variables of interest (oxygen use and survival) are called confounding variables
They are correlated with the variable of interest and therefore prevent a decision about cause and effect.
With random assignment, no confounding variables will be associated with treatment except by chance.

Clinical Trials

The gold standard of experimental designs is the clinical trial
Experimental design in all areas of biology has been informed by procedures used in clinical trials
A clinical trial is an experimental study in which two or more treatments are assigned to human subjects
The design of clinical trials has been refined because the cost of making a mistake with human subjects is so high
Experiments on nonhuman subjects are simply called “laboratory experiments” or “field experiments”

Example of a clinical trial

Transmission of the HIV-1 virus via sex workers contributes to the rapid spread of AIDS in Africa
The spermicide nonoxynol-9 had shown in vitro activity against HIV-1, which motivated a clinical trial by van Damme et al. (2002).
They tested whether a vaginal gel containing the chemical would reduce the risk of acquiring the disease by female sex workers.
Data were gathered on a volunteer sample of 765 HIV-free sex-workers in six clinics in Asia and Africa.
Two gel treatments were assigned randomly to women at each clinic.
One gel contained nonoxynol-9 and the other contained a placebo.
Neither the subjects nor the researchers making observations at the clinics knew who received the treatment and who got the placebo.

Design components of a clinical trial

The goal of experimental design is to eliminate bias and to reduce sampling error when estimating and testing effects of one variable on another.

To reduce bias, the experiment included:
- Simultaneous control group: study included both the treatment of interest and a control group (the women receiving the placebo).
- Randomization: treatments were randomly assigned to women at each clinic.
- Blinding: neither the subjects nor the clinicians knew which women were assigned which treatment.
To reduce the effects of sampling error, the experiment included:
- Replication: study was carried out on multiple independent subjects.
- Balance: number of women was nearly equal in the two groups at every clinic.
- Blocking: subjects were grouped according to the clinic they attended, yielding multiple repetitions of the same experiment in different settings (“blocks”).

Simultaneous control group

In clinical trials either a placebo or the currently accepted treatment should be provided.
In experiments requiring intrusive methods to administer treatment, such as
- injections
- surgery
- restraint
- confinement
the control subjects should be perturbed in the same way as the other subjects, except for the treatment itself, as far as ethical considerations permit.
The “sham operation”, in which surgery is carried out without the experimental treatment itself, is an example.
In field experiments, applying a treatment of interest may physically disturb the plots receiving it and the surrounding areas, perhaps by trampling the ground by the researchers.
Ideally, the same disturbance should be applied to the control plots.

Randomization

The researcher should randomize assignment of treatments to units or subjects
Chance rather than conscious or unconscious decision determines which units end up receiving the treatment and which the control
A completely randomized design is one in which treatments are assigned to all units by randomization
Randomization breaks the association between possible confounding variables and the explanatory variable
Randomization doesn’t eliminate the variation contributed by confounding variables, only their correlation with treatment
Randomization ensures that variation from confounding variables is similar between the different treatment groups.

Randomization

Randomization should be carried out using a random process:
- List all n subjects, one per row, in a computer spreadsheet.
- Use the computer to give each individual a random number.
- Assign treatment A to those subjects receiving the lowest numbers and treatment B to those with the highest numbers.
Other ways of assigning treatments to subjects are almost always inferior because they do not break the association between confounding variables and treatment.
“Haphazard” assignment, in which the researcher chooses a treatment while trying to make it random, has repeatedly been shown to be non-random and prone to bias.
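
The three randomization steps listed above can be sketched in R. This is a minimal illustration, assuming a hypothetical study with 20 subjects and two treatments; the data frame `subjects` and its columns are invented names.

```r
# Hypothetical example: 20 subjects, two treatments (A and B).
set.seed(42)                      # arbitrary seed so the assignment can be reproduced
n <- 20

# Step 1: list all n subjects, one per row.
subjects <- data.frame(id = seq_len(n))

# Step 2: give each subject a random number.
subjects$rand <- runif(n)

# Step 3: the n/2 lowest random numbers get treatment A, the rest get B.
subjects$treatment <- ifelse(rank(subjects$rand) <= n / 2, "A", "B")

table(subjects$treatment)         # 10 subjects in each treatment
```

A one-line alternative with the same effect is `sample(rep(c("A", "B"), each = n / 2))`, which shuffles a balanced vector of treatment labels.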

Blinding

Blinding is the process of concealing information from participants (sometimes including researchers) about which subjects receive which treatment.
Blinding prevents subjects and researchers from changing their behavior, consciously or unconsciously, as a result of knowing which treatment they were receiving or administering.
For example, studies showing that acupuncture has a significant effect on back pain are limited to those without blinding (Ernst and White 1998).

Blinding

In a single-blind experiment, the subjects are unaware of the treatment that they have been assigned.
Treatments must be indistinguishable to subjects, which prevents them from responding differently according to knowledge of treatment.
Blinding can also be a concern in non-human studies where animals respond to stimuli
In a double-blind experiment the researchers administering the treatments and measuring the response are also unaware of which subjects are receiving which treatments
- Researchers sometimes have pet hypotheses, and they might treat experimental subjects in different ways depending on their hopes for the outcome
- Many response variables are difficult to measure and require some subjective interpretation, which makes the results prone to a bias
- Researchers are naturally more interested in the treated subjects than the control subjects, and this increased attention can itself result in improved response

Blinding

Reviews of medical studies have revealed that studies carried out without double-blinding exaggerated treatment effects by 16% on average compared with studies carried out with double-blinding (Jüni et al. 2001).
Experiments on nonhuman subjects are also prone to bias from lack of blinding.
Bebarta et al. (2003) reviewed 290 two-treatment experiments carried out on animals or on cell lines. The odds of detecting a positive effect of treatment were more than threefold higher in studies without blinding than in studies with blinding.
Blinding can be incorporated into experiments on nonhuman subjects using coded tags that identify the subject to a “blind” observer without revealing the treatment (and who measures units from different treatments in random order).

Replication

The goal of experiments is to estimate and test treatment effects against the background of variation between individuals (“noise”) caused by other variables
One way to reduce noise is to make the experimental conditions constant
In field experiments, however, highly constant experimental conditions might be neither feasible nor desirable
By limiting the conditions of an experiment, we also limit the generality of the results
Another way to make treatment effects stand out is to include extreme treatments and to replicate the data.

Replication

Replication is the assignment of each treatment to multiple, independent experimental units.
Without replication, we would not know whether response differences were due to the treatments or just chance differences between the treatments caused by other factors.
Studies that use more units (i.e. that have larger sample sizes) will have smaller standard errors and a higher probability of getting the correct answer from a hypothesis test.
Larger samples mean more information, and more information means better estimates and more powerful tests.
Replication is not about the number of plants or animals used, but the number of independent units in the experiment. An “experimental unit” is the independent unit to which treatments are assigned.
The figure shows three experimental designs used to compare plant growth under two temperature treatments (indicated by the shading of the pots). The first two designs are un-replicated.

Pseudoreplication

Balance

A study design is balanced if all treatments have the same sample size.
Conversely, a design is unbalanced if there are unequal sample sizes between treatments.
Balance is a second way to reduce the influence of sampling error on estimation and hypothesis testing.
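To appreciate this, look again at the equation for the standard error of the difference between two treatment means, where \(s_p^2\) is the pooled sample variance:

\[ SE_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \]
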
For a fixed total number of experimental units, n1 + n2, the standard error is smallest when n1 and n2 are equal.
Balance has other benefits. For example, ANOVA is more robust to departures from the assumption of equal variances when designs are balanced or nearly so.
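
The effect of balance on the standard error can be checked numerically. The sketch below assumes the usual pooled form, SE = sp * sqrt(1/n1 + 1/n2), with a hypothetical pooled standard deviation of 1 and a hypothetical total of 20 units; `se_diff` is an invented helper name.

```r
# Standard error of the difference between two means for a given split,
# assuming a pooled standard deviation sp (hypothetical value of 1).
se_diff <- function(n1, n2, sp = 1) sp * sqrt(1 / n1 + 1 / n2)

total <- 20                        # fixed total number of experimental units
n1 <- 1:(total - 1)                # every possible size for group 1
se <- se_diff(n1, total - n1)

n1[which.min(se)]                  # 10: the balanced split has the smallest SE
round(se_diff(10, 10), 3)          # 0.447 for the balanced design
round(se_diff(4, 16), 3)           # 0.559 for an unbalanced design, same total
```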

Blocking

Blocking is the grouping of experimental units that have similar properties. Within each block, treatments are randomly assigned to experimental units.
Blocking essentially repeats the same, completely randomized experiment multiple times, once for each block.
Differences between treatments are only evaluated within blocks, and in this way the component of variation arising from differences between blocks is discarded.

Blocking

Paired designs

For example, consider the design choices for a two-treatment experiment to investigate the effect of clear cutting on salamander density.
In the completely randomized (“two-sample”) design we take a random sample of forest plots from the population and then randomly assign each plot to either the clear-cut treatment or the no clear-cut treatment.
In the paired design we take a random sample of forest plots and clear-cut a randomly chosen half of each plot, leaving the other half untouched.

Blocking

Paired designs

In the paired design, measurements on adjacent plot-halves are not independent. This is because they are likely to be similar in soil, water, sunlight, and other conditions that affect the number of salamanders.
As a result, we must analyze paired data differently than when every plot is independent of all the others, as in the case of the two-sample design.
A paired design is usually more powerful than a completely randomized design because it controls for much of the extraneous variation between plots or sampling units that can obscure the effects we are looking for.
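
A small sketch of why the analysis must respect the pairing. The salamander counts below are invented for illustration (they are not from any study): plots differ a lot from one another, but the clear-cut half of every plot has a few fewer salamanders than the uncut half.

```r
# Hypothetical salamander counts on 8 forest plots (values invented).
uncut    <- c(12, 25, 8, 30, 17, 22, 10, 27)
clearcut <- c( 9, 21, 6, 27, 13, 19,  8, 23)

# Two-sample analysis: ignores which halves belong to the same plot,
# so the large plot-to-plot variation stays in the error term.
two_sample <- t.test(uncut, clearcut)

# Paired analysis: works on the within-plot differences,
# so plot-to-plot variation is removed.
paired <- t.test(uncut, clearcut, paired = TRUE)

two_sample$p.value
paired$p.value
```

With these made-up numbers the paired test detects the difference while the two-sample test does not, because the plot-to-plot variation swamps the treatment effect when pairing is ignored.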

Blocking

Randomized complete block design

RCB design is analogous to the paired design, but may have more than two treatments. Each treatment is applied once to every block.
As in the paired design, treatment effects in a randomized block design are measured by differences between treatments exclusively within blocks.
By accounting for some sources of sampling variation blocking can make differences between treatments stand out.
Blocking is worthwhile if units within blocks are relatively homogeneous, apart from treatment effects, and units belonging to different blocks vary because of environmental or other differences.

What if you can’t do experiments?

Experimental studies are not always feasible, in which case we must fall back upon observational studies.
The best observational studies incorporate as many of the features of good experimental design as possible to minimize bias (e.g., blinding) and the impact of sampling error (e.g., replication, balance, blocking, and even extreme treatments) except for one: randomization.
Randomization is out of the question, because in an observational study the researcher does not assign treatments to subjects. Instead, the subjects come as they are.
Two strategies are used to limit the effects of confounding variables on a difference between treatments in a controlled observational study: matching; and adjusting for known confounding variables (covariates).

Statistical power

Recall type 1 and type 2 errors

Power

underappreciated aspect of experimental design

Type 1 error - \(\alpha\) - incorrectly rejecting a true null hypothesis
- This is saying that there is an effect when there isn’t
Type 2 error - \(\beta\) - failing to reject a false null hypothesis
- This is saying that there isn’t an effect when there is
Power is the probability of rejecting a false null hypothesis (\(1 - \beta\))
Mostly we shoot for a power of around 80%
Power can be calculated post hoc or a priori

Power

the things one needs to know

\[ Power \propto \frac{(ES)(\alpha)(\sqrt n)}{\sigma}\]

Power is proportional to the combination of these parameters
- ES - effect size; how large is the change of interest?
- alpha - significance level (usually 0.05)
- n - sample size
- sigma - standard deviation among experimental units within the same group.
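
The direction of each dependence can be checked with `power.t.test()` from base R's stats package. This is a rough sketch for a two-sample t-test; every number (n = 20 per group, a difference of 1, a standard deviation of 1.5) is a hypothetical value chosen for illustration.

```r
# Baseline scenario (all values hypothetical).
base <- power.t.test(n = 20, delta = 1, sd = 1.5, sig.level = 0.05)$power

# Change one ingredient at a time and recompute the power.
more_n    <- power.t.test(n = 40, delta = 1, sd = 1.5, sig.level = 0.05)$power
bigger_es <- power.t.test(n = 20, delta = 2, sd = 1.5, sig.level = 0.05)$power
looser_a  <- power.t.test(n = 20, delta = 1, sd = 1.5, sig.level = 0.10)$power
noisier   <- power.t.test(n = 20, delta = 1, sd = 3.0, sig.level = 0.05)$power

round(c(base = base, more_n = more_n, bigger_es = bigger_es,
        looser_alpha = looser_a, noisier = noisier), 2)
```

Power rises with a larger sample, a larger effect, or a larger alpha, and falls when the within-group standard deviation increases.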

Power

what we usually want to know

Power

rough calculation

R INTERLUDE

Perform a one-way ANOVA with power

Read in the perchlorate data again
Perform an ANOVA to test Strain on T4_Hormone_Level, but log-transform (base 10) the T4 variable

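# Read the tab-delimited perchlorate data
perc <- read.table('perchlorate_data.tsv', header=TRUE, sep='\t')
x <- perc$Strain                      # explanatory factor
y <- log10(perc$T4_Hormone_Level)     # log10-transformed response

# One-way ANOVA of log10(T4) on strain
MyANOVA <- aov(y ~ x)
summary(MyANOVA)
boxplot(y ~ x)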

R INTERLUDE

Perform a one-way ANOVA with power

Consider the parameters of this test related to power:
- The per-group sample sizes
- The standard deviation (use the higher within-group sd)
- The effect size (|difference between means| / within-grp sd)
For more complex ANOVA power calculations (>2 groups):
- The total variance
- The within-group variance (use the higher one)

R INTERLUDE

post hoc and a priori power analyses

Based on your results, calculate the power for your ANOVA.

   library(pwr)   # provides pwr.t2n.test() and pwr.anova.test()
   pwr.t2n.test(n1=xxx, n2=xxx, d=xxx, sig.level=.05, power=NULL)

Check out the functions in the ‘pwr’ library (Unnecessary in this case, but could use ANOVA version):

   pwr.anova.test(k=2, n=190, f=0.25, sig.level=.05, power=NULL)

R INTERLUDE

post hoc and a priori power analyses

effect size approximations:
- f=0.1 (small)
- f=0.25 (medium)
- f=0.4 (large)
see http://www.statmethods.net/stats/power.html

R INTERLUDE

post hoc and a priori power analyses

Let’s say you have to repeat the experiment, but your IACUC wants you to get by with fewer fish.
You want to be able to detect a minimum mean difference of 1.3 T4 units (about 0.114 on the log10 scale), at a power of 90%.
First, divide 0.114 by the standard deviation of the transformed WK values (the higher standard deviation of the two groups) to get a conservative “d”.
What sample size would you need for the WK group?
(Again use the pwr.t2n.test() function, but this time specify the WK sample size as the unknown parameter)
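
As a rough cross-check on the logic of an a priori calculation, base R's `power.t.test()` can solve for the sample size when it is left unspecified. Note the assumptions: this function assumes equal group sizes (unlike `pwr.t2n.test()`), and the standard deviation `sd_wk` below is a hypothetical placeholder, not a value from the perchlorate data.

```r
# Base-R sketch of an a priori sample size calculation.
sd_wk <- 0.25      # HYPOTHETICAL standard deviation; substitute your own estimate

# n is left out, so power.t.test() solves for the per-group sample size.
res <- power.t.test(delta = 0.114, sd = sd_wk, sig.level = 0.05, power = 0.90)

ceiling(res$n)     # fish needed per group, rounded up to a whole animal
```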

Multifactor ANOVA

Nested ANOVA or nested design
- factors might be hierarchical - in other words nested - within one another
- The sources of variance are therefore hierarchical too
The factorial ANOVA design is the most common experimental design used to investigate more than one treatment variable
- In a factorial design every combination of treatments from two (or more) treatment variables is investigated.
- The main purpose of a factorial design is to evaluate possible interactions between variables.
- An interaction between two explanatory variables means that the effect of one variable on the response depends on the state of a second variable.
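
An interaction can be read directly off the table of cell means as a "difference of differences". The sketch below uses invented cell means for a generic 2 x 2 design with hypothetical factors A and B.

```r
# Hypothetical cell means for a 2 x 2 factorial design (values invented).
cell_means <- matrix(c(0.90, 0.85,
                       0.88, 0.40),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(A = c("a1", "a2"),
                                     B = c("b1", "b2")))

# Effect of B (b2 minus b1) at each level of A.
effect_of_B <- cell_means[, "b2"] - cell_means[, "b1"]
effect_of_B

# With no interaction these two effects would be equal; their difference
# is the interaction contrast.
interaction_contrast <- unname(effect_of_B["a2"] - effect_of_B["a1"])
interaction_contrast
```

Here the effect of B is small at level a1 and large at level a2, so the effect of one factor depends on the state of the other.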

Multifactor ANOVA

Key difference between nested and factorial designs

Nested designs are hierarchical
- often contain sub-replicates that are random, uncontrolled, nuisance effects
- but the nested factors can be of interest too
Factorial designs are crossed
- levels of the factors are combined pairwise,
- and the design often includes all combinations of factor levels
- when each factor is fixed, interactions can be assessed
Completely nested designs therefore have no interaction terms, whereas factorial designs do
Mixed models can have a combination of fixed and random factors that are more complicated

Nested ANOVA

Walking stick example

Example 1: Study of “repeatability” (simple nested design)
The walking stick, Timema cristinae, is a wingless herbivorous insect that lives on plants in chaparral habitats of California.
Nosil and Crespi (2006) measured individuals using digital photographs.
To evaluate measurement repeatability they took two separate photographs of each specimen.
After measuring traits on one set of photographs, they repeated the measurements on the second set.

Nested ANOVA

Walking stick example

Each pair of dots represents the two measurements

Nested ANOVA

Walking stick example

Nested ANOVA

ANOVA Table of Results

Nesting Logic

Nesting equations

Nesting hypothesis tests

Nesting MS calculations

Nested ANOVA table of results

R INTERLUDE

Nested ANOVA

andrew_data <- read.table('andrew.tsv', header=TRUE, sep='\t')
head(andrew_data)

There are four variables: ‘TREAT’, ‘PATCH’, ‘QUAD’ and ‘ALGAE’
The main effect factor is TREAT
Make a simplified factor called TREAT2, in which 0% and 33% are a level called “low” and 66% and 100% are “high”

andrew_data$TREAT2 <- factor(c(rep("low", 40), rep("high", 40)))

The nested factor is PATCH - also need to turn this into a factor

andrew_data$PATCH <- factor(andrew_data$PATCH)

R INTERLUDE

Nested ANOVA

In this case, our response variable is ALGAE
Look at the distribution of ALGAE for the two levels of TREAT2 using boxplots based on the patch means, which are the replicates in this case.

andrew.agg <- with(andrew_data, aggregate(data.frame(ALGAE),
                  by = list(TREAT2=TREAT2, PATCH=PATCH), mean))

# Alternative: summarize each patch with gsummary() from the nlme package
library(nlme)
andrew.agg <- gsummary(andrew_data, groups=andrew_data$PATCH)

boxplot(ALGAE ~ TREAT2, andrew.agg)

Evaluate assumptions based on the boxplots
Is the design balanced (equal numbers of sub-replicates per PATCH)?

R INTERLUDE

Nested ANOVA

Run the nested ANOVA:

nested.aov <- aov(ALGAE ~ TREAT2 + Error(PATCH), data=andrew_data)
summary(nested.aov)

Do we detect an effect of TREAT2 (high vs low sea urchin density)?
Estimate variance components to assess relative contributions of the random factors

library(nlme)
VarCorr(lme(ALGAE ~ 1, random = ~1 | TREAT2/PATCH, andrew_data))

Calculate the % of variation due to between-treatment differences vs. due to among patches within treatment differences.
See pg. 302 in Logan if you need help.
What do these variance component estimates tell us?
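
The percentage calculation can be illustrated on simulated data. This is a sketch under stated assumptions, not the REML approach used by `lme()`: it simulates a balanced nested design with invented effect sizes and estimates variance components from the ANOVA mean squares (method of moments). All object names are hypothetical.

```r
# Simulated balanced nested design (all values hypothetical):
# 2 treatments, 4 patches per treatment, 5 quadrats per patch.
set.seed(1)
n_patch <- 4
n_quad  <- 5
d <- expand.grid(quad = seq_len(n_quad), patch = seq_len(n_patch),
                 treat = c("low", "high"))
d$patch_id <- factor(paste(d$treat, d$patch, sep = "_"))   # unique patch labels
patch_eff  <- rnorm(nlevels(d$patch_id), sd = 2)           # random patch effects
d$y <- 10 + ifelse(d$treat == "high", 3, 0) +
  patch_eff[as.integer(d$patch_id)] + rnorm(nrow(d), sd = 1)

# Sequential ANOVA table: treatment, patches within treatment, residual.
tab <- anova(lm(y ~ treat + patch_id, data = d))
ms_treat <- tab["treat", "Mean Sq"]
ms_patch <- tab["patch_id", "Mean Sq"]
ms_resid <- tab["Residuals", "Mean Sq"]

# Treatment is tested against patches, the true replicates.
f_treat <- ms_treat / ms_patch

# Method-of-moments variance components (negative estimates set to zero).
vc <- c(treatment = max((ms_treat - ms_patch) / (n_patch * n_quad), 0),
        patch     = max((ms_patch - ms_resid) / n_quad, 0),
        residual  = ms_resid)
round(100 * vc / sum(vc), 1)    # percent of total variance
```

For a balanced design these estimates are usually close to the REML ones; with unbalanced data the `lme()` route is the safer choice.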

Factorial Designs

Multifactor ANOVA

For example, Relyea (2003) looked at how a moderate dose (1.6 mg/L) of a commonly used pesticide, carbaryl (Sevin), affected bullfrog tadpole survival.
In particular, the experiment asked how the effect of carbaryl depended on whether a native predator, the red-spotted newt, was also present.
The newt was caged and could cause no direct harm, but it emitted visual and chemical cues to the tadpoles
The experiment was carried out in 10-L tubs (experimental units), each containing 10 tadpoles.
The four combinations of pesticide treatment (carbaryl vs. water only) and predator treatment (present or absent) were randomly assigned to tubs.
The results showed that survival was high except when pesticide was applied together with the predator.
Thus, the two treatments, predation and pesticide, seem to have interacted.

Multifactor ANOVA

Two Factor Factorial Designs

Three Factor Factorial Designs

Factorial Designs

Number of Replicates

Model 1 factorial ANOVA

both main effects fixed

Model 1 factorial ANOVA

both main effects fixed

Model 2 factorial ANOVA

both main effects random

The mean squares for a factorial model

The F-ratios for a factorial model

Interpretation

significant main and interaction effects

Interaction plots

R INTERLUDE

2-by-2 fixed effect factorial ANOVA

rnadata <- read.table('RNAseq.tsv', header=TRUE, sep='')
head(rnadata)

continuous response variable and two main effect categorical variables

gene <- rnadata$Gene80
microbiota <- rnadata$Microbiota
genotype <- rnadata$Genotype
boxplot(gene ~ microbiota)
boxplot(gene ~ genotype)
boxplot(gene ~ microbiota*genotype)

Fit the factorial linear model

two different ways to do the same thing

rna_aov <- aov(gene ~ microbiota + genotype + microbiota:genotype)
rna_aov <- aov(gene ~ microbiota*genotype)

Examine the fitted model diagnostics and the ANOVA results table

plot(rna_aov)
summary(rna_aov)
anova(rna_aov)

What are the general results of our hypothesis tests?
If there is an interaction, can we understand it by looking at the boxplots?

R INTERLUDE

2-by-3 fixed effect factorial ANOVA

Try the following code to produce an interaction plot for the response variable cell count.
In this case there are 2 genotypes and 3 treatment levels.
Download the IntPlot_data file and IntPlot_Example.R
Go through the R script, get a feel for what it’s doing, and try to produce and interpret the interaction plot.

Means tests for multifactorial ANOVAs

Means tests

factor level combinations in multi-factor ANOVA

The F-ratio test for a single-factor ANOVA tests for any difference among groups.
If we want to understand specific differences, we need further “contrasts”.
Unplanned comparisons (post hoc)
Planned comparisons (a priori)
Now we need to make ‘pseudo-factors’ that combine our levels of interest

Planned (a priori) contrasts

R INTERLUDE

2x2 Fixed-Effects Factorial ANOVA contrasts & interaction

continuous response and two main effect variables

rnadata <- read.table('RNAseq.tsv', header=TRUE, sep='')
gene <- rnadata$Gene80
microbiota <- rnadata$Microbiota
genotype <- rnadata$Genotype

make new “pseudo factor,” combining genotype and microbiota

gxm <- interaction(genotype,microbiota)
levels(gxm)
boxplot(gene ~ gxm)

specify the following 2 contrasts

contrasts(gxm) <- cbind(c(2, -1, 0, -1), c(-1, -1, 3, -1))
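
Before fitting, it is worth checking that the contrast coefficients are well formed. The short sketch below takes the two coefficient vectors from the line above and verifies that each sums to zero and that the pair is orthogonal; `c1` and `c2` are names introduced here for illustration.

```r
# The two contrast vectors from the exercise, one coefficient per level of gxm.
c1 <- c(2, -1, 0, -1)
c2 <- c(-1, -1, 3, -1)

sum(c1)        # 0: the coefficients of a contrast sum to zero
sum(c2)        # 0
sum(c1 * c2)   # 0: the two contrasts are orthogonal to each other
```

The order of the coefficients follows `levels(gxm)`, so print the levels first to see which factor-level combination each coefficient weights.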

R INTERLUDE

2x2 Fixed-Effects Factorial ANOVA contrasts & interaction

Fit the factorial linear model

rna_aov <- aov(gene ~ gxm)

Examine the ANOVA table, using supplied contrasts. Figure out the appropriate titles to give them.

summary(rna_aov, split = list(gxm = list('xxx'=1,'xxx'=2)))

What does the contrast summary tell you about the nature of the interaction?

Mixed effect models with unequal sample sizes

Attributes of mixed effects models

Linear models that include both fixed and random effects.
The model is split into fixed and random parts:
- Fixed effects influence mean of the response variable Y.
- Random effects influence the variance of Y.
There is a different error variance for each level of grouping.
Estimation and testing are based on restricted maximum likelihood (REML), which can handle unequal sample sizes.
P-values for fixed effects are conservative when the design is unbalanced.
Implemented in the nlme & lme4 packages in R.

Assumptions of mixed-effects models

Variation within groups follows a normal distribution with equal variance among groups.
Groups are randomly sampled from “population” of groups.
Group means follow a normal distribution.
Measurements within groups are independent.

Hypotheses for Model 3 ANOVA Factorial Design With Mixed Effects

General R syntax for two factor factorial designs

R INTERLUDE

Variance components with 2 random factors using lme4

rnadata <- read.table('RNAseq.tsv', header=TRUE, sep='')
head(rnadata)

variables excluding first 5 and last 5 observations

gene <- rnadata$Gene80[6:75] 
microbiota <- rnadata$Microbiota[6:75]
genotype <- rnadata$Genotype[6:75]
boxplot(gene ~ microbiota)
boxplot(gene ~ genotype)
boxplot(gene ~ microbiota*genotype)

Estimate the variance components using Restricted Maximum Likelihood (REML)

library(lme4)
lmer(gene ~ 1 + (1 | microbiota) + (1 | genotype) + (1 | microbiota:genotype))

Based on the REML sd estimates, what are the relative contributions of the factors to total variance in gene expression?

Analysis of Covariance (ANCOVA)

Brain & body size

Neanderthals as compared to humans

ANCOVA

Analysis of covariance - mixture of regression and ANOVA
Response is still a normally distributed continuous variable
One or more continuous predictor variables (covariates)
Sometimes the covariates are of biological interest
Most often we want to remove unexplained variance
In this way they are similar to a blocking variable in ANOVA
Operationally, ANCOVA is regular ANOVA in which the group and overall means are replaced by group and overall relationships
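
A minimal simulated sketch of what the covariate buys you. All values and names below are hypothetical: a response that depends on both a two-level treatment and a continuous size covariate is fit once without and once with the covariate.

```r
# Simulated data (all values hypothetical): 15 individuals per group.
set.seed(7)
group <- factor(rep(c("control", "treated"), each = 15))
size  <- runif(30, 0.6, 1.0)                       # continuous covariate
y     <- 20 + 60 * size + ifelse(group == "treated", -8, 0) + rnorm(30, sd = 3)

anova_fit  <- lm(y ~ group)          # ANOVA: ignores the covariate
ancova_fit <- lm(y ~ size + group)   # ANCOVA: covariate first, then the factor

deviance(anova_fit)                  # residual sum of squares without the covariate
deviance(ancova_fit)                 # residual sum of squares with the covariate
anova(ancova_fit)                    # group is tested after adjusting for size
```

The residual sum of squares can only go down when the covariate is added; when the covariate explains much of the variation, the test of the factor becomes correspondingly more sensitive.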

ANCOVA

Adjusting for the covariate

ANCOVA

Adjusting for the covariate

ANCOVA

Linear model with two covariates

ANCOVA

Factor and covariate hypothesis tests

ANCOVA

F ratio tests

ANCOVA

Assumptions

The residuals are normally distributed
The residuals are homoscedastic (have equal variance)
The residuals are independent of one another
The relationship between the response variable and each covariate is linear
Homogeneity of slopes among the groups
Similar covariate ranges among the groups

ANCOVA

Heterogeneous slopes

ANCOVA

Heterogeneous slopes

Problem - adjusting to a mean is difficult or impossible if the slopes are different
In essence, the samples for the groups come from two different populations
A test for homogeneity of slopes can be performed
The assumption is tested by looking for a significant interaction term between the categorical predictor variable(s) and the covariate(s)

ANCOVA

Non-overlapping range of the covariate

R INTERLUDE

ANCOVA

Impacts of sexual activity on male fruitfly longevity
Data from Partridge and Farquhar (1981)
Longevity of males was measured in response to access to
- no females
- one virgin
- eight virgins
- one mated
- eight mated
The male fruit flies also varied in size
The males were assigned randomly to the treatment levels, and thorax length was then measured as a covariate

R INTERLUDE

ANCOVA

longevity_data <- read.table('longevity.csv', header=TRUE, sep=',')
head(longevity_data)

Variables

long <- longevity_data$LONGEVITY
treat <- longevity_data$TREATMENT
thorax <- longevity_data$THORAX

check to see if the covariate should be included

boxplot(long ~ treat)
plot(long ~ thorax)

R INTERLUDE

ANCOVA

assess assumptions of normality and homogeneity of variance

plot(aov(long ~ thorax + treat ), which = 1)

Try it again with a transformed response variable

plot(aov(log10(long) ~ thorax + treat ), which = 1)

visually assess linearity, homogeneity of slopes and covariate range equality

library(lattice)
print(xyplot(log10(long) ~ thorax | treat, type = c("r", "p")))

R INTERLUDE

ANCOVA

formally test homogeneity of slopes by testing the interaction term

anova(aov(log10(long) ~ thorax*treat))

formally test covariate range disparity by modeling the effect of the treatments on the covariate

anova(aov(thorax ~ treat))

FINALLY, set up contrasts, fit the additive model and visualize the results (pg. 459 and 460 of your Logan book)
Summarize the trends in a nice plot (pg. 461 of your Logan book)