10/30/2018 & 11/01/2018

Goals for this week

  • One factor ANOVA
  • Git and GitHub
  • Means tests in ANOVA
  • Experimental Design
  • Power analyses
  • Multi-factor ANOVA

ANOVA

ANOVA

  • Stands for ANalysis of VAriance
  • Core statistical procedure in biology
  • Developed by R.A. Fisher in the early 20th Century
  • The core idea is to ask how much variation exists within vs. among groups
  • ANOVAs are linear models with categorical predictor variables and a continuous response variable
  • The categorical predictors are often called factors, and can have two or more levels (important to specify in R)
  • Each factor will have a hypothesis test
  • The levels of each factor may also need to be tested

ANOVA

Let’s start with an example

  • Percent time that male mice experiencing discomfort spent “stretching”.
  • Data are from an experiment in which mice experiencing mild discomfort (the result of an injection of 0.9% acetic acid into the abdomen) were kept:
    • in isolation,
    • with a companion mouse that was not injected, or
    • with a companion mouse that was also injected and exhibiting “stretching” behaviors associated with discomfort
  • The results suggest that mice stretch the most when a companion mouse is also experiencing mild discomfort. Mice experiencing pain appear to “empathize” with co-housed mice also in pain.

From Langford, D. J., et al. 2006. Science 312: 1967-1970

ANOVA

Let’s start with an example

In words:

stretching = intercept + treatment

- The model statement includes a response variable, a constant, and an explanatory variable.
- The only difference with regression is that here the explanatory variable is categorical.
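
To make the parallel with regression concrete, here is a minimal sketch on simulated data. The group labels, means, and sample sizes are invented for illustration and are not the Langford et al. values. It shows that R codes a categorical predictor as indicator columns, so the ANOVA is fit as an ordinary linear model.

```r
# Simulated data: hypothetical treatment labels and values, for illustration only
set.seed(1)
treatment <- factor(rep(c("isolated", "companion_uninjected", "companion_injected"),
                        each = 10))
stretching <- rnorm(30, mean = rep(c(20, 22, 30), each = 10), sd = 5)

# The design matrix: an intercept plus one indicator column per non-reference level
head(model.matrix(stretching ~ treatment))

# lm() and aov() fit the same model; only the default summaries differ
fit_lm  <- lm(stretching ~ treatment)
fit_aov <- aov(stretching ~ treatment)
anova(fit_lm)      # ANOVA table from the regression fit
summary(fit_aov)   # same F-ratio and p-value
```

The intercept column plays the role of the constant in the word model, and the indicator columns carry the treatment effect.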

ANOVA

Conceptually similar to regression

ANOVA

Statistical results table

ANOVA

F-ratio calculation
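
As a hedged sketch of the calculation, the F-ratio is the mean square among groups divided by the mean square within groups, each obtained by dividing a sum of squares by its degrees of freedom. The code below works through this by hand on simulated data (group names and values are invented) and compares the result with `aov()`.

```r
# Simulated data: three hypothetical groups, for illustration only
set.seed(2)
group <- factor(rep(c("A", "B", "C"), each = 8))
y <- rnorm(24, mean = rep(c(10, 12, 15), each = 8), sd = 2)

grand_mean  <- mean(y)
group_means <- tapply(y, group, mean)
n_per_group <- tapply(y, group, length)

# Among-group and within-group (error) sums of squares
ss_among  <- sum(n_per_group * (group_means - grand_mean)^2)
ss_within <- sum((y - group_means[as.character(group)])^2)

# Degrees of freedom: (number of groups - 1) and (N - number of groups)
df_among  <- nlevels(group) - 1
df_within <- length(y) - nlevels(group)

# Mean squares and their ratio
ms_among  <- ss_among / df_among
ms_within <- ss_within / df_within
f_ratio   <- ms_among / ms_within
p_value   <- pf(f_ratio, df_among, df_within, lower.tail = FALSE)

c(F = f_ratio, p = p_value)
summary(aov(y ~ group))   # should report the same F and p
```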

R INTERLUDE

One way ANOVA

  • Use the RNAseq_lip.tsv data again.
  • Let’s test for an effect of Population on Gene01 expression levels
  • First, let’s look at how the data are distributed
# Read the tab-delimited expression data
RNAseq_Data <- read.table('RNAseq_lip.tsv', header=TRUE, sep='\t')
g1 <- RNAseq_Data$Gene01
Pop <- RNAseq_Data$Population
# Box plot of Gene01 expression by population
boxplot(g1~Pop, col=c("blue","green"))

Or, to plot all points:

stripchart(g1~Pop, vertical=TRUE, pch=19, col=c("blue","green"), 
           at=c(1.25,1.75), method="jitter", jitter=0.05)
# Fit the one-way ANOVA and print the ANOVA table
Pop_Anova <- aov(g1 ~ Pop)
summary(Pop_Anova)

ANOVA

One or more predictor variables

  • One-way ANOVAs just have a single factor
  • Multi-factor ANOVAs
    • Factorial - two or more factors and their interactions
    • Nested - the levels of one factor are contained within the levels of another factor
    • The models can be quite complex
  • ANOVAs use an F-statistic to test factors in a model
    • Ratio of two variances (numerator and denominator)
    • The numerator and denominator d.f. need to be included (e.g. \(F_{1, 34} = 29.43\))
  • Determining the appropriate test ratios for complex ANOVAs takes some work
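
Both degrees of freedom are needed because they fix the reference F distribution. As a small sketch, the p-value for the example statistic quoted above can be looked up directly:

```r
# p-value for the example statistic F(1, 34) = 29.43
pf(29.43, df1 = 1, df2 = 34, lower.tail = FALSE)

# Critical value that F must exceed to be significant at alpha = 0.05
qf(0.95, df1 = 1, df2 = 34)
```

Changing either d.f. changes both numbers, which is why an F value reported without them cannot be interpreted.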

ANOVA

Assumptions

  • Normally distributed groups
    • robust to non-normality if equal variances and sample sizes
  • Equal variances across groups
    • okay if largest-to-smallest variance ratio < 3:1
    • problematic if there is a mean-variance relationship among groups
  • Observations in a group are independent
    • randomly selected
    • don’t confound group with another factor
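
One possible way to check these assumptions in R is sketched below on simulated data (the groups and values are invented). The formal tests shown are common choices rather than the only options, and the 3:1 threshold is the rule of thumb from the list above.

```r
# Simulated data: three hypothetical groups, for illustration only
set.seed(3)
group <- factor(rep(c("A", "B", "C"), each = 12))
y <- rnorm(36, mean = rep(c(5, 6, 8), each = 12), sd = 1.5)
fit <- aov(y ~ group)

# Equal variances: ratio of the largest to the smallest group variance
group_vars <- tapply(y, group, var)
max(group_vars) / min(group_vars)   # rule of thumb: should be below 3
bartlett.test(y ~ group)            # formal test of homogeneity of variances

# Normality: inspect the residuals rather than the raw response
shapiro.test(residuals(fit))
qqnorm(residuals(fit)); qqline(residuals(fit))

# Mean-variance relationship: group variances should not track group means
plot(as.numeric(tapply(y, group, mean)), as.numeric(group_vars),
     xlab = "Group mean", ylab = "Group variance")
```

Independence cannot be checked from the data in the same way; it has to come from the sampling and experimental design.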

Different ways to include factors in models

ANOVA

Fixed effects of factors

  • Groups are predetermined, of direct interest, repeatable.
  • For example:
    • medical treatments in a clinical trial
    • predetermined doses of a toxin
    • age groups in a population
    • habitat, season, etc.
  • Any conclusions reached in the study about differences among groups can be applied only to the groups included in the study.
  • The results cannot be generalized to other treatments, habitats, etc. not included in the study.

ANOVA

Random effects of factors

  • Measurements that come in groups. A group can be:
    • a family made up of siblings
    • a subject measured repeatedly
    • a transect of quadrats in a sampling survey
    • a block of an experiment done at a given time
  • Groups are assumed to be randomly sampled from a population of groups.
  • Therefore, conclusions reached about groups can be generalized to the population of groups.
  • With random effects, the variance among groups is the main quantity of interest, not the specific group attributes.

ANOVA

Random effects of factors

  • Below are cases where you are likely to treat factors as random effects
  • Whenever your sampling design is nested
    • quadrats within transects
    • transects within woodlots
    • woodlots within districts
  • Whenever you divide up plots and apply separate treatments to subplots
  • Whenever your replicates are grouped spatially or temporally
    • in blocks
    • in batches
  • Whenever you take measurements on related individuals
  • Whenever you measure subjects or other sampling units repeatedly

ANOVA

Random effects - test your understanding

  • Factor is sex (Male vs. Female)
  • Factor is fish tank (10 tanks in an experiment)
  • Factor is family (measure multiple sibs per family)
  • Factor is temperature (10 arbitrary temps over natural range)

ANOVA

Caution about fixed vs. random effects

  • Using fixed vs. random effects changes the way that statistical tests are performed in ANOVA
  • Most statistical packages assume that all factors are fixed unless you instruct them otherwise
  • Designating factors as random takes extra work and probably a read of the manual
  • In R, lm assumes that all effects are fixed
  • For random effects, use lme instead (part of the nlme package)
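
A minimal sketch of the difference, using simulated data for a hypothetical experiment with 10 tanks (all names and values are invented):

```r
library(nlme)   # a recommended package that ships with standard R installations

# Simulated data: 10 hypothetical tanks with 6 fish each, for illustration only
set.seed(4)
tank_id <- rep(1:10, each = 6)
tank_effect <- rnorm(10, mean = 0, sd = 2)   # random among-tank deviations
d <- data.frame(
  tank = factor(tank_id),
  y = 50 + tank_effect[tank_id] + rnorm(60, sd = 1)
)

# Fixed effect: lm() estimates a separate mean for each of the 10 specific tanks
fit_fixed <- lm(y ~ tank, data = d)
anova(fit_fixed)

# Random effect: lme() fits a random intercept per tank and
# estimates the variance among tanks instead
fit_random <- lme(y ~ 1, random = ~ 1 | tank, data = d)
VarCorr(fit_random)   # among-tank (Intercept) and residual variance components
```

With the random-effects fit, the quantity of interest is the among-tank variance component rather than the mean of any particular tank.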

Git and GitHub

Git and GitHub

Clone the repository

  • First make a new directory into which you will clone our course repository
  • Open the terminal and navigate to the directory and type the following
git clone https://github.com/wcresko/UO_ABS.git
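  • Now to update the repository you just need to use these commands
git fetch origin

git status

git merge origin/master
  • The first command downloads any new commits from GitHub without changing your files
  • The second tells you whether your local copy is behind origin/master
  • If it is, run the third!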

Means test to compare levels of a factor

Means tests for more than two factor levels

  • The F-ratio test for a single-factor ANOVA tests for any difference among groups.
  • If we want to understand specific differences, we need further “contrasts”.
  • Unplanned comparisons (post hoc):
    • Multiple comparisons carried out after the results are obtained.
    • Used to find where the differences lie (which means differ from which other means)
    • Comparisons require protection against inflated Type 1 error rates:
      • Tukey tests: compare all pairs of means and control for multiple comparisons
      • Scheffé contrasts: compare all combinations of means
  • Planned comparisons (a priori):
    • Comparisons between group means that were decided when the experiment was designed (not after the data were in)
    • Must be few in number to avoid inflating Type 1 error rates

Planned (a priori) contrasts

  • A well-planned experiment often dictates which comparisons of means are of most interest, whereas other comparisons are of no interest.
  • By restricting the comparisons to just the ones of interest, researchers can mitigate the multiple testing problem associated with post-hoc tests.
  • Some statisticians argue that, in fact, planned comparisons allow researchers to avoid adjusting p-values altogether because each test is unique.
  • Contrasts can also allow more complicated tests of the relationships among means.
  • Coding a priori contrasts in R is quite easy and just depends upon writing the right series of coefficient contrasts.

Planned (a priori) contrasts

Understand the coefficients table
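
A hedged sketch of how to read the coefficients from a model fit with user-defined contrasts, using simulated data (group names and values are invented). When the contrasts are orthogonal and each sums to zero, each coefficient equals the contrast applied to the group means, divided by the sum of the squared contrast weights.

```r
# Simulated data: four hypothetical groups, for illustration only
set.seed(6)
group <- factor(rep(c("g1", "g2", "g3", "g4"), each = 6))
y <- rnorm(24, mean = rep(c(10, 12, 15, 11), each = 6), sd = 1)

# Three orthogonal contrasts; each column sums to zero
cmat <- cbind(c(0, 1, 0, -1), c(2, -1, 0, -1), c(-1, -1, 3, -1))
contrasts(group) <- cmat

fit <- lm(y ~ group)
summary(fit)$coefficients   # one row per contrast, plus the intercept

# Reproduce the first contrast's estimate from the group means
group_means <- tapply(y, group, mean)
sum(cmat[, 1] * group_means) / sum(cmat[, 1]^2)
coef(fit)[2]
```

The first column compares the second group with the fourth, so its row in the table tests whether those two means differ.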

R INTERLUDE

Planned contrasts

  • Take the RNAseq data you’ve examined before and create a new four-level factor by combining genotype and microbiota treatment into a single variable
  • Think about how to do this using dplyr functions.
RNAseq_Data <- read.table("RNAseq.tsv", header=TRUE, sep='\t')

# contrasts() only works on factors, so convert the grouping variable
x <- factor(RNAseq_Data$categorical_var)
y <- RNAseq_Data$continuous_var1
z <- RNAseq_Data$continuous_var2
  • Set up the a priori contrasts specifically testing one group mean against another
  • These are just examples - you should figure out the logic of the contrasts
contrasts(x) <- cbind(c(0, 1, 0, -1), c(2, -1, 0, -1), c(-1, -1, 3, -1))
  • Confirm that the contrasts are orthogonal
round(crossprod(contrasts(x)), 2)
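
For the first step of this interlude, combining two factors into one four-level factor with dplyr could look like the sketch below. The column names `Genotype` and `Microbiota`, the level labels, and the toy data frame are all assumptions; substitute the names used in your own file.

```r
library(dplyr)

# Hypothetical stand-in for the RNAseq table; real column names may differ
toy <- data.frame(
  Genotype   = rep(c("WT", "mutant"), each = 4),
  Microbiota = rep(c("conventional", "germ_free"), times = 4)
)

# Build one factor whose levels are the genotype-by-microbiota combinations
toy <- toy %>%
  mutate(Geno_Micro = interaction(Genotype, Microbiota, sep = "_"))

levels(toy$Geno_Micro)   # four combined levels
table(toy$Geno_Micro)    # number of observations in each combined level
```

Because `interaction()` returns a factor, the result can be passed straight to `contrasts()`.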

R INTERLUDE

Planned contrasts

  • Define the contrast labels
rnaseq_data_list <- list(x = list('xxx vs. xxx' = 1, 'xxx vs. xxx' = 2, 'xxx vs. xxx' = 3))
  • Then fit the fixed effect model
RNAseq_aov_fixed <- aov(y ~ x)
plot(RNAseq_aov_fixed)
boxplot(y ~ x)
summary(RNAseq_aov_fixed, split = rnaseq_data_list)

R INTERLUDE

Unplanned contrasts

  • Remember that this is when you had no hypotheses of differences in means in advance
  • Read in the perchlorate data from Week 3
  • Let’s assess the effects of the 4 perchlorate levels on T4
  • Which perchlorate levels differ in their effect on T4?
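perc <- read.table('perchlorate_data.tsv', header=TRUE, sep='\t')

# Treat perchlorate level as a factor so that each level gets its own mean
x <- factor(perc$Perchlorate_Level)
y <- log10(perc$T4_Hormone_Level)

MyANOVA <- aov(y ~ x)
summary(MyANOVA)
boxplot(y ~ x)

# install.packages("multcomp")   # run once if the package is not installed
library(multcomp)

# Tukey all-pairs comparisons among the perchlorate levels
summary(glht(MyANOVA, linfct = mcp(x = "Tukey")))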