Random effects, and mixed models

Peter Ralph

27 October – Advanced Biological Statistics

Random effects

An example: urchins eat algae

From Logan:

To investigate density-dependent grazing effects of sea urchin Andrew and Underwood (1993) on filamentous algae measured the percentage of filamentous algae within five quadrats randomly positioned within each of four random patches of reef that were in turn nested within four sea urchin density treatments (no urchins, 33% of natural density, 66% natural density and 100% natural density). The sea urchin density treatment was considered a fixed factor and patch within density treatment as well as the individual quadrats were treated as random factors.

An example: urchins eat algae

andrew_data <- read.table('../Datasets/Logan_data/andrew.tsv', header=T, sep='\t')
head(andrew_data)

##   TREAT PATCH QUAD ALGAE
## 1    0%     1    1    46
## 2    0%     1    2    44
## 3    0%     1    3    41
## 4    0%     1    4    29
## 5    0%     1    5    11
## 6    0%     2    1    65

There are four variables: TREAT, PATCH, QUAD and ALGAE

Main effect factor: TREAT

Both QUAD and PATCH are factors:

andrew_data$QUAD <- factor(andrew_data$QUAD)
andrew_data$PATCH <- factor(andrew_data$PATCH)
andrew_data$TREAT <- factor(andrew_data$TREAT, levels=c("0%", "33%", "66%", "100%"))

Experimental design

with(andrew_data, table(PATCH, QUAD, TREAT))

## , , TREAT = 0%
## 
##      QUAD
## PATCH 1 2 3 4 5
##    1  1 1 1 1 1
##    2  1 1 1 1 1
##    3  1 1 1 1 1
##    4  1 1 1 1 1
##    5  0 0 0 0 0
##    6  0 0 0 0 0
##    7  0 0 0 0 0
##    8  0 0 0 0 0
##    9  0 0 0 0 0
##    10 0 0 0 0 0
##    11 0 0 0 0 0
##    12 0 0 0 0 0
##    13 0 0 0 0 0
##    14 0 0 0 0 0
##    15 0 0 0 0 0
##    16 0 0 0 0 0
## 
## , , TREAT = 33%
## 
##      QUAD
## PATCH 1 2 3 4 5
##    1  0 0 0 0 0
##    2  0 0 0 0 0
##    3  0 0 0 0 0
##    4  0 0 0 0 0
##    5  1 1 1 1 1
##    6  1 1 1 1 1
##    7  1 1 1 1 1
##    8  1 1 1 1 1
##    9  0 0 0 0 0
##    10 0 0 0 0 0
##    11 0 0 0 0 0
##    12 0 0 0 0 0
##    13 0 0 0 0 0
##    14 0 0 0 0 0
##    15 0 0 0 0 0
##    16 0 0 0 0 0
## 
## , , TREAT = 66%
## 
##      QUAD
## PATCH 1 2 3 4 5
##    1  0 0 0 0 0
##    2  0 0 0 0 0
##    3  0 0 0 0 0
##    4  0 0 0 0 0
##    5  0 0 0 0 0
##    6  0 0 0 0 0
##    7  0 0 0 0 0
##    8  0 0 0 0 0
##    9  1 1 1 1 1
##    10 1 1 1 1 1
##    11 1 1 1 1 1
##    12 1 1 1 1 1
##    13 0 0 0 0 0
##    14 0 0 0 0 0
##    15 0 0 0 0 0
##    16 0 0 0 0 0
## 
## , , TREAT = 100%
## 
##      QUAD
## PATCH 1 2 3 4 5
##    1  0 0 0 0 0
##    2  0 0 0 0 0
##    3  0 0 0 0 0
##    4  0 0 0 0 0
##    5  0 0 0 0 0
##    6  0 0 0 0 0
##    7  0 0 0 0 0
##    8  0 0 0 0 0
##    9  0 0 0 0 0
##    10 0 0 0 0 0
##    11 0 0 0 0 0
##    12 0 0 0 0 0
##    13 1 1 1 1 1
##    14 1 1 1 1 1
##    15 1 1 1 1 1
##    16 1 1 1 1 1

Response distribution

plot(ALGAE ~ TREAT, data=andrew_data)
points(ALGAE ~ jitter(as.numeric(TREAT)), data=andrew_data, pch=20, col=1+as.numeric(PATCH)%%4)

plot of chunk r boxit

Why is this wrong?

summary(lm(ALGAE ~ TREAT, data=andrew_data))

## 
## Call:
## lm(formula = ALGAE ~ TREAT, data = andrew_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -39.20 -19.00  -1.30  12.72  57.45 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   39.200      5.152   7.608 6.17e-11 ***
## TREAT33%     -20.200      7.287  -2.772   0.0070 ** 
## TREAT66%     -17.650      7.287  -2.422   0.0178 *  
## TREAT100%    -37.900      7.287  -5.201 1.62e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.04 on 76 degrees of freedom
## Multiple R-squared:  0.2634, Adjusted R-squared:  0.2343 
## F-statistic: 9.059 on 3 and 76 DF,  p-value: 3.362e-05

What we really want: \[ \text{(algae)} = \text{(mean for treatment)} + \text{(mean offset for patch)} + \text{("noise")} . \]

We could do:

ALGAE ~ TREAT + PATCH

… but do we care about all those patch means?

summary(lm(ALGAE ~ TREAT + PATCH, data=andrew_data))

## 
## Call:
## lm(formula = ALGAE ~ TREAT + PATCH, data = andrew_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -36.80  -2.60  -0.50   4.25  42.20 
## 
## Coefficients: (3 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   34.200      7.728   4.426 3.82e-05 ***
## TREAT33%       1.600     10.929   0.146  0.88406    
## TREAT66%     -14.200     10.929  -1.299  0.19850    
## TREAT100%    -31.600     10.929  -2.891  0.00523 ** 
## PATCH2        27.800     10.929   2.544  0.01339 *  
## PATCH3       -32.000     10.929  -2.928  0.00472 ** 
## PATCH4        24.200     10.929   2.214  0.03038 *  
## PATCH5       -33.200     10.929  -3.038  0.00345 ** 
## PATCH6       -35.800     10.929  -3.276  0.00170 ** 
## PATCH7         1.800     10.929   0.165  0.86970    
## PATCH8            NA         NA      NA       NA    
## PATCH9         8.400     10.929   0.769  0.44495    
## PATCH10       16.800     10.929   1.537  0.12917    
## PATCH11      -19.000     10.929  -1.739  0.08693 .  
## PATCH12           NA         NA      NA       NA    
## PATCH13       -1.000     10.929  -0.092  0.92738    
## PATCH14       -2.600     10.929  -0.238  0.81272    
## PATCH15       -1.600     10.929  -0.146  0.88406    
## PATCH16           NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.28 on 64 degrees of freedom
## Multiple R-squared:  0.6512, Adjusted R-squared:  0.5694 
## F-statistic: 7.964 on 15 and 64 DF,  p-value: 1.05e-09

Random effects

Small modification: \[ \text{(algae)} = \text{(mean for treatment)} + \text{(random offset for patch)} + \text{("noise")} . \]

We add a random intercept:

ALGAE ~ TREAT + (1|PATCH)

library(lme4)
alglm <- lmer(ALGAE ~ TREAT + (1|PATCH), data=andrew_data)
summary(alglm)

## Linear mixed model fit by REML ['lmerMod']
## Formula: ALGAE ~ TREAT + (1 | PATCH)
##    Data: andrew_data
## 
## REML criterion at convergence: 682.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.9808 -0.3106 -0.1093  0.2831  2.5910 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  PATCH    (Intercept) 294.3    17.16   
##  Residual             298.6    17.28   
## Number of obs: 80, groups:  PATCH, 16
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   39.200      9.408   4.167
## TREAT33%     -20.200     13.305  -1.518
## TREAT66%     -17.650     13.305  -1.327
## TREAT100%    -37.900     13.305  -2.849
## 
## Correlation of Fixed Effects:
##           (Intr) TREAT3 TREAT6
## TREAT33%  -0.707              
## TREAT66%  -0.707  0.500       
## TREAT100% -0.707  0.500  0.500

anova(
      lmer(ALGAE ~ TREAT + (1|PATCH), data=andrew_data),
      lm(ALGAE ~ TREAT, data=andrew_data))

## refitting model(s) with ML (instead of REML)

## Data: andrew_data
## Models:
## lm(ALGAE ~ TREAT, data = andrew_data): ALGAE ~ TREAT
## lmer(ALGAE ~ TREAT + (1 | PATCH), data = andrew_data): ALGAE ~ TREAT + (1 | PATCH)
##                                                       npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## lm(ALGAE ~ TREAT, data = andrew_data)                    5 734.90 746.81 -362.45   724.90                         
## lmer(ALGAE ~ TREAT + (1 | PATCH), data = andrew_data)    6 718.83 733.12 -353.42   706.83 18.069  1   2.13e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(
      lmer(ALGAE ~ TREAT + (1|PATCH), data=andrew_data),
      lmer(ALGAE ~ (1|PATCH), data=andrew_data))

## refitting model(s) with ML (instead of REML)

## Data: andrew_data
## Models:
## lmer(ALGAE ~ (1 | PATCH), data = andrew_data): ALGAE ~ (1 | PATCH)
## lmer(ALGAE ~ TREAT + (1 | PATCH), data = andrew_data): ALGAE ~ TREAT + (1 | PATCH)
##                                                       npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)  
## lmer(ALGAE ~ (1 | PATCH), data = andrew_data)            3 721.12 728.27 -357.56   715.12                       
## lmer(ALGAE ~ TREAT + (1 | PATCH), data = andrew_data)    6 718.83 733.12 -353.42   706.83 8.2938  3    0.04031 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

What are the random effects?

ranef(alglm)

## $PATCH
##    (Intercept)
## 1   -4.1565746
## 2   18.9539802
## 3  -30.7586522
## 4   15.9612465
## 5  -13.6335647
## 6  -15.7949835
## 7   15.4624576
## 8   13.9660907
## 9    5.6945072
## 10  12.6775526
## 11 -17.0835217
## 12  -1.2885381
## 13   0.2493945
## 14  -1.0807094
## 15  -0.2493945
## 16   1.0807094
## 
## with conditional variances for "PATCH"

rfs <- ranef(alglm)$PATCH
ses <- rfs[,1] + outer(sqrt(as.vector(attr(rfs, "postVar"))), c(-2, 2), "*")
plot(rfs[,1], 1:nrow(rfs), xlab='patch mean', xlim=range(ses), ylab='')
segments(x0=ses[,1], x1=ses[,2], y0=1:nrow(rfs))
abline(v=0, col='red')

plot of chunk r plot_ranef

Notes on mixed models

The math is a lot harder.

For a simple linear model fit by lm( ) (with fixed effects), the log-likelihood function is just the sum of the squared residuals.

But with a mixed model, the likelihood averages over the values of the random effects, which makes everything more difficult.

You sometimes have to worry about convergence.

Since the math is harder, mixed-model-fitting functions like lmer( ) have to use various sorts of numerical optimization methods to find the best-fitting parameters.

Sometimes, these may fail.

Notably, many use the REML approximation:

Usage:

     lmer(formula, data = NULL, REML = TRUE, control = lmerControl(),
          start = NULL, verbose = 0L, subset, weights, na.action,
          offset, contrasts = NULL, devFunOnly = FALSE, ...)

Hypothesis testing?

With fixed effects, for a factor f, the comparison

anova( lm(y ~ f - 1), lm(y ~ 1) )

uses the model that \[ y_i = \beta_{f_i} + \epsilon_i \] to test against the null hypothesis that \[ H_0 : \beta_1 = \beta_2 = \cdots = \beta_m = 0. \]

With random effects,

anova( lm(y ~ (1|f) - 1), lm(y ~ 1) )

uses the model that \[\begin{aligned} y_i &= \beta_{f_i} + \epsilon_i \\ \beta_a &\sim \Normal(0, \eta) \end{aligned}\] to test against the null hypothesis that \[ H_0 : \eta = 0. \]

Back to the pumpkins

Your turn

Add a random effect of plot to the model.
How big is the “plot” effect?
Assess significance by using anova( ) to compare to a nested model.

Data: data/pumpkins.tsv

pumpkins <- read.table("data/pumpkins.tsv", header=TRUE)
pumpkins$plot <- factor(pumpkins$plot)
pumpkins$fertilizer <- factor(pumpkins$fertilizer)
pumpkins$water <- factor(pumpkins$water)

IN CLASS

(1) Add a random effect to the linear model.

library(lme4)
fixed_fit <- lm( weight ~ water * fertilizer, data=pumpkins)
mixed_fit <- lmer( weight ~ water * fertilizer + (1|plot:water:fertilizer), data=pumpkins)
summary(mixed_fit)

## Linear mixed model fit by REML ['lmerMod']
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer)
##    Data: pumpkins
## 
## REML criterion at convergence: 369.9
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.64930 -0.54801  0.03808  0.70427  1.69905 
## 
## Random effects:
##  Groups                Name        Variance Std.Dev.
##  plot:water:fertilizer (Intercept) 0.06128  0.2476  
##  Residual                          1.23871  1.1130  
## Number of obs: 120, groups:  plot:water:fertilizer, 24
## 
## Fixed effects:
##                              Estimate Std. Error t value
## (Intercept)                  20.00462    0.27795  71.972
## waterwater                    0.02462    0.39308   0.063
## fertilizerlow               -12.08324    0.39308 -30.740
## fertilizermedium             -6.13599    0.39308 -15.610
## waterwater:fertilizerlow     -1.84651    0.55590  -3.322
## waterwater:fertilizermedium   2.67808    0.55590   4.818
## 
## Correlation of Fixed Effects:
##                (Intr) wtrwtr frtlzrl frtlzrm wtrwtr:frtlzrl
## waterwater     -0.707                                      
## fertilizrlw    -0.707  0.500                               
## fertilzrmdm    -0.707  0.500  0.500                        
## wtrwtr:frtlzrl  0.500 -0.707 -0.707  -0.354                
## wtrwtr:frtlzrm  0.500 -0.707 -0.354  -0.707   0.500

(2) How big is the “plot” effect? The estimated standard deviation of the plot effects is 0.2475554. We can also look at the estimated random effects visually, by plotting the estimated mean plot effects, with a bar showing plus/minus 2 standard errors.

rfs <- ranef(mixed_fit)[["plot:water:fertilizer"]]
ses <- rfs[,1] + outer(sqrt(as.vector(attr(rfs, "postVar"))), c(-2, 2), "*")
plot(rfs[,1], 1:nrow(rfs), xlab='plot effect mean',
     xlim=range(ses), ylab='')
segments(x0=ses[,1], x1=ses[,2], y0=1:nrow(rfs))
abline(v=0, col='red')

plot of chunk r plot_effect

Overall, our best guess is that plots differ by around 0.2 pounds, but no more than 0.6 pounds from each other (in terms of average pumpkin weight, with 95% confidence).

(3) Compare the models with/without random effects, to see if the random effects seem to have added anything useful.

anova(mixed_fit, fixed_fit)

## refitting model(s) with ML (instead of REML)

## Data: pumpkins
## Models:
## fixed_fit: weight ~ water * fertilizer
## mixed_fit: weight ~ water * fertilizer + (1 | plot:water:fertilizer)
##           npar    AIC    BIC  logLik deviance Chisq Df Pr(>Chisq)
## fixed_fit    7 378.68 398.19 -182.34   364.68                    
## mixed_fit    8 380.68 402.98 -182.34   364.68     0  1          1

Hm, a \(p\)-value of 1. That’s suspicious. We know there shouldn’t be an effect of plot, but are things really working right?

See what happens if there is a plot effect

sim_pumpkins <- function (plot_sd=0) {
    resid_sd <- 1.2
    plants_per_plot <- 5
    plots_per_treatment <- 4
    pumpkins <- expand.grid(
                  fertilizer=c("low", "medium", "high"),
                  water=c("no water", "water"),
                  plot=1:plots_per_treatment,
                  plant=1:plants_per_plot,
                  weight=NA)
    # true values
    mu <- 20
    alpha <- c('high'=0, 'medium'=-6, 'low'=-12)
    beta <- c('no water'=0, 'water'=0)
    gamma <- c('high.no water'=0,
               'high.water'=0,
               'medium.no water'=0,
               'medium.water'=2,
               'low.no water'=0,
               'low.water'=-2)
    plot_delta <- rnorm(plots_per_treatment
                        * nlevels(pumpkins$fertilizer)
                        * nlevels(pumpkins$water),
                        mean=0, sd=plot_sd)
    k <- 1
    for (p in 1:plots_per_treatment) {
        for (f in levels(pumpkins$fertilizer)) {
            for (w in levels(pumpkins$water)) {
                names(plot_delta)[k] <- paste(p, f, w, sep='.')
                k <- k + 1
            }
        }
    }
    pumpkins$mean_weight <- (mu
        + alpha[as.character(pumpkins$fertilizer)]
        + beta[as.character(pumpkins$water)]
        + gamma[paste(pumpkins$fertilizer, pumpkins$water, sep='.')]
        + plot_delta[paste(pumpkins$plot, pumpkins$fertilizer, pumpkins$water, sep='.')])
    pumpkins$weight <- rnorm(nrow(pumpkins),
                             mean=pumpkins$mean_weight,
                             sd=resid_sd)
    return(pumpkins)
}

new_pumpkins <- sim_pumpkins(plot_sd=3)
new_fixed_fit <- lm(
          weight ~ water * fertilizer, data=new_pumpkins)

new_mixed_fit <- lmer(
          weight ~ water * fertilizer + (1|plot:water:fertilizer),
          data=new_pumpkins)

anova(new_mixed_fit, new_fixed_fit)

## refitting model(s) with ML (instead of REML)

## Data: new_pumpkins
## Models:
## new_fixed_fit: weight ~ water * fertilizer
## new_mixed_fit: weight ~ water * fertilizer + (1 | plot:water:fertilizer)
##               npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## new_fixed_fit    7 584.57 604.08 -285.29   570.57                         
## new_mixed_fit    8 466.63 488.93 -225.32   450.63 119.94  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Yes, everything seems to be working right!

In conclusion, there’s no good evidence that we need a random effect for plots (i.e., that plots are inducing unmodeled correlation in our data). But it’s not hurting our model to include them, either.