Linear models

Peter Ralph

22 October – Advanced Biological Statistics

Outline

Linear models…

with categorical variables (multivariate ANOVA)
with continuous variables (least-squares regression)
and likelihood (where’s “least-squares” come from).

Multivariate ANOVA

The factorial ANOVA model

Say we have $n$ observations coming from combinations of two factors, so that the $k$th observation in the $i$th group of factor $A$ and the $j$th group of factor $B$ is \[\begin{equation} X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk} , \end{equation}\] where

$\mu$: overall mean
$\alpha_i$: mean deviation of group $i$ of factor A from $\mu$ and average of B,
$\beta_j$: mean deviation of group $j$ of factor B from $\mu$ and average of A,
$\gamma_{ij}$: mean deviation of combination $i + j$ from $\mu + \alpha_i + \beta_j$, and
$\epsilon_{ijk}$: what’s left over (“error”, or “residual”)

In words, \[\begin{equation} \begin{split} \text{(value)} &= \text{(overall mean)} + \text{(A group mean)} \\ &\qquad {} + \text{(B group mean)} + \text{(AB group mean)} + \text{(residual)} \end{split}\end{equation}\]

Example: pumpkin pie

We’re looking at how mean pumpkin weight depends on both

fertilizer input, and
late-season watering

So, we

divide a large field into many plots
randomly assign plots to either “high”, “medium”, or “low” fertilizer, and
independently, assign plots to either “no late water” or “late water”; then
plant a fixed number of plants per plot,
grow pumpkins and measure their weight.

Questions:

How does mean weight depend on fertilizer?
… or, on late-season water?
Does the effect of fertilizer depend on late-season water?
How much does mean weight differ between different plants in the same conditions?
… and, between plots of the same conditions?
How much does weight of different pumpkins on the same plant differ?

draw the pictures

First, a simplification

Rightarrow Ignore any “plant” and “plot” effects.

(e.g., only one pumpkin per vine and one plot per combination of conditions)

Say that $i=1, 2, 3$ indexes fertilizer levels (low to high), and $j=1, 2$ indexes late watering (no or yes), and \[\begin{equation}\begin{split} X_{ijk} &= \text{(weight of $k$th pumpkin in plot with conditions $i$, $j$)} \\ &= \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk} , \end{split}\end{equation}\] where

$\mu$:
$\alpha_i$:
$\beta_j$:
$\gamma_{ij}$:
$\epsilon_{ijk}$:

Making it real with simulation

A good way to get a better concrete understanding of something is by simulating it –

by writing code to generate a (random) dataset that you design to look, more or less like what you expect the real data to look like.

This lets you explore statistical power, choose sample sizes, etcetera… but also makes you realize things you hadn’t, previously.

First, make up some numbers

$\mu$:
$\alpha_i$:
$\beta_j$:
$\gamma_{ij}$:
$\epsilon_{ijk}$:

Next, a data format

head( expand.grid(
          fertilizer=c("low", "medium", "high"),
          water=c("no water", "water"),
          plot=1:4,
          plant=1:5,
          weight=NA))

##   fertilizer    water plot plant weight
## 1        low no water    1     1     NA
## 2     medium no water    1     1     NA
## 3       high no water    1     1     NA
## 4        low    water    1     1     NA
## 5     medium    water    1     1     NA
## 6       high    water    1     1     NA

Exercise: simulation

IN CLASS:

# true values for means
mu <- 10  # pounds
alpha <- c(high=5, medium=0, low=-5)
beta <- c("no water"=-3, water=3)
gamma <- c("low.no water" = 0,
           "medium.no water" = 0,
           "high.no water" = 0,
           "low.water" = 0,
           "medium.water" = -3,
           "high.water" = -6)
sigma <- 0.5

pumpkins <- expand.grid(
              fertilizer=c("low", "medium", "high"),
              water=c("no water", "water"),
              plot=1:4,
              plant=1:5,
              weight=NA)

pumpkins$mean_weight <- (mu + alpha[as.character(pumpkins$fertilizer)]
                         + beta[as.character(pumpkins$water)]
                         + gamma[paste(as.character(pumpkins$fertilizer),
                                       as.character(pumpkins$water), sep=".")])

pumpkins$weight <- rnorm(nrow(pumpkins),
                         mean=pumpkins$mean_weight,
                         sd=sigma)

write.table(pumpkins, file="data/pumpkins.tsv")

IN CLASS:

par(mar=c(4, 8, 1, 1)+.1)
boxplot(weight ~ water * fertilizer, data=pumpkins, horizontal=TRUE, las=2, ylab='')

plot of chunk check_sim

The simulated dataset is available at data/pumpkins.tsv.

Questions that (linear models and) ANOVA can answer

What are (estimates of) the coefficents?

summary(lm(weight ~ fertilizer + water, data=pumpkins))

## 
## Call:
## lm(formula = weight ~ fertilizer + water, data = pumpkins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.77906 -1.15970 -0.01605  1.17450  2.32947 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        3.4415     0.2455   14.02   <2e-16 ***
## fertilizermedium   3.5873     0.3007   11.93   <2e-16 ***
## fertilizerhigh     7.0027     0.3007   23.29   <2e-16 ***
## waterwater         3.1258     0.2455   12.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.345 on 116 degrees of freedom
## Multiple R-squared:  0.8586, Adjusted R-squared:  0.855 
## F-statistic: 234.9 on 3 and 116 DF,  p-value: < 2.2e-16

Do different fertilizer levels differ? water?

summary(aov(weight ~ fertilizer + water, data=pumpkins))

##              Df Sum Sq Mean Sq F value Pr(>F)    
## fertilizer    2  981.0   490.5   271.3 <2e-16 ***
## water         1  293.1   293.1   162.1 <2e-16 ***
## Residuals   116  209.8     1.8                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Or equivalently,

anova(lm(weight ~ fertilizer + water, data=pumpkins))

## Analysis of Variance Table
## 
## Response: weight
##             Df Sum Sq Mean Sq F value    Pr(>F)    
## fertilizer   2 980.95  490.48  271.25 < 2.2e-16 ***
## water        1 293.11  293.11  162.10 < 2.2e-16 ***
## Residuals  116 209.75    1.81                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

What are all those numbers?

What do they mean?

Which levels are different from which other ones?

John Tukey has a method for that.

> ?TukeyHSD

TukeyHSD                 package:stats                 R Documentation

Compute Tukey Honest Significant Differences

Description:

     Create a set of confidence intervals on the differences between
     the means of the levels of a factor with the specified family-wise
     probability of coverage.  The intervals are based on the
     Studentized range statistic, Tukey's ‘Honest Significant
     Difference’ method.

...

     When comparing the means for the levels of a factor in an analysis
     of variance, a simple comparison using t-tests will inflate the
     probability of declaring a significant difference when it is not
     in fact present.  This because the intervals are calculated with a
     given coverage probability for each interval but the
     interpretation of the coverage is usually with respect to the
     entire family of intervals.

Example

TukeyHSD(aov(weight ~ fertilizer + water, data=pumpkins))

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = weight ~ fertilizer + water, data = pumpkins)
## 
## $fertilizer
##                 diff      lwr      upr p adj
## medium-low  3.587309 2.873438 4.301180     0
## high-low    7.002706 6.288835 7.716576     0
## high-medium 3.415396 2.701525 4.129267     0
## 
## $water
##                    diff      lwr      upr p adj
## water-no water 3.125756 2.639501 3.612011     0

Does the effect of fertilizer depend on water?

summary(aov(weight ~ fertilizer * water, data=pumpkins))

##                   Df Sum Sq Mean Sq F value Pr(>F)    
## fertilizer         2  981.0   490.5  2195.0 <2e-16 ***
## water              1  293.1   293.1  1311.7 <2e-16 ***
## fertilizer:water   2  184.3    92.1   412.3 <2e-16 ***
## Residuals        114   25.5     0.2                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(lm(weight ~ fertilizer * water, data=pumpkins))

## 
## Call:
## lm(formula = weight ~ fertilizer * water, data = pumpkins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.23696 -0.26951  0.00257  0.32312  0.85524 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   1.9672     0.1057   18.61   <2e-16 ***
## fertilizermedium              4.9780     0.1495   33.30   <2e-16 ***
## fertilizerhigh               10.0347     0.1495   67.13   <2e-16 ***
## waterwater                    6.0742     0.1495   40.63   <2e-16 ***
## fertilizermedium:waterwater  -2.7814     0.2114  -13.16   <2e-16 ***
## fertilizerhigh:waterwater    -6.0640     0.2114  -28.68   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4727 on 114 degrees of freedom
## Multiple R-squared:  0.9828, Adjusted R-squared:  0.9821 
## F-statistic:  1305 on 5 and 114 DF,  p-value: < 2.2e-16

Or equivalently,

anova(lm(weight ~ fertilizer * water, data=pumpkins))

## Analysis of Variance Table
## 
## Response: weight
##                   Df Sum Sq Mean Sq F value    Pr(>F)    
## fertilizer         2 980.95  490.48 2194.98 < 2.2e-16 ***
## water              1 293.11  293.11 1311.73 < 2.2e-16 ***
## fertilizer:water   2 184.28   92.14  412.34 < 2.2e-16 ***
## Residuals        114  25.47    0.22                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model comparison with ANOVA

The idea

Me: Hey, I made our model more complicated, and look, it fits better!

You: Yeah, of course it does. How much better?

Me: How can we tell?

You: Well, does it reduce the residual variance more than you’d expect by chance?

The $F$ statistic

To compare two models, \[\begin{aligned} F &= \frac{\text{(explained variance)}}{\text{(residual variance)}} \\ &= \frac{\text{(mean square model)}}{\text{(mean square residual)}} \\ &= \frac{\frac{\text{RSS}_1 - \text{RSS}_2}{p_2-p_1}}{\frac{\text{RSS}_2}{n-p_2}} \end{aligned}\]

Nested model analysis

anova(
      lm(weight ~ water, data=pumpkins),
      lm(weight ~ fertilizer + water, data=pumpkins),
      lm(weight ~ fertilizer * water, data=pumpkins)
)

## Analysis of Variance Table
## 
## Model 1: weight ~ water
## Model 2: weight ~ fertilizer + water
## Model 3: weight ~ fertilizer * water
##   Res.Df     RSS Df Sum of Sq       F    Pr(>F)    
## 1    118 1190.70                                   
## 2    116  209.75  2    980.95 2194.98 < 2.2e-16 ***
## 3    114   25.47  2    184.28  412.34 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Your turn

Do a stepwise model comparison of nested linear models, including plant and plot in the analysis. Think about what order to do the comparison in. Make sure they are nested!

Data: data/pumpkins.tsv

Formulas

Anatomy of a formula

  weight ~ fertilizer + water

is read something like

mean weight is determined by additive effects of fertlizer and of water

\[\begin{equation}\begin{split} y &\sim x \qquad \text{means} \\ y &= a + b x + \text{(mean-zero noise)} . \end{split}\end{equation}\]

Intercepts

The intercept is included implicitly, so these are equivalent:

  weight ~ fertilizer + water
  weight ~ 1 + fertilizer + water

… so if you don’t want an intercept, do

  weight ~ 0 + fertilizer + water

Interactions

To assign an effect to each element of a crossed design, use :, e.g.

  weight ~ fertilizer + water + fertilizer:water

which is the same as

  weight ~ fertilizer * water

since lower-order effects are included implicitly.

A translation table

~ : depends on
+ : and also, independently
: : in combination with
* : and also
I(x+y) : actually x plus y
I(x^2) : actually x squared

Trickier things:

1 : an intercept
0 : but not an intercept
- : but not
. : all columns not otherwise in the formula
x/y : x, and y nested within x (same as x + x:y)

The secret to formulas

If you want to know what a formula is really doing, look at its model.matrix( ), whose columns correspond to the coefficients of the resulting model.

model.matrix(~ fertilizer, data=pumpkins)

##     (Intercept) fertilizermedium fertilizerhigh
## 1             1                0              0
## 2             1                1              0
## 3             1                0              1
## 4             1                0              0
## 5             1                1              0
## 6             1                0              1
## 7             1                0              0
## 8             1                1              0
## 9             1                0              1
## 10            1                0              0
## 11            1                1              0
## 12            1                0              1
## 13            1                0              0
## 14            1                1              0
## 15            1                0              1
## 16            1                0              0
## 17            1                1              0
## 18            1                0              1
## 19            1                0              0
## 20            1                1              0
## 21            1                0              1
## 22            1                0              0
## 23            1                1              0
## 24            1                0              1
## 25            1                0              0
## 26            1                1              0
## 27            1                0              1
## 28            1                0              0
## 29            1                1              0
## 30            1                0              1
## 31            1                0              0
## 32            1                1              0
## 33            1                0              1
## 34            1                0              0
## 35            1                1              0
## 36            1                0              1
## 37            1                0              0
## 38            1                1              0
## 39            1                0              1
## 40            1                0              0
## 41            1                1              0
## 42            1                0              1
## 43            1                0              0
## 44            1                1              0
## 45            1                0              1
## 46            1                0              0
## 47            1                1              0
## 48            1                0              1
## 49            1                0              0
## 50            1                1              0
## 51            1                0              1
## 52            1                0              0
## 53            1                1              0
## 54            1                0              1
## 55            1                0              0
## 56            1                1              0
## 57            1                0              1
## 58            1                0              0
## 59            1                1              0
## 60            1                0              1
## 61            1                0              0
## 62            1                1              0
## 63            1                0              1
## 64            1                0              0
## 65            1                1              0
## 66            1                0              1
## 67            1                0              0
## 68            1                1              0
## 69            1                0              1
## 70            1                0              0
## 71            1                1              0
## 72            1                0              1
## 73            1                0              0
## 74            1                1              0
## 75            1                0              1
## 76            1                0              0
## 77            1                1              0
## 78            1                0              1
## 79            1                0              0
## 80            1                1              0
## 81            1                0              1
## 82            1                0              0
## 83            1                1              0
## 84            1                0              1
## 85            1                0              0
## 86            1                1              0
## 87            1                0              1
## 88            1                0              0
## 89            1                1              0
## 90            1                0              1
## 91            1                0              0
## 92            1                1              0
## 93            1                0              1
## 94            1                0              0
## 95            1                1              0
## 96            1                0              1
## 97            1                0              0
## 98            1                1              0
## 99            1                0              1
## 100           1                0              0
## 101           1                1              0
## 102           1                0              1
## 103           1                0              0
## 104           1                1              0
## 105           1                0              1
## 106           1                0              0
## 107           1                1              0
## 108           1                0              1
## 109           1                0              0
## 110           1                1              0
## 111           1                0              1
## 112           1                0              0
## 113           1                1              0
## 114           1                0              1
## 115           1                0              0
## 116           1                1              0
## 117           1                0              1
## 118           1                0              0
## 119           1                1              0
## 120           1                0              1
## attr(,"assign")
## [1] 0 1 1
## attr(,"contrasts")
## attr(,"contrasts")$fertilizer
## [1] "contr.treatment"

matplot(
    model.matrix(~ fertilizer, data=pumpkins),
    type='l')

plot of chunk plot_model_matrix

model.matrix(~ fertilizer + water, data=pumpkins)

##     (Intercept) fertilizermedium fertilizerhigh waterwater
## 1             1                0              0          0
## 2             1                1              0          0
## 3             1                0              1          0
## 4             1                0              0          1
## 5             1                1              0          1
## 6             1                0              1          1
## 7             1                0              0          0
## 8             1                1              0          0
## 9             1                0              1          0
## 10            1                0              0          1
## 11            1                1              0          1
## 12            1                0              1          1
## 13            1                0              0          0
## 14            1                1              0          0
## 15            1                0              1          0
## 16            1                0              0          1
## 17            1                1              0          1
## 18            1                0              1          1
## 19            1                0              0          0
## 20            1                1              0          0
## 21            1                0              1          0
## 22            1                0              0          1
## 23            1                1              0          1
## 24            1                0              1          1
## 25            1                0              0          0
## 26            1                1              0          0
## 27            1                0              1          0
## 28            1                0              0          1
## 29            1                1              0          1
## 30            1                0              1          1
## 31            1                0              0          0
## 32            1                1              0          0
## 33            1                0              1          0
## 34            1                0              0          1
## 35            1                1              0          1
## 36            1                0              1          1
## 37            1                0              0          0
## 38            1                1              0          0
## 39            1                0              1          0
## 40            1                0              0          1
## 41            1                1              0          1
## 42            1                0              1          1
## 43            1                0              0          0
## 44            1                1              0          0
## 45            1                0              1          0
## 46            1                0              0          1
## 47            1                1              0          1
## 48            1                0              1          1
## 49            1                0              0          0
## 50            1                1              0          0
## 51            1                0              1          0
## 52            1                0              0          1
## 53            1                1              0          1
## 54            1                0              1          1
## 55            1                0              0          0
## 56            1                1              0          0
## 57            1                0              1          0
## 58            1                0              0          1
## 59            1                1              0          1
## 60            1                0              1          1
## 61            1                0              0          0
## 62            1                1              0          0
## 63            1                0              1          0
## 64            1                0              0          1
## 65            1                1              0          1
## 66            1                0              1          1
## 67            1                0              0          0
## 68            1                1              0          0
## 69            1                0              1          0
## 70            1                0              0          1
## 71            1                1              0          1
## 72            1                0              1          1
## 73            1                0              0          0
## 74            1                1              0          0
## 75            1                0              1          0
## 76            1                0              0          1
## 77            1                1              0          1
## 78            1                0              1          1
## 79            1                0              0          0
## 80            1                1              0          0
## 81            1                0              1          0
## 82            1                0              0          1
## 83            1                1              0          1
## 84            1                0              1          1
## 85            1                0              0          0
## 86            1                1              0          0
## 87            1                0              1          0
## 88            1                0              0          1
## 89            1                1              0          1
## 90            1                0              1          1
## 91            1                0              0          0
## 92            1                1              0          0
## 93            1                0              1          0
## 94            1                0              0          1
## 95            1                1              0          1
## 96            1                0              1          1
## 97            1                0              0          0
## 98            1                1              0          0
## 99            1                0              1          0
## 100           1                0              0          1
## 101           1                1              0          1
## 102           1                0              1          1
## 103           1                0              0          0
## 104           1                1              0          0
## 105           1                0              1          0
## 106           1                0              0          1
## 107           1                1              0          1
## 108           1                0              1          1
## 109           1                0              0          0
## 110           1                1              0          0
## 111           1                0              1          0
## 112           1                0              0          1
## 113           1                1              0          1
## 114           1                0              1          1
## 115           1                0              0          0
## 116           1                1              0          0
## 117           1                0              1          0
## 118           1                0              0          1
## 119           1                1              0          1
## 120           1                0              1          1
## attr(,"assign")
## [1] 0 1 1 2
## attr(,"contrasts")
## attr(,"contrasts")$fertilizer
## [1] "contr.treatment"
## 
## attr(,"contrasts")$water
## [1] "contr.treatment"

matplot(
    model.matrix(~ fertilizer + water, data=pumpkins),
    type='l')

model.matrix(~ 0 + fertilizer * water, data=pumpkins)

##     fertilizerlow fertilizermedium fertilizerhigh waterwater fertilizermedium:waterwater fertilizerhigh:waterwater
## 1               1                0              0          0                           0                         0
## 2               0                1              0          0                           0                         0
## 3               0                0              1          0                           0                         0
## 4               1                0              0          1                           0                         0
## 5               0                1              0          1                           1                         0
## 6               0                0              1          1                           0                         1
## 7               1                0              0          0                           0                         0
## 8               0                1              0          0                           0                         0
## 9               0                0              1          0                           0                         0
## 10              1                0              0          1                           0                         0
## 11              0                1              0          1                           1                         0
## 12              0                0              1          1                           0                         1
## 13              1                0              0          0                           0                         0
## 14              0                1              0          0                           0                         0
## 15              0                0              1          0                           0                         0
## 16              1                0              0          1                           0                         0
## 17              0                1              0          1                           1                         0
## 18              0                0              1          1                           0                         1
## 19              1                0              0          0                           0                         0
## 20              0                1              0          0                           0                         0
## 21              0                0              1          0                           0                         0
## 22              1                0              0          1                           0                         0
## 23              0                1              0          1                           1                         0
## 24              0                0              1          1                           0                         1
## 25              1                0              0          0                           0                         0
## 26              0                1              0          0                           0                         0
## 27              0                0              1          0                           0                         0
## 28              1                0              0          1                           0                         0
## 29              0                1              0          1                           1                         0
## 30              0                0              1          1                           0                         1
## 31              1                0              0          0                           0                         0
## 32              0                1              0          0                           0                         0
## 33              0                0              1          0                           0                         0
## 34              1                0              0          1                           0                         0
## 35              0                1              0          1                           1                         0
## 36              0                0              1          1                           0                         1
## 37              1                0              0          0                           0                         0
## 38              0                1              0          0                           0                         0
## 39              0                0              1          0                           0                         0
## 40              1                0              0          1                           0                         0
## 41              0                1              0          1                           1                         0
## 42              0                0              1          1                           0                         1
## 43              1                0              0          0                           0                         0
## 44              0                1              0          0                           0                         0
## 45              0                0              1          0                           0                         0
## 46              1                0              0          1                           0                         0
## 47              0                1              0          1                           1                         0
## 48              0                0              1          1                           0                         1
## 49              1                0              0          0                           0                         0
## 50              0                1              0          0                           0                         0
## 51              0                0              1          0                           0                         0
## 52              1                0              0          1                           0                         0
## 53              0                1              0          1                           1                         0
## 54              0                0              1          1                           0                         1
## 55              1                0              0          0                           0                         0
## 56              0                1              0          0                           0                         0
## 57              0                0              1          0                           0                         0
## 58              1                0              0          1                           0                         0
## 59              0                1              0          1                           1                         0
## 60              0                0              1          1                           0                         1
## 61              1                0              0          0                           0                         0
## 62              0                1              0          0                           0                         0
## 63              0                0              1          0                           0                         0
## 64              1                0              0          1                           0                         0
## 65              0                1              0          1                           1                         0
## 66              0                0              1          1                           0                         1
## 67              1                0              0          0                           0                         0
## 68              0                1              0          0                           0                         0
## 69              0                0              1          0                           0                         0
## 70              1                0              0          1                           0                         0
## 71              0                1              0          1                           1                         0
## 72              0                0              1          1                           0                         1
## 73              1                0              0          0                           0                         0
## 74              0                1              0          0                           0                         0
## 75              0                0              1          0                           0                         0
## 76              1                0              0          1                           0                         0
## 77              0                1              0          1                           1                         0
## 78              0                0              1          1                           0                         1
## 79              1                0              0          0                           0                         0
## 80              0                1              0          0                           0                         0
## 81              0                0              1          0                           0                         0
## 82              1                0              0          1                           0                         0
## 83              0                1              0          1                           1                         0
## 84              0                0              1          1                           0                         1
## 85              1                0              0          0                           0                         0
## 86              0                1              0          0                           0                         0
## 87              0                0              1          0                           0                         0
## 88              1                0              0          1                           0                         0
## 89              0                1              0          1                           1                         0
## 90              0                0              1          1                           0                         1
## 91              1                0              0          0                           0                         0
## 92              0                1              0          0                           0                         0
## 93              0                0              1          0                           0                         0
## 94              1                0              0          1                           0                         0
## 95              0                1              0          1                           1                         0
## 96              0                0              1          1                           0                         1
## 97              1                0              0          0                           0                         0
## 98              0                1              0          0                           0                         0
## 99              0                0              1          0                           0                         0
## 100             1                0              0          1                           0                         0
## 101             0                1              0          1                           1                         0
## 102             0                0              1          1                           0                         1
## 103             1                0              0          0                           0                         0
## 104             0                1              0          0                           0                         0
## 105             0                0              1          0                           0                         0
## 106             1                0              0          1                           0                         0
## 107             0                1              0          1                           1                         0
## 108             0                0              1          1                           0                         1
## 109             1                0              0          0                           0                         0
## 110             0                1              0          0                           0                         0
## 111             0                0              1          0                           0                         0
## 112             1                0              0          1                           0                         0
## 113             0                1              0          1                           1                         0
## 114             0                0              1          1                           0                         1
## 115             1                0              0          0                           0                         0
## 116             0                1              0          0                           0                         0
## 117             0                0              1          0                           0                         0
## 118             1                0              0          1                           0                         0
## 119             0                1              0          1                           1                         0
## 120             0                0              1          1                           0                         1
## attr(,"assign")
## [1] 1 1 1 2 3 3
## attr(,"contrasts")
## attr(,"contrasts")$fertilizer
## [1] "contr.treatment"
## 
## attr(,"contrasts")$water
## [1] "contr.treatment"

matplot(
    model.matrix(~ 0 + fertilizer * water, data=pumpkins),
    type='l')

plot of chunk plot_model_matrix3

Note:

For fine control of factors in linear models, either

relevel them, or
manually set their contrasts.

Exercise:

Make formulas that give you estimates of

A global mean ($\mu$), two fertilizer effects ($\alpha_\text{medium}$ and $\alpha_\text{high}$), and one water effect ($\beta\text{water}$).
Three fertilizer effects ($\alpha_\text{low}$, $\alpha_\text{medium}$ and $\alpha_\text{high}$), and one water effect ($\beta\text{water}$).
Two fertilizer effects ($\alpha_\text{medium}$ and $\alpha_\text{high}$), and two water effect ($\beta\text{no water}$ and $\beta\text{water}$).
A single mean for each of the six conditions ($\gamma_\text{high, water}$, $\gamma_\text{medium, water}$, $\gamma_\text{low, water}$, $\gamma_\text{high, no water}$, $\gamma_\text{medium, no water}$, $\gamma_\text{low, no water}$).

Example:

A global mean ($\mu$), two fertilizer effects ($\alpha_\text{medium}$ and $\alpha_\text{high}$), and one water effect ($\gamma_\text{water}$).

Example:

summary(lm(weight ~ fertilizer + water, data=pumpkins))

## 
## Call:
## lm(formula = weight ~ fertilizer + water, data = pumpkins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.77906 -1.15970 -0.01605  1.17450  2.32947 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        3.4415     0.2455   14.02   <2e-16 ***
## fertilizermedium   3.5873     0.3007   11.93   <2e-16 ***
## fertilizerhigh     7.0027     0.3007   23.29   <2e-16 ***
## waterwater         3.1258     0.2455   12.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.345 on 116 degrees of freedom
## Multiple R-squared:  0.8586, Adjusted R-squared:  0.855 
## F-statistic: 234.9 on 3 and 116 DF,  p-value: < 2.2e-16

Linear models

Parent-offspring “regression”

plot of chunk plot_galton

A note on history

A note on history

This resulted in Galton’s formulation of the Law of Ancestral Heredity, which states that the two parents of an offspring jointly contribute one half of an offspring’s heritage, while more-removed ancestors constitute a smaller proportion of the offspring’s heritage. Galton viewed reversion as a spring, that when stretched, would return the distribution of traits back to the normal distribution. When Mendel’s principles were rediscovered in 1900, this resulted in a fierce battle between the followers of Galton’s Law of Ancestral Heredity, the biometricians, and those who advocated Mendel’s principles.

Francis Galton, Wikipedia

Covariance and correlation

Pearson’s product-moment correlation coefficient: \[ r = \frac{\sum_{i=1}^n (x_i - \bar x) (y_i - \bar y)}{\sqrt{\sum_{i=1}^n (y_i - \bar y)^2} \sqrt{\sum_{i=1}^n (y_i - \bar y)^2}} \]

> help(cor)

Correlation, Variance and Covariance (Matrices)

Description:

     ‘var’, ‘cov’ and ‘cor’ compute the variance of ‘x’ and the
     covariance or correlation of ‘x’ and ‘y’ if these are vectors.  If
     ‘x’ and ‘y’ are matrices then the covariances (or correlations)
     between the columns of ‘x’ and the columns of ‘y’ are computed.

     ‘cov2cor’ scales a covariance matrix into the corresponding
     correlation matrix _efficiently_.

Usage:

     var(x, y = NULL, na.rm = FALSE, use)
     
     cov(x, y = NULL, use = "everything",
         method = c("pearson", "kendall", "spearman"))
     
     cor(x, y = NULL, use = "everything",
         method = c("pearson", "kendall", "spearman"))

Covariance and correlation

plot of chunk plot_cors

Anscombe’s quartet

plot of chunk plot_ansc

Linear models

\[ \text{(response)} = \text{(intercept)} + \text{(explanatory variables)} + \text{("error"}) \] in the general form: \[ y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i , \] where $\beta_0, \beta_1, \ldots, \beta_k$ are the parameters of the linear model.

Goal: find $b_0, \ldots, b_k$ to best fit the model: \[ y_i = b_0 + b_1 x_{i1} + \cdots + b_k x_{ik} + e_i, \] so that $b_i$ is an estimate of $\beta_i$ and $e_i$ is the residual, an estimate of $\epsilon_i$.

Least-squares fitting of a linear model

Define the predicted values: \[ \hat y_i = b_0 + b_1 x_{i1} + \cdots + b_k x_{ik}, \] and find $b_0, \ldots, b_k$ to minimize the sum of squared residuals, or \[ \sum_i \left(y_i - \hat y_i\right)^2 . \]

Amazing fact: if $k=1$ then \[b_1 = r \frac{\text{sd}(y)}{\text{sd}(x)} .\]

Relationship to likelihood: the Normal distribution.

Demonstration: heights

Let’s recreate Galton’s classic analysis: midparent height, adjusted for gender, is a good predictor of child height. (How good?)

Link to the data.

galton <- read.table("../Datasets/galton/galton-all.tsv", header=TRUE)
head(galton)

##   family father mother gender height kids male female
## 1      1   78.5   67.0      M   73.2    4    1      0
## 2      1   78.5   67.0      F   69.2    4    0      1
## 3      1   78.5   67.0      F   69.0    4    0      1
## 4      1   78.5   67.0      F   69.0    4    0      1
## 5      2   75.5   66.5      M   73.5    4    1      0
## 6      2   75.5   66.5      M   72.5    4    1      0

Linear models

Outline

Multivariate ANOVA

The factorial ANOVA model

Example: pumpkin pie

Questions:

First, a simplification

Making it real with simulation

First, make up some numbers

Next, a data format

Exercise: simulation

Questions that (linear models and) ANOVA can answer

What are (estimates of) the coefficents?

Do different fertilizer levels differ? water?

What are all those numbers?

What do they mean?

Which levels are different from which other ones?

Example

Does the effect of fertilizer depend on water?

Model comparison with ANOVA

The idea

The \(F\) statistic

Nested model analysis

Your turn

Formulas

Anatomy of a formula

Intercepts

Interactions

A translation table

The secret to formulas

Note:

Exercise:

Example:

Linear models

Parent-offspring “regression”

A note on history

A note on history

A note on history

Covariance and correlation

Covariance and correlation

Anscombe’s quartet

Linear models

Least-squares fitting of a linear model

Demonstration: heights