Peter Ralph
27 October – Advanced Biological Statistics
pumpkins <- read.table("data/pumpkins.tsv", header=TRUE)
pumpkins$plot <- factor(pumpkins$plot)
head(pumpkins)
## fertilizer water plot plant weight mean_weight
## 1 low no water 1 1 8.231855 8
## 2 medium no water 1 1 13.478381 14
## 3 high no water 1 1 21.095921 20
## 4 low water 1 1 8.152066 6
## 5 medium water 1 1 17.195926 16
## 6 high water 1 1 21.328989 20
##
## Call:
## lm(formula = weight ~ fertilizer + water, data = pumpkins)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5293 -1.0821 0.1251 1.0006 3.0004
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.8660 0.2682 74.084 <2e-16 ***
## fertilizerlow -13.0065 0.3284 -39.603 <2e-16 ***
## fertilizermedium -4.7970 0.3284 -14.606 <2e-16 ***
## waterwater 0.3018 0.2682 1.126 0.263
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.469 on 116 degrees of freedom
## Multiple R-squared: 0.9326, Adjusted R-squared: 0.9309
## F-statistic: 535.2 on 3 and 116 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = weight ~ fertilizer * water, data = pumpkins)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.96871 -0.64682 0.00617 0.77549 2.05257
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.00462 0.25368 78.857 < 2e-16 ***
## fertilizerlow -12.08324 0.35876 -33.680 < 2e-16 ***
## fertilizermedium -6.13599 0.35876 -17.103 < 2e-16 ***
## waterwater 0.02462 0.35876 0.069 0.945402
## fertilizerlow:waterwater -1.84651 0.50736 -3.639 0.000412 ***
## fertilizermedium:waterwater 2.67808 0.50736 5.278 6.3e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.135 on 114 degrees of freedom
## Multiple R-squared: 0.9605, Adjusted R-squared: 0.9588
## F-statistic: 554.3 on 5 and 114 DF, p-value: < 2.2e-16
Me: Hey, I made our model more complicated, and look, it fits better!
You: Yeah, of course it does. How much better?
Me: How can we tell?
You: Well, does it reduce the residual variance more than you’d expect by chance?
To compare two models, \[\begin{aligned} F &= \frac{\text{(explained variance)}}{\text{(residual variance)}} \\ &= \frac{\text{(mean square model)}}{\text{(mean square residual)}} \\ &= \frac{\frac{\text{RSS}_1 - \text{RSS}_2}{p_2-p_1}}{\frac{\text{RSS}_2}{n-p_2}} \end{aligned}\]
anova(
lm(weight ~ water, data=pumpkins),
lm(weight ~ fertilizer + water, data=pumpkins),
lm(weight ~ fertilizer * water, data=pumpkins)
)
## Analysis of Variance Table
##
## Model 1: weight ~ water
## Model 2: weight ~ fertilizer + water
## Model 3: weight ~ fertilizer * water
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 118 3711.3
## 2 116 250.2 2 3461.0 1344.507 < 2.2e-16 ***
## 3 114 146.7 2 103.5 40.212 6.095e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Do a stepwise model comparison of nested linear models, including plot
in the analysis. Think about what order to do the comparison in. Make sure they are nested!
Data: data/pumpkins.tsv