Peter Ralph
Advanced Biological Statistics
Use the data to try to reproduce their model:
BIR = 14,195.35 - 141.75 (sand%) - 142.10 (silt%) - 142.56 (clay%)
They’re not wrong! What’s up with those coefficients, though?
Read in the data:
x <- read.table("data/infiltration_data.tsv", header=TRUE)
range(rowSums(x[, c("Sand", "Clay", "Silt")]))
## [1] 100 100
Fit the model:
##
## Call:
## lm(formula = BIR ~ Sand + Silt + Clay, data = x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.39 -1.52 0.63 1.99 4.27
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -62.112 14.617 -4.25 0.00033 ***
## Sand 0.831 0.160 5.21 3.2e-05 ***
## Silt 0.473 0.234 2.02 0.05577 .
## Clay NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.9 on 22 degrees of freedom
## Multiple R-squared: 0.627, Adjusted R-squared: 0.593
## F-statistic: 18.5 on 2 and 22 DF, p-value: 1.96e-05
There is a small difference between our predictions and theirs - due to rounding, it will turn out.
Ok, so Sand + Silt + Clay
is always equal to 100, so for any number a
, it’s always true that a * Sand + a * Silt + a * Clay - a * 100 = 0
. So, we can add this to our formula and get exactly the same thing. Our model has no coefficient for Clay
, so to make their model look like ours we can do this with a
equal to their coefficient for Clay
:
So, what are these numbers?
The new coefficients that are equivalent to theirs are nearly the same as ours - the difference is because of rounding error!
coefs <- as.data.frame(coef(the_lm))
names(coefs)[[1]] <- "lm"
coefs$theirs <- c(14195.35, - 141.75, - 142.10, - 142.56)
coefs$theirs_noclay <- c(
14195.35 - clay_coef * 100,
(clay_coef - 141.75),
(clay_coef - 142.10),
(clay_coef - 142.56)
)
coefs
## lm theirs theirs_noclay
## (Intercept) -62.11 14195 -60.65
## Sand 0.83 -142 0.81
## Silt 0.47 -142 0.46
## Clay NA -143 0.00