\[
%%
% Add your macros here; they'll be included in pdf and html output.
%%
\newcommand{\R}{\mathbb{R}} % reals
\newcommand{\E}{\mathbb{E}} % expectation
\renewcommand{\P}{\mathbb{P}} % probability
\DeclareMathOperator{\logit}{logit}
\DeclareMathOperator{\logistic}{logistic}
\DeclareMathOperator{\SE}{SE}
\DeclareMathOperator{\sd}{sd}
\DeclareMathOperator{\var}{var}
\DeclareMathOperator{\cov}{cov}
\DeclareMathOperator{\cor}{cor}
\DeclareMathOperator{\Normal}{Normal}
\DeclareMathOperator{\LogNormal}{logNormal}
\DeclareMathOperator{\Poisson}{Poisson}
\DeclareMathOperator{\Beta}{Beta}
\DeclareMathOperator{\Binom}{Binomial}
\DeclareMathOperator{\Gam}{Gamma}
\DeclareMathOperator{\Exp}{Exponential}
\DeclareMathOperator{\Cauchy}{Cauchy}
\DeclareMathOperator{\Unif}{Unif}
\DeclareMathOperator{\Dirichlet}{Dirichlet}
\DeclareMathOperator{\Wishart}{Wishart}
\DeclareMathOperator{\StudentsT}{StudentsT}
\DeclareMathOperator{\Weibull}{Weibull}
\newcommand{\given}{\;\vert\;}
\]

Linear models

Peter Ralph

27 October – Advanced Biological Statistics

Linear models

Parent-offspring “regression”

[Figure: plot_galton — Galton's parent-offspring height data]

“Regression”???

This resulted in Galton’s formulation of the Law of Ancestral Heredity, which states that the two parents of an offspring jointly contribute one half of an offspring’s heritage, while more-removed ancestors constitute a smaller proportion of the offspring’s heritage. Galton viewed reversion as a spring that, when stretched, would pull the distribution of traits back toward the normal distribution. When Mendel’s principles were rediscovered in 1900, this resulted in a fierce battle between the followers of Galton’s Law of Ancestral Heredity, the biometricians, and those who advocated Mendel’s principles.

Covariance and correlation

Pearson’s product-moment correlation coefficient: \[ r = \frac{\sum_{i=1}^n (x_i - \bar x) (y_i - \bar y)}{\sqrt{\sum_{i=1}^n (x_i - \bar x)^2} \sqrt{\sum_{i=1}^n (y_i - \bar y)^2}} \]
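To check the formula against R’s built-in `cor()`, we can compute \(r\) directly from the definition (the data here are made up purely for illustration):

```r
# Pearson's r computed from the definition, then checked against cor()
x <- c(1.2, 2.5, 3.1, 4.8, 5.0)   # made-up illustrative data
y <- c(2.0, 2.9, 3.9, 5.1, 5.2)
r_by_hand <- sum((x - mean(x)) * (y - mean(y))) /
    (sqrt(sum((x - mean(x))^2)) * sqrt(sum((y - mean(y))^2)))
r_by_hand
cor(x, y)   # agrees with the hand computation
```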

> help(cor)

Correlation, Variance and Covariance (Matrices)

Description:

     ‘var’, ‘cov’ and ‘cor’ compute the variance of ‘x’ and the
     covariance or correlation of ‘x’ and ‘y’ if these are vectors.  If
     ‘x’ and ‘y’ are matrices then the covariances (or correlations)
     between the columns of ‘x’ and the columns of ‘y’ are computed.

     ‘cov2cor’ scales a covariance matrix into the corresponding
     correlation matrix _efficiently_.

Usage:

     var(x, y = NULL, na.rm = FALSE, use)
     
     cov(x, y = NULL, use = "everything",
         method = c("pearson", "kendall", "spearman"))
     
     cor(x, y = NULL, use = "everything",
         method = c("pearson", "kendall", "spearman"))
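A quick sketch of the `method` argument on made-up data: Spearman’s correlation measures monotone association via ranks, so a perfectly monotone but nonlinear relationship has Spearman correlation exactly 1, while Pearson’s correlation is below 1:

```r
x <- 1:10
y <- x^2   # monotone but nonlinear in x
cor(x, y, method = "pearson")   # less than 1
cor(x, y, method = "spearman")  # exactly 1: ranks agree perfectly
```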

Covariance and correlation

[Figure: plot_cors — examples of covariance and correlation]

Anscombe’s quartet

[Figure: plot_ansc — Anscombe's quartet]

Linear models

\[ \text{(response)} = \text{(intercept)} + \text{(explanatory variables)} + \text{("error")} \] in the general form: \[ y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i , \] where \(\beta_0, \beta_1, \ldots, \beta_k\) are the parameters of the linear model.

Goal: find \(b_0, \ldots, b_k\) to best fit the model: \[ y_i = b_0 + b_1 x_{i1} + \cdots + b_k x_{ik} + e_i, \] so that \(b_i\) is an estimate of \(\beta_i\) and \(e_i\) is the residual, an estimate of \(\epsilon_i\).
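As a sketch of this setup in R (with simulated data and made-up coefficients), `lm()` returns the estimates \(b_0, \ldots, b_k\) and the residuals \(e_i\):

```r
# Simulate data from a known linear model, then recover the coefficients
set.seed(1)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 2 + 0.5 * x1 - 1.2 * x2 + rnorm(n, sd = 0.3)  # true betas: 2, 0.5, -1.2
fit <- lm(y ~ x1 + x2)
coef(fit)        # b0, b1, b2: estimates of beta0, beta1, beta2
resid(fit)[1:3]  # e_i: the residuals, estimates of epsilon_i
```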

Least-squares fitting of a linear model

Define the predicted values: \[ \hat y_i = b_0 + b_1 x_{i1} + \cdots + b_k x_{ik}, \] and find \(b_0, \ldots, b_k\) to minimize the sum of squared residuals, or \[ \sum_i \left(y_i - \hat y_i\right)^2 . \]

Amazing fact: if \(k=1\) then \[b_1 = r \frac{\text{sd}(y)}{\text{sd}(x)} .\]
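A quick check of this fact on simulated data (variable names and coefficients are arbitrary):

```r
# Verify that the slope from lm() equals r * sd(y) / sd(x) when k = 1
set.seed(2)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
b1 <- coef(lm(y ~ x))[["x"]]
b1
cor(x, y) * sd(y) / sd(x)   # identical, up to floating point
```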

Relationship to likelihood: the Normal distribution.
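One way to see the connection (a sketch, with the error standard deviation fixed at 1 for simplicity): under Normal errors, the maximum-likelihood estimates of the coefficients, found numerically here with `optim()`, agree with the least-squares estimates from `lm()`:

```r
# Least squares = maximum likelihood under Normal errors (sd fixed at 1)
set.seed(3)
x <- rnorm(40)
y <- 3 + 0.7 * x + rnorm(40)
negloglik <- function(b) {
    -sum(dnorm(y, mean = b[1] + b[2] * x, sd = 1, log = TRUE))
}
mle <- optim(c(0, 0), negloglik)$par  # numerical MLE of (b0, b1)
mle
coef(lm(y ~ x))  # matches, up to optimizer tolerance
```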

Exercise: heights

You might be interested in recreating Galton’s analysis: midparent height, adjusted for gender, is a pretty good (and linear!) predictor of child height. (How good?)

Link to the data.

galton <- read.table("../Datasets/galton/galton-all.tsv", header=TRUE)
head(galton)
##   family father mother gender height kids male female
## 1      1   78.5   67.0      M   73.2    4    1      0
## 2      1   78.5   67.0      F   69.2    4    0      1
## 3      1   78.5   67.0      F   69.0    4    0      1
## 4      1   78.5   67.0      F   69.0    4    0      1
## 5      2   75.5   66.5      M   73.5    4    1      0
## 6      2   75.5   66.5      M   72.5    4    1      0
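One possible starting point, sketched here with only the six rows printed above so that it is self-contained; the real exercise would use the full data frame read in with `read.table()`. The 1.08 multiplier for mothers’ heights is Galton’s classical adjustment and is an assumption here, as is handling gender as a covariate in the model rather than rescaling the heights:

```r
# Sketch using the six rows shown above (the full analysis uses all rows).
# Midparent height: Galton's convention of (father + 1.08 * mother) / 2
# is assumed here; it is not specified in the slides.
galton <- data.frame(
    father = c(78.5, 78.5, 78.5, 78.5, 75.5, 75.5),
    mother = c(67.0, 67.0, 67.0, 67.0, 66.5, 66.5),
    gender = c("M", "F", "F", "F", "M", "M"),
    height = c(73.2, 69.2, 69.0, 69.0, 73.5, 72.5)
)
galton$midparent <- (galton$father + 1.08 * galton$mother) / 2
fit <- lm(height ~ midparent + gender, data = galton)
coef(fit)
summary(fit)$r.squared  # "how good?": proportion of variance explained
```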