\[ %% % Add your macros here; they'll be included in pdf and html output. %% \newcommand{\R}{\mathbb{R}} % reals \newcommand{\E}{\mathbb{E}} % expectation \renewcommand{\P}{\mathbb{P}} % probability \DeclareMathOperator{\logit}{logit} \DeclareMathOperator{\logistic}{logistic} \DeclareMathOperator{\SE}{SE} \DeclareMathOperator{\sd}{sd} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\cor}{cor} \DeclareMathOperator{\Normal}{Normal} \DeclareMathOperator{\LogNormal}{logNormal} \DeclareMathOperator{\Poisson}{Poisson} \DeclareMathOperator{\Beta}{Beta} \DeclareMathOperator{\Binom}{Binomial} \DeclareMathOperator{\Gam}{Gamma} \DeclareMathOperator{\Exp}{Exponential} \DeclareMathOperator{\Cauchy}{Cauchy} \DeclareMathOperator{\Unif}{Unif} \DeclareMathOperator{\Dirichlet}{Dirichlet} \DeclareMathOperator{\Wishart}{Wishart} \DeclareMathOperator{\StudentsT}{StudentsT} \DeclareMathOperator{\Weibull}{Weibull} \newcommand{\given}{\;\vert\;} \]

Categorical data

Peter Ralph

21 January 2020 – Advanced Biological Statistics

Categorical data

Hair and Eye color

HairEyeColor             package:datasets              R Documentation

Hair and Eye Color of Statistics Students

Description:

     Distribution of hair and eye color and sex in 592 statistics
     students.

Usage:

     HairEyeColor
     
Format:

     A 3-dimensional array resulting from cross-tabulating 592
     observations on 3 variables.  The variables and their levels are
     as follows:

       No  Name  Levels                    
        1  Hair  Black, Brown, Red, Blond  
        2  Eye   Brown, Blue, Hazel, Green 
        3  Sex   Male, Female              
      
Details:

     The Hair x Eye table comes from a survey of students at the
     University of Delaware reported by Snee (1974).  The split by
     ‘Sex’ was added by Friendly (1992a) for didactic purposes.

     This data set is useful for illustrating various techniques for
     the analysis of contingency tables, such as the standard
     chi-squared test or, more generally, log-linear modelling, and
     graphical methods such as mosaic plots, sieve diagrams or
     association plots.

Source:

     <URL:
     http://euclid.psych.yorku.ca/ftp/sas/vcd/catdata/haireye.sas>

     Snee (1974) gives the two-way table aggregated over ‘Sex’.  The
     ‘Sex’ split of the ‘Brown hair, Brown eye’ cell was changed to
     agree with that used by Friendly (2000).

References:

     Snee, R. D. (1974).  Graphical display of two-way contingency
     tables.  _The American Statistician_, *28*, 9-12.  doi:
     10.2307/2683520 (URL: http://doi.org/10.2307/2683520).
## , , Sex = Male
## 
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    32   11    10     3
##   Brown    53   50    25    15
##   Red      10   10     7     7
##   Blond     3   30     5     8
## 
## , , Sex = Female
## 
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    36    9     5     2
##   Brown    66   34    29    14
##   Red      16    7     7     7
##   Blond     4   64     5     8
##     hair   eye    sex number
## 1  Black Brown   Male     32
## 2  Brown Brown   Male     53
## 3    Red Brown   Male     10
## 4  Blond Brown   Male      3
## 5  Black  Blue   Male     11
## 6  Brown  Blue   Male     50
## 7    Red  Blue   Male     10
## 8  Blond  Blue   Male     30
## 9  Black Hazel   Male     10
## 10 Brown Hazel   Male     25
## 11   Red Hazel   Male      7
## 12 Blond Hazel   Male      5
## 13 Black Green   Male      3
## 14 Brown Green   Male     15
## 15   Red Green   Male      7
## 16 Blond Green   Male      8
## 17 Black Brown Female     36
## 18 Brown Brown Female     66
## 19   Red Brown Female     16
## 20 Blond Brown Female      4
## 21 Black  Blue Female      9
## 22 Brown  Blue Female     34
## 23   Red  Blue Female      7
## 24 Blond  Blue Female     64
## 25 Black Hazel Female      5
## 26 Brown Hazel Female     29
## 27   Red Hazel Female      7
## 28 Blond Hazel Female      5
## 29 Black Green Female      2
## 30 Brown Green Female     14
## 31   Red Green Female      7
## 32 Blond Green Female      8

Questions:

  1. Are hair and eye color independent in this sample?
  2. Do hair and eye color proportions differ by sex?

Independence and multiplicativity

If hair and eye color are independent, then probabilities of combinations are multiplicative:

\[\begin{aligned} &\P\{\text{black hair and blue eyes}\} \\ &\qquad = \P\{\text{black hair}\} \times \P\{\text{blue eyes}\given\text{black hair}\} \\ \end{aligned}\]

which if independent is \[\begin{aligned} &\hphantom{\P\{\text{black hair and blue eyes}\}} \\ &\qquad = \P\{\text{black hair}\} \times \P\{\text{blue eyes}\} \end{aligned}\]

Multiplicativity to additivity

A model of independence will have a multiplicative form: \[ p_{ab} = p_a \times p_b . \]

Set \(\lambda = \log(p)\), so that \[ \lambda_{ab} = \lambda_a + \lambda_b . \]

The chi-squared statistic

Let’s start by looking at just hair and eye color, summing over sex:

##        Eye
## Hair    Brown Blue Hazel Green
##   Black    68   20    15     5
##   Brown   119   84    54    29
##   Red      26   17    14    14
##   Blond     7   94    10    16

Some questions

In this dataset…

  1. What proportion have blonde hair?
  2. What proportion have blue eyes?
  3. If hair and eye color assort independently, what proportion do you expect to have both blonde hair and blue eyes? How many people would this be?
  4. How many actually have both? Is this difference surprising?
  5. Do the same for black hair and green eyes.

“Expected” counts

Let \[\begin{aligned} n_{ij} &= (\text{observed}_{ij}) \\ &=(\text{observed number with hair $i$ and eye $j$}) \\ E_{ij} &= (\text{expected}_{ij}) \\ &=(\text{total number}) \times(\text{proportion with hair $i$}) \\ &\qquad \times (\text{proportion with eye $j$}) \\ &= n \times \left(\frac{n_{i\cdot}}{n}\right) \times \left(\frac{n_{\cdot j}}{n}\right) . \end{aligned}\]

Here \(n_{i \cdot}\) and \(n_{\cdot j}\) are the row and column sums.

We want to quantify how different the observed and expected are, inversely weighted by their noisiness: \[\begin{aligned} \sum_{ij} \left( \frac{ (\text{observed})_{ij} - (\text{expected})_{ij} }{ \SE[\text{observed}_{ij}] } \right)^2 \end{aligned}\]

So, what is \(\SE[\text{observed}_{ij}]\)?

What is \(\SE[\text{observed}_{ij}]\)?

Under the model of independence, \[\begin{aligned} n_{ij} &\sim \Binom(n, p_i q_j) , \\ \text{where}\quad p_i &= (\text{prob of hair color $i$}) \\ q_j &= (\text{prob of eye color $j$}) . \end{aligned}\]

So, \[\begin{aligned} \sd[n_{ij}] = \sqrt{ n p_i q_j (1 - p_i q_j) } , \end{aligned}\]

… and so how about this \[\begin{aligned} \SE[n_{ij}] &\approx \sqrt{ n p_i q_j } \\ &= \sqrt{(\text{expected}_{ij})} \qquad \ldots? \end{aligned}\]

The chi-squared statistic

\[\begin{aligned} \chi^2 &= \sum_{ij} \frac{ \left((\text{observed})_{ij} - (\text{expected})_{ij} \right)^2 }{ (\text{expected})_{ij} } . \end{aligned}\]

i.e., “observed minus expected squared, divided by expected”.

This gives us a number. What does it mean?

Stochastic minute

The chi-squared distribution

Suppose that \(Z_1, \ldots, Z_k\) are independent \(\Normal(0, 1)\). Then

\[ \chi^2 = Z_1^2 + \cdots + Z_k^2 \]

has the chi squared distribution with \(k\) degrees of freedom.

Notes:

  1. \(\chi^2\) is a unitless nonnegative numbers.

  2. \(\E[\chi^2] = k\).

  3. If instead \(Z_i \sim \Normal(\mu_i, \sigma_i)\), then \(\chi^2 = \sum_{i=1}^k (Z_i - \mu_i)^2 / \sigma_i\).

  4. \(\chi^2 \sim \Gamma(k/2, 1/2)\).

Asymptotics

If the number of observations in a contingency table with \(r\) rows and \(c\) columns is large, then the chi-squared statistic has, approximately, the chi-squared distribution with \((r-1)\times(c-1)\) degrees of freedom under the hypothesis of independence of rows and columns.

(Asymptotically, i.e., as the number of observations goes to infinity.)

Method 1: Chi-squared test for independence

A chi-squared test

## 
##  Pearson's Chi-squared test
## 
## data:  haireye_2d
## X-squared = 138, df = 9, p-value <2e-16

Um, ok?

More context

Let’s actually look at “observed minus expected”:

##        Eye
## Hair    Brown  Blue Hazel Green
##   Black  40.1  39.2  17.0  11.7
##   Brown 106.3 103.9  44.9  30.9
##   Red    26.4  25.8  11.2   7.7
##   Blond  47.2  46.1  20.0  13.7

Observed minus expected:

##        Eye
## Hair     Brown   Blue  Hazel  Green
##   Black  27.86 -19.22  -1.97  -6.68
##   Brown  12.72 -19.87   9.07  -1.92
##   Red    -0.39  -8.79   2.85   6.32
##   Blond -40.20  47.88  -9.95   2.27

Normalized by \(\sqrt{\text{expected}}\):

##        Eye
## Hair     Brown   Blue  Hazel  Green
##   Black  4.398 -3.069 -0.477 -1.954
##   Brown  1.233 -1.949  1.353 -0.345
##   Red   -0.075 -1.730  0.852  2.283
##   Blond -5.851  7.050 -2.228  0.613

Conclusions?

What about by sex?

Compute the chi-squared statistic with chisq.test( ):

## Warning in chisq.test(HairEyeColor[, , "Female"]): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  HairEyeColor[, , "Female"]
## X-squared = 107, df = 9, p-value <2e-16

Method 2: Permutation

recall the \(p\)-value

(definition here)

A permutation test estimates the “probability … under the model” part.

We still need the other stuff.

First, “individualize” the data:

##        hair   eye    sex
## 1     Black Brown   Male
## 1.1   Black Brown   Male
## 1.2   Black Brown   Male
## 1.3   Black Brown   Male
## 1.4   Black Brown   Male
## 1.5   Black Brown   Male
## 1.6   Black Brown   Male
## 1.7   Black Brown   Male
## 1.8   Black Brown   Male
## 1.9   Black Brown   Male
## 1.10  Black Brown   Male
## 1.11  Black Brown   Male
## 1.12  Black Brown   Male
## 1.13  Black Brown   Male
## 1.14  Black Brown   Male
## 1.15  Black Brown   Male
## 1.16  Black Brown   Male
## 1.17  Black Brown   Male
## 1.18  Black Brown   Male
## 1.19  Black Brown   Male
## 1.20  Black Brown   Male
## 1.21  Black Brown   Male
## 1.22  Black Brown   Male
## 1.23  Black Brown   Male
## 1.24  Black Brown   Male
## 1.25  Black Brown   Male
## 1.26  Black Brown   Male
## 1.27  Black Brown   Male
## 1.28  Black Brown   Male
## 1.29  Black Brown   Male
## 1.30  Black Brown   Male
## 1.31  Black Brown   Male
## 2     Brown Brown   Male
## 2.1   Brown Brown   Male
## 2.2   Brown Brown   Male
## 2.3   Brown Brown   Male
## 2.4   Brown Brown   Male
## 2.5   Brown Brown   Male
## 2.6   Brown Brown   Male
## 2.7   Brown Brown   Male
## 2.8   Brown Brown   Male
## 2.9   Brown Brown   Male
## 2.10  Brown Brown   Male
## 2.11  Brown Brown   Male
## 2.12  Brown Brown   Male
## 2.13  Brown Brown   Male
## 2.14  Brown Brown   Male
## 2.15  Brown Brown   Male
## 2.16  Brown Brown   Male
## 2.17  Brown Brown   Male
## 2.18  Brown Brown   Male
## 2.19  Brown Brown   Male
## 2.20  Brown Brown   Male
## 2.21  Brown Brown   Male
## 2.22  Brown Brown   Male
## 2.23  Brown Brown   Male
## 2.24  Brown Brown   Male
## 2.25  Brown Brown   Male
## 2.26  Brown Brown   Male
## 2.27  Brown Brown   Male
## 2.28  Brown Brown   Male
## 2.29  Brown Brown   Male
## 2.30  Brown Brown   Male
## 2.31  Brown Brown   Male
## 2.32  Brown Brown   Male
## 2.33  Brown Brown   Male
## 2.34  Brown Brown   Male
## 2.35  Brown Brown   Male
## 2.36  Brown Brown   Male
## 2.37  Brown Brown   Male
## 2.38  Brown Brown   Male
## 2.39  Brown Brown   Male
## 2.40  Brown Brown   Male
## 2.41  Brown Brown   Male
## 2.42  Brown Brown   Male
## 2.43  Brown Brown   Male
## 2.44  Brown Brown   Male
## 2.45  Brown Brown   Male
## 2.46  Brown Brown   Male
## 2.47  Brown Brown   Male
## 2.48  Brown Brown   Male
## 2.49  Brown Brown   Male
## 2.50  Brown Brown   Male
## 2.51  Brown Brown   Male
## 2.52  Brown Brown   Male
## 3       Red Brown   Male
## 3.1     Red Brown   Male
## 3.2     Red Brown   Male
## 3.3     Red Brown   Male
## 3.4     Red Brown   Male
## 3.5     Red Brown   Male
## 3.6     Red Brown   Male
## 3.7     Red Brown   Male
## 3.8     Red Brown   Male
## 3.9     Red Brown   Male
## 4     Blond Brown   Male
## 4.1   Blond Brown   Male
## 4.2   Blond Brown   Male
## 5     Black  Blue   Male
## 5.1   Black  Blue   Male
## 5.2   Black  Blue   Male
## 5.3   Black  Blue   Male
## 5.4   Black  Blue   Male
## 5.5   Black  Blue   Male
## 5.6   Black  Blue   Male
## 5.7   Black  Blue   Male
## 5.8   Black  Blue   Male
## 5.9   Black  Blue   Male
## 5.10  Black  Blue   Male
## 6     Brown  Blue   Male
## 6.1   Brown  Blue   Male
## 6.2   Brown  Blue   Male
## 6.3   Brown  Blue   Male
## 6.4   Brown  Blue   Male
## 6.5   Brown  Blue   Male
## 6.6   Brown  Blue   Male
## 6.7   Brown  Blue   Male
## 6.8   Brown  Blue   Male
## 6.9   Brown  Blue   Male
## 6.10  Brown  Blue   Male
## 6.11  Brown  Blue   Male
## 6.12  Brown  Blue   Male
## 6.13  Brown  Blue   Male
## 6.14  Brown  Blue   Male
## 6.15  Brown  Blue   Male
## 6.16  Brown  Blue   Male
## 6.17  Brown  Blue   Male
## 6.18  Brown  Blue   Male
## 6.19  Brown  Blue   Male
## 6.20  Brown  Blue   Male
## 6.21  Brown  Blue   Male
## 6.22  Brown  Blue   Male
## 6.23  Brown  Blue   Male
## 6.24  Brown  Blue   Male
## 6.25  Brown  Blue   Male
## 6.26  Brown  Blue   Male
## 6.27  Brown  Blue   Male
## 6.28  Brown  Blue   Male
## 6.29  Brown  Blue   Male
## 6.30  Brown  Blue   Male
## 6.31  Brown  Blue   Male
## 6.32  Brown  Blue   Male
## 6.33  Brown  Blue   Male
## 6.34  Brown  Blue   Male
## 6.35  Brown  Blue   Male
## 6.36  Brown  Blue   Male
## 6.37  Brown  Blue   Male
## 6.38  Brown  Blue   Male
## 6.39  Brown  Blue   Male
## 6.40  Brown  Blue   Male
## 6.41  Brown  Blue   Male
## 6.42  Brown  Blue   Male
## 6.43  Brown  Blue   Male
## 6.44  Brown  Blue   Male
## 6.45  Brown  Blue   Male
## 6.46  Brown  Blue   Male
## 6.47  Brown  Blue   Male
## 6.48  Brown  Blue   Male
## 6.49  Brown  Blue   Male
## 7       Red  Blue   Male
## 7.1     Red  Blue   Male
## 7.2     Red  Blue   Male
## 7.3     Red  Blue   Male
## 7.4     Red  Blue   Male
## 7.5     Red  Blue   Male
## 7.6     Red  Blue   Male
## 7.7     Red  Blue   Male
## 7.8     Red  Blue   Male
## 7.9     Red  Blue   Male
## 8     Blond  Blue   Male
## 8.1   Blond  Blue   Male
## 8.2   Blond  Blue   Male
## 8.3   Blond  Blue   Male
## 8.4   Blond  Blue   Male
## 8.5   Blond  Blue   Male
## 8.6   Blond  Blue   Male
## 8.7   Blond  Blue   Male
## 8.8   Blond  Blue   Male
## 8.9   Blond  Blue   Male
## 8.10  Blond  Blue   Male
## 8.11  Blond  Blue   Male
## 8.12  Blond  Blue   Male
## 8.13  Blond  Blue   Male
## 8.14  Blond  Blue   Male
## 8.15  Blond  Blue   Male
## 8.16  Blond  Blue   Male
## 8.17  Blond  Blue   Male
## 8.18  Blond  Blue   Male
## 8.19  Blond  Blue   Male
## 8.20  Blond  Blue   Male
## 8.21  Blond  Blue   Male
## 8.22  Blond  Blue   Male
## 8.23  Blond  Blue   Male
## 8.24  Blond  Blue   Male
## 8.25  Blond  Blue   Male
## 8.26  Blond  Blue   Male
## 8.27  Blond  Blue   Male
## 8.28  Blond  Blue   Male
## 8.29  Blond  Blue   Male
## 9     Black Hazel   Male
## 9.1   Black Hazel   Male
## 9.2   Black Hazel   Male
## 9.3   Black Hazel   Male
## 9.4   Black Hazel   Male
## 9.5   Black Hazel   Male
## 9.6   Black Hazel   Male
## 9.7   Black Hazel   Male
## 9.8   Black Hazel   Male
## 9.9   Black Hazel   Male
## 10    Brown Hazel   Male
## 10.1  Brown Hazel   Male
## 10.2  Brown Hazel   Male
## 10.3  Brown Hazel   Male
## 10.4  Brown Hazel   Male
## 10.5  Brown Hazel   Male
## 10.6  Brown Hazel   Male
## 10.7  Brown Hazel   Male
## 10.8  Brown Hazel   Male
## 10.9  Brown Hazel   Male
## 10.10 Brown Hazel   Male
## 10.11 Brown Hazel   Male
## 10.12 Brown Hazel   Male
## 10.13 Brown Hazel   Male
## 10.14 Brown Hazel   Male
## 10.15 Brown Hazel   Male
## 10.16 Brown Hazel   Male
## 10.17 Brown Hazel   Male
## 10.18 Brown Hazel   Male
## 10.19 Brown Hazel   Male
## 10.20 Brown Hazel   Male
## 10.21 Brown Hazel   Male
## 10.22 Brown Hazel   Male
## 10.23 Brown Hazel   Male
## 10.24 Brown Hazel   Male
## 11      Red Hazel   Male
## 11.1    Red Hazel   Male
## 11.2    Red Hazel   Male
## 11.3    Red Hazel   Male
## 11.4    Red Hazel   Male
## 11.5    Red Hazel   Male
## 11.6    Red Hazel   Male
## 12    Blond Hazel   Male
## 12.1  Blond Hazel   Male
## 12.2  Blond Hazel   Male
## 12.3  Blond Hazel   Male
## 12.4  Blond Hazel   Male
## 13    Black Green   Male
## 13.1  Black Green   Male
## 13.2  Black Green   Male
## 14    Brown Green   Male
## 14.1  Brown Green   Male
## 14.2  Brown Green   Male
## 14.3  Brown Green   Male
## 14.4  Brown Green   Male
## 14.5  Brown Green   Male
## 14.6  Brown Green   Male
## 14.7  Brown Green   Male
## 14.8  Brown Green   Male
## 14.9  Brown Green   Male
## 14.10 Brown Green   Male
## 14.11 Brown Green   Male
## 14.12 Brown Green   Male
## 14.13 Brown Green   Male
## 14.14 Brown Green   Male
## 15      Red Green   Male
## 15.1    Red Green   Male
## 15.2    Red Green   Male
## 15.3    Red Green   Male
## 15.4    Red Green   Male
## 15.5    Red Green   Male
## 15.6    Red Green   Male
## 16    Blond Green   Male
## 16.1  Blond Green   Male
## 16.2  Blond Green   Male
## 16.3  Blond Green   Male
## 16.4  Blond Green   Male
## 16.5  Blond Green   Male
## 16.6  Blond Green   Male
## 16.7  Blond Green   Male
## 17    Black Brown Female
## 17.1  Black Brown Female
## 17.2  Black Brown Female
## 17.3  Black Brown Female
## 17.4  Black Brown Female
## 17.5  Black Brown Female
## 17.6  Black Brown Female
## 17.7  Black Brown Female
## 17.8  Black Brown Female
## 17.9  Black Brown Female
## 17.10 Black Brown Female
## 17.11 Black Brown Female
## 17.12 Black Brown Female
## 17.13 Black Brown Female
## 17.14 Black Brown Female
## 17.15 Black Brown Female
## 17.16 Black Brown Female
## 17.17 Black Brown Female
## 17.18 Black Brown Female
## 17.19 Black Brown Female
## 17.20 Black Brown Female
## 17.21 Black Brown Female
## 17.22 Black Brown Female
## 17.23 Black Brown Female
## 17.24 Black Brown Female
## 17.25 Black Brown Female
## 17.26 Black Brown Female
## 17.27 Black Brown Female
## 17.28 Black Brown Female
## 17.29 Black Brown Female
## 17.30 Black Brown Female
## 17.31 Black Brown Female
## 17.32 Black Brown Female
## 17.33 Black Brown Female
## 17.34 Black Brown Female
## 17.35 Black Brown Female
## 18    Brown Brown Female
## 18.1  Brown Brown Female
## 18.2  Brown Brown Female
## 18.3  Brown Brown Female
## 18.4  Brown Brown Female
## 18.5  Brown Brown Female
## 18.6  Brown Brown Female
## 18.7  Brown Brown Female
## 18.8  Brown Brown Female
## 18.9  Brown Brown Female
## 18.10 Brown Brown Female
## 18.11 Brown Brown Female
## 18.12 Brown Brown Female
## 18.13 Brown Brown Female
## 18.14 Brown Brown Female
## 18.15 Brown Brown Female
## 18.16 Brown Brown Female
## 18.17 Brown Brown Female
## 18.18 Brown Brown Female
## 18.19 Brown Brown Female
## 18.20 Brown Brown Female
## 18.21 Brown Brown Female
## 18.22 Brown Brown Female
## 18.23 Brown Brown Female
## 18.24 Brown Brown Female
## 18.25 Brown Brown Female
## 18.26 Brown Brown Female
## 18.27 Brown Brown Female
## 18.28 Brown Brown Female
## 18.29 Brown Brown Female
## 18.30 Brown Brown Female
## 18.31 Brown Brown Female
## 18.32 Brown Brown Female
## 18.33 Brown Brown Female
## 18.34 Brown Brown Female
## 18.35 Brown Brown Female
## 18.36 Brown Brown Female
## 18.37 Brown Brown Female
## 18.38 Brown Brown Female
## 18.39 Brown Brown Female
## 18.40 Brown Brown Female
## 18.41 Brown Brown Female
## 18.42 Brown Brown Female
## 18.43 Brown Brown Female
## 18.44 Brown Brown Female
## 18.45 Brown Brown Female
## 18.46 Brown Brown Female
## 18.47 Brown Brown Female
## 18.48 Brown Brown Female
## 18.49 Brown Brown Female
## 18.50 Brown Brown Female
## 18.51 Brown Brown Female
## 18.52 Brown Brown Female
## 18.53 Brown Brown Female
## 18.54 Brown Brown Female
## 18.55 Brown Brown Female
## 18.56 Brown Brown Female
## 18.57 Brown Brown Female
## 18.58 Brown Brown Female
## 18.59 Brown Brown Female
## 18.60 Brown Brown Female
## 18.61 Brown Brown Female
## 18.62 Brown Brown Female
## 18.63 Brown Brown Female
## 18.64 Brown Brown Female
## 18.65 Brown Brown Female
## 19      Red Brown Female
## 19.1    Red Brown Female
## 19.2    Red Brown Female
## 19.3    Red Brown Female
## 19.4    Red Brown Female
## 19.5    Red Brown Female
## 19.6    Red Brown Female
## 19.7    Red Brown Female
## 19.8    Red Brown Female
## 19.9    Red Brown Female
## 19.10   Red Brown Female
## 19.11   Red Brown Female
## 19.12   Red Brown Female
## 19.13   Red Brown Female
## 19.14   Red Brown Female
## 19.15   Red Brown Female
## 20    Blond Brown Female
## 20.1  Blond Brown Female
## 20.2  Blond Brown Female
## 20.3  Blond Brown Female
## 21    Black  Blue Female
## 21.1  Black  Blue Female
## 21.2  Black  Blue Female
## 21.3  Black  Blue Female
## 21.4  Black  Blue Female
## 21.5  Black  Blue Female
## 21.6  Black  Blue Female
## 21.7  Black  Blue Female
## 21.8  Black  Blue Female
## 22    Brown  Blue Female
## 22.1  Brown  Blue Female
## 22.2  Brown  Blue Female
## 22.3  Brown  Blue Female
## 22.4  Brown  Blue Female
## 22.5  Brown  Blue Female
## 22.6  Brown  Blue Female
## 22.7  Brown  Blue Female
## 22.8  Brown  Blue Female
## 22.9  Brown  Blue Female
## 22.10 Brown  Blue Female
## 22.11 Brown  Blue Female
## 22.12 Brown  Blue Female
## 22.13 Brown  Blue Female
## 22.14 Brown  Blue Female
## 22.15 Brown  Blue Female
## 22.16 Brown  Blue Female
## 22.17 Brown  Blue Female
## 22.18 Brown  Blue Female
## 22.19 Brown  Blue Female
## 22.20 Brown  Blue Female
## 22.21 Brown  Blue Female
## 22.22 Brown  Blue Female
## 22.23 Brown  Blue Female
## 22.24 Brown  Blue Female
## 22.25 Brown  Blue Female
## 22.26 Brown  Blue Female
## 22.27 Brown  Blue Female
## 22.28 Brown  Blue Female
## 22.29 Brown  Blue Female
## 22.30 Brown  Blue Female
## 22.31 Brown  Blue Female
## 22.32 Brown  Blue Female
## 22.33 Brown  Blue Female
## 23      Red  Blue Female
## 23.1    Red  Blue Female
## 23.2    Red  Blue Female
## 23.3    Red  Blue Female
## 23.4    Red  Blue Female
## 23.5    Red  Blue Female
## 23.6    Red  Blue Female
## 24    Blond  Blue Female
## 24.1  Blond  Blue Female
## 24.2  Blond  Blue Female
## 24.3  Blond  Blue Female
## 24.4  Blond  Blue Female
## 24.5  Blond  Blue Female
## 24.6  Blond  Blue Female
## 24.7  Blond  Blue Female
## 24.8  Blond  Blue Female
## 24.9  Blond  Blue Female
## 24.10 Blond  Blue Female
## 24.11 Blond  Blue Female
## 24.12 Blond  Blue Female
## 24.13 Blond  Blue Female
## 24.14 Blond  Blue Female
## 24.15 Blond  Blue Female
## 24.16 Blond  Blue Female
## 24.17 Blond  Blue Female
## 24.18 Blond  Blue Female
## 24.19 Blond  Blue Female
## 24.20 Blond  Blue Female
## 24.21 Blond  Blue Female
## 24.22 Blond  Blue Female
## 24.23 Blond  Blue Female
## 24.24 Blond  Blue Female
## 24.25 Blond  Blue Female
## 24.26 Blond  Blue Female
## 24.27 Blond  Blue Female
## 24.28 Blond  Blue Female
## 24.29 Blond  Blue Female
## 24.30 Blond  Blue Female
## 24.31 Blond  Blue Female
## 24.32 Blond  Blue Female
## 24.33 Blond  Blue Female
## 24.34 Blond  Blue Female
## 24.35 Blond  Blue Female
## 24.36 Blond  Blue Female
## 24.37 Blond  Blue Female
## 24.38 Blond  Blue Female
## 24.39 Blond  Blue Female
## 24.40 Blond  Blue Female
## 24.41 Blond  Blue Female
## 24.42 Blond  Blue Female
## 24.43 Blond  Blue Female
## 24.44 Blond  Blue Female
## 24.45 Blond  Blue Female
## 24.46 Blond  Blue Female
## 24.47 Blond  Blue Female
## 24.48 Blond  Blue Female
## 24.49 Blond  Blue Female
## 24.50 Blond  Blue Female
## 24.51 Blond  Blue Female
## 24.52 Blond  Blue Female
## 24.53 Blond  Blue Female
## 24.54 Blond  Blue Female
## 24.55 Blond  Blue Female
## 24.56 Blond  Blue Female
## 24.57 Blond  Blue Female
## 24.58 Blond  Blue Female
## 24.59 Blond  Blue Female
## 24.60 Blond  Blue Female
## 24.61 Blond  Blue Female
## 24.62 Blond  Blue Female
## 24.63 Blond  Blue Female
## 25    Black Hazel Female
## 25.1  Black Hazel Female
## 25.2  Black Hazel Female
## 25.3  Black Hazel Female
## 25.4  Black Hazel Female
## 26    Brown Hazel Female
## 26.1  Brown Hazel Female
## 26.2  Brown Hazel Female
## 26.3  Brown Hazel Female
## 26.4  Brown Hazel Female
## 26.5  Brown Hazel Female
## 26.6  Brown Hazel Female
## 26.7  Brown Hazel Female
## 26.8  Brown Hazel Female
## 26.9  Brown Hazel Female
## 26.10 Brown Hazel Female
## 26.11 Brown Hazel Female
## 26.12 Brown Hazel Female
## 26.13 Brown Hazel Female
## 26.14 Brown Hazel Female
## 26.15 Brown Hazel Female
## 26.16 Brown Hazel Female
## 26.17 Brown Hazel Female
## 26.18 Brown Hazel Female
## 26.19 Brown Hazel Female
## 26.20 Brown Hazel Female
## 26.21 Brown Hazel Female
## 26.22 Brown Hazel Female
## 26.23 Brown Hazel Female
## 26.24 Brown Hazel Female
## 26.25 Brown Hazel Female
## 26.26 Brown Hazel Female
## 26.27 Brown Hazel Female
## 26.28 Brown Hazel Female
## 27      Red Hazel Female
## 27.1    Red Hazel Female
## 27.2    Red Hazel Female
## 27.3    Red Hazel Female
## 27.4    Red Hazel Female
## 27.5    Red Hazel Female
## 27.6    Red Hazel Female
## 28    Blond Hazel Female
## 28.1  Blond Hazel Female
## 28.2  Blond Hazel Female
## 28.3  Blond Hazel Female
## 28.4  Blond Hazel Female
## 29    Black Green Female
## 29.1  Black Green Female
## 30    Brown Green Female
## 30.1  Brown Green Female
## 30.2  Brown Green Female
## 30.3  Brown Green Female
## 30.4  Brown Green Female
## 30.5  Brown Green Female
## 30.6  Brown Green Female
## 30.7  Brown Green Female
## 30.8  Brown Green Female
## 30.9  Brown Green Female
## 30.10 Brown Green Female
## 30.11 Brown Green Female
## 30.12 Brown Green Female
## 30.13 Brown Green Female
## 31      Red Green Female
## 31.1    Red Green Female
## 31.2    Red Green Female
## 31.3    Red Green Female
## 31.4    Red Green Female
## 31.5    Red Green Female
## 31.6    Red Green Female
## 32    Blond Green Female
## 32.1  Blond Green Female
## 32.2  Blond Green Female
## 32.3  Blond Green Female
## 32.4  Blond Green Female
## 32.5  Blond Green Female
## 32.6  Blond Green Female
## 32.7  Blond Green Female

Compute the chi-squared statistic with chisq.test( ):

## List of 9
##  $ statistic: Named num 107
##   ..- attr(*, "names")= chr "X-squared"
##  $ parameter: Named int 9
##   ..- attr(*, "names")= chr "df"
##  $ p.value  : num 7.01e-19
##  $ method   : chr "Pearson's Chi-squared test"
##  $ data.name: chr "he_tab"
##  $ observed : 'table' int [1:4, 1:4] 9 64 34 7 36 4 66 16 2 8 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hair: chr [1:4] "Black" "Blond" "Brown" "Red"
##   .. ..$ eye : chr [1:4] "Blue" "Brown" "Green" "Hazel"
##  $ expected : num [1:4, 1:4] 18.9 29.5 52.1 13.5 20.3 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hair: chr [1:4] "Black" "Blond" "Brown" "Red"
##   .. ..$ eye : chr [1:4] "Blue" "Brown" "Green" "Hazel"
##  $ residuals: 'table' num [1:4, 1:4] -2.28 6.35 -2.51 -1.76 3.49 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hair: chr [1:4] "Black" "Blond" "Brown" "Red"
##   .. ..$ eye : chr [1:4] "Blue" "Brown" "Green" "Hazel"
##  $ stdres   : 'table' num [1:4, 1:4] -3.14 9.25 -4.26 -2.36 4.9 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ hair: chr [1:4] "Black" "Blond" "Brown" "Red"
##   .. ..$ eye : chr [1:4] "Blue" "Brown" "Green" "Hazel"
##  - attr(*, "class")= chr "htest"

Are hair and eye independent, given sex?

Permutations:

Result:

plot of chunk plot_perms

Conclusion?

(What did we actually test?)

Your turn:

Use a permutation test to assess whether the relation between hair and eye color differs by sex.

Stochastic facts

Poisson additivity

If we have Poisson-many things of two categories: \[\begin{aligned} A &\sim \Poisson(a) \\ B &\sim \Poisson(b) \end{aligned}\] then the total number of things is also Poisson: \[ A + B \sim \Poisson(a + b) \] and each thing chooses its type independently: \[ A \given A + B \sim \Binom\left(A+B, \frac{a}{a+b}\right) . \]

Poisson models for contingency tables

If we fit a model where

\[\begin{aligned} (\text{number in cell $ij$}) &\sim \Poisson(\lambda_{ij}) \end{aligned}\]

then \[\begin{aligned} \frac{\lambda_{ij}}{\sum_{k\ell} \lambda_{k\ell}} &= (\text{proportion in category $ij$}) \\ \frac{\lambda_{ij}}{\sum_{k} \lambda_{kj}} &= (\text{proportion of row $j$ in column $i$}) \\ \frac{\lambda_{ij}}{\lambda_{ij} + \lambda_{k\ell}} &= (\text{proportion of those in $k\ell$ or $ij$ that are $ij$}) \end{aligned}\]

Gene ontologies

A very simplified ontology:

Suppose each gene falls into one of these categories:

  1. cell death

    1. neuron death
    2. oxidative stress
    3. necrotic
    4. apoptosis
  1. cell motility

    1. actin polymerization
    2. sperm motility
    3. cilia
    4. axis elongation
  2. ion transport

    1. anion transport
    2. cation transport
    3. transmembrane transport

… and that we have detected a number of upregulated genes in our experiment.

Do these have anything in common?

I.e., are any gene ontologies enriched among the upregulated genes?

The data

##              top                    term num_genes upregulated downregulated
## 1     cell death            neuron death       922          39            27
## 2     cell death        oxidative stress      1932         100            54
## 3     cell death                necrotic      2020          98            57
## 4     cell death               apoptosis       972          59            28
## 5  cell motility    actin polymerization        57           4             3
## 6  cell motility          sperm motility       379          58            31
## 7  cell motility                   cilia      1096         147            79
## 8  cell motility         axis elongation       248         164            33
## 9  ion transport         anion transport       300          10            11
## 10 ion transport        cation transport       194           8             7
## 11 ion transport transmembrane transport      1905          46            32

Basic question: are the “upregulated” genes special?

## Warning in chisq.test(gomat): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  gomat
## X-squared = 1698, df = 20, p-value <2e-16

Yes. But, how?

A model

Let

\[\begin{aligned} N_i &= (\text{number of no difference genes in category $i$}) \\ &\sim \Poisson(\exp(\lambda_i^N)) \\ U_i &= (\text{number of upregulated genes in category $i$}) \\ &\sim \Poisson(\exp(\lambda_i^U)) \\ D_i &= (\text{number of downregulated genes in category $i$}) \\ &\sim \Poisson(\exp(\lambda_i^D)) . \end{aligned}\]

and

\[\begin{aligned} \lambda_i^N &= \alpha_i \\ \lambda_i^U &= \alpha_i + \delta_+ + \beta_1[\text{top}_i] + \beta_2[i] \\ \lambda_i^D &= \alpha_i + \delta_- + \beta_1[\text{top}_i] + \beta_2[i] + \gamma_i. \end{aligned}\]

Interpret these: how do we see if

  1. No enrichments among differentially regulated genes?
  2. More genes are up- than down-regulated?
  3. Greater differential regulation among cell death genes?
  4. Greater differential regulation among apoptosis genes?
  5. Greater downregulation of anion transport genes?

(in class)

## Error: <text>:1:6: unexpected '{'
## 1: data {
##          ^

Nonidentifiability

plot of chunk nonident

What do we really want to know?

Back to hair and eye colors: with brms

brms

Flat priors

##                  prior     class               coef group resp dpar nlpar bound
## 1                              b                                               
## 2                              b           eyeBrown                            
## 3                              b   eyeBrown:sexMale                            
## 4                              b           eyeGreen                            
## 5                              b   eyeGreen:sexMale                            
## 6                              b           eyeHazel                            
## 7                              b   eyeHazel:sexMale                            
## 8                              b          hairBlond                            
## 9                              b hairBlond:eyeBrown                            
## 10                             b hairBlond:eyeGreen                            
## 11                             b hairBlond:eyeHazel                            
## 12                             b  hairBlond:sexMale                            
## 13                             b          hairBrown                            
## 14                             b hairBrown:eyeBrown                            
## 15                             b hairBrown:eyeGreen                            
## 16                             b hairBrown:eyeHazel                            
## 17                             b  hairBrown:sexMale                            
## 18                             b            hairRed                            
## 19                             b   hairRed:eyeBrown                            
## 20                             b   hairRed:eyeGreen                            
## 21                             b   hairRed:eyeHazel                            
## 22                             b    hairRed:sexMale                            
## 23                             b            sexMale                            
## 24 student_t(3, 2, 10) Intercept

## Warning: partial match of 'dpar' to 'dpars'
## Compiling the C++ model
## Start sampling

plot of chunk brm_results

Proper priors

##                  prior     class               coef group resp dpar nlpar bound
## 1                              b                                               
## 2                              b           eyeBrown                            
## 3                              b   eyeBrown:sexMale                            
## 4                              b           eyeGreen                            
## 5                              b   eyeGreen:sexMale                            
## 6                              b           eyeHazel                            
## 7                              b   eyeHazel:sexMale                            
## 8                              b          hairBlond                            
## 9                              b hairBlond:eyeBrown                            
## 10                             b hairBlond:eyeGreen                            
## 11                             b hairBlond:eyeHazel                            
## 12                             b  hairBlond:sexMale                            
## 13                             b          hairBrown                            
## 14                             b hairBrown:eyeBrown                            
## 15                             b hairBrown:eyeGreen                            
## 16                             b hairBrown:eyeHazel                            
## 17                             b  hairBrown:sexMale                            
## 18                             b            hairRed                            
## 19                             b   hairRed:eyeBrown                            
## 20                             b   hairRed:eyeGreen                            
## 21                             b   hairRed:eyeHazel                            
## 22                             b    hairRed:sexMale                            
## 23                             b            sexMale                            
## 24 student_t(3, 2, 10) Intercept

## Warning: partial match of 'dpar' to 'dpars'
## Compiling the C++ model
## Start sampling

plot of chunk brm_results2

Stochastic minute

The Dirichlet distribution

A collection of \(n\) nonnegative random numbers \((P_1, \ldots, P_n)\) that sums to 1 has a Dirichlet(\(\alpha_1, \ldots, \alpha_n\)) distribution if it has probability density \[ \frac{1}{B(\alpha)} \prod_{i=1}^n p_i^{\alpha_i - 1} . \]

Facts:

  1. If \(U \sim \Beta(\alpha, \beta)\) then \((U, 1-U) \sim \Dirichlet(\alpha, \beta)\).

  2. If \(Y_1, \ldots, Y_n\) are independent, Exponential with rates \(\alpha_1, \ldots, \alpha_n\) then \[ Y / \sum_{i=1}^n Y_i \sim \Dirichlet(\alpha_1, \ldots, \alpha_n) . \]

  3. Useful? Yes, if you need a distribution on numbers that sum to one (e.g., class proportions).