Peter Ralph
21 January 2018 – Advanced Biological Statistics
Suppose we have numerical observations coming from \(m\) different groups. We’d like to know how different the means are between groups, compared to variation within groups.
For example: dry leaf mass after 10d of growth of \(k=5\) genotypes in in \(m=4\) different conditions. (data here)
Dry leaf mass after 10d of growth of \(k=5\) genotypes in in \(m=4\) different conditions.
Each group has a different mean; noise about these is Normal with the same SD. Group means are random deviations from a mean determined additively by genotype and treatment.
anova_model <- stan_model(model_code="
data {
int N;
int ngeno;
int ntreat;
vector[N] leaf;
int geno[N];
int treat[N];
}
parameters {
real mu;
vector[ngeno] alpha;
vector[ntreat] beta;
matrix[ngeno, ntreat] gamma;
matrix<lower=0>[ngeno, ntreat] sigma;
real<lower=0> sigma_gamma;
}
model {
vector[N] muvec;
vector[N] sigmavec;
muvec = mu + alpha[geno] + beta[treat];
for (k in 1:N) {
muvec[k] += sigma_gamma * gamma[geno[k], treat[k]];
sigmavec[k] = sigma[geno[k], treat[k]];
}
leaf ~ normal(muvec, sigmavec);
// DOESN'T WORK: leaf ~ normal(muvec, sigma[geno, treat]);
mu ~ normal(205, 10);
alpha ~ normal(0, 10);
beta ~ normal(0, 10);
for (k in 1:ngeno) {
// using Matt's trick here
gamma[k] ~ normal(0, 1);
sigma[k] ~ gamma(20, 10);
}
sigma_gamma ~ gamma(0.5, 0.5);
}
")
## Inference for Stan model: 585d0bae0af15bb83752f81372fca2b7.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5%
## mu 205.53 0.16 5.58 194.69 201.60 205.64 209.30 216.46
## alpha[1] -2.15 0.12 4.19 -10.27 -4.93 -2.26 0.65 6.32
## alpha[2] -5.46 0.12 4.19 -13.58 -8.27 -5.57 -2.69 2.91
## alpha[3] 1.26 0.12 4.19 -6.88 -1.55 1.15 4.05 9.70
## alpha[4] 5.58 0.12 4.19 -2.54 2.73 5.47 8.33 13.96
## alpha[5] 1.39 0.12 4.18 -6.77 -1.44 1.31 4.11 9.76
## beta[1] 3.29 0.13 4.56 -5.50 0.11 3.32 6.34 12.14
## beta[2] -3.53 0.13 4.56 -12.28 -6.67 -3.51 -0.43 5.37
## beta[3] 3.10 0.13 4.56 -5.66 -0.03 3.15 6.20 11.89
## beta[4] -1.75 0.13 4.56 -10.53 -4.91 -1.74 1.30 7.08
## gamma[1,1] 0.03 0.01 0.98 -1.89 -0.65 0.04 0.69 1.97
## gamma[1,2] 0.18 0.01 1.03 -1.85 -0.54 0.18 0.89 2.18
## gamma[1,3] -0.13 0.01 1.00 -2.05 -0.81 -0.15 0.53 1.88
## gamma[1,4] -0.03 0.01 0.98 -1.99 -0.69 -0.02 0.66 1.90
## gamma[2,1] -0.32 0.01 1.01 -2.26 -1.01 -0.33 0.37 1.65
## gamma[2,2] -0.06 0.01 0.97 -1.95 -0.72 -0.08 0.60 1.86
## gamma[2,3] 0.23 0.01 0.97 -1.70 -0.42 0.22 0.88 2.10
## gamma[2,4] 0.13 0.01 0.96 -1.74 -0.54 0.14 0.78 1.99
## gamma[3,1] 0.08 0.01 0.97 -1.83 -0.57 0.07 0.76 1.97
## gamma[3,2] 0.07 0.02 0.99 -1.94 -0.59 0.06 0.73 2.07
## gamma[3,3] -0.08 0.01 0.97 -1.91 -0.72 -0.11 0.56 1.85
## gamma[3,4] -0.06 0.01 0.94 -1.91 -0.69 -0.05 0.57 1.76
## gamma[4,1] 0.11 0.01 0.98 -1.84 -0.53 0.11 0.78 2.01
## gamma[4,2] -0.14 0.01 0.98 -2.07 -0.77 -0.14 0.52 1.77
## gamma[4,3] -0.08 0.01 0.96 -1.96 -0.73 -0.08 0.56 1.88
## gamma[4,4] 0.10 0.01 0.94 -1.75 -0.55 0.10 0.76 1.90
## gamma[5,1] 0.15 0.01 0.99 -1.82 -0.49 0.15 0.80 2.14
## gamma[5,2] -0.12 0.01 0.98 -2.02 -0.81 -0.13 0.54 1.82
## gamma[5,3] 0.07 0.01 0.94 -1.78 -0.56 0.07 0.72 1.90
## gamma[5,4] -0.12 0.02 1.02 -2.10 -0.83 -0.13 0.57 1.87
## sigma[1,1] 2.11 0.00 0.33 1.54 1.88 2.08 2.31 2.85
## sigma[1,2] 1.93 0.01 0.35 1.34 1.68 1.91 2.14 2.70
## sigma[1,3] 2.06 0.00 0.34 1.48 1.81 2.03 2.27 2.80
## sigma[1,4] 2.07 0.00 0.31 1.53 1.85 2.04 2.25 2.78
## sigma[2,1] 1.78 0.00 0.32 1.25 1.55 1.75 1.98 2.50
## sigma[2,2] 1.74 0.01 0.35 1.16 1.49 1.71 1.95 2.54
## sigma[2,3] 2.09 0.00 0.30 1.57 1.87 2.06 2.27 2.75
## sigma[2,4] 2.09 0.00 0.33 1.52 1.86 2.07 2.29 2.81
## sigma[3,1] 1.54 0.00 0.36 0.95 1.29 1.51 1.76 2.35
## sigma[3,2] 1.92 0.01 0.37 1.29 1.65 1.88 2.15 2.71
## sigma[3,3] 2.46 0.00 0.33 1.88 2.23 2.44 2.67 3.17
## sigma[3,4] 2.24 0.00 0.33 1.68 2.00 2.21 2.44 2.96
## sigma[4,1] 2.10 0.01 0.34 1.52 1.86 2.08 2.32 2.86
## sigma[4,2] 2.03 0.00 0.29 1.52 1.83 2.01 2.21 2.66
## sigma[4,3] 1.93 0.00 0.30 1.41 1.72 1.91 2.11 2.60
## sigma[4,4] 2.03 0.01 0.37 1.42 1.77 2.00 2.26 2.85
## sigma[5,1] 2.01 0.00 0.30 1.49 1.79 1.99 2.20 2.63
## sigma[5,2] 2.16 0.00 0.32 1.62 1.92 2.14 2.36 2.85
## sigma[5,3] 2.06 0.00 0.32 1.50 1.83 2.03 2.26 2.76
## sigma[5,4] 2.48 0.00 0.34 1.90 2.24 2.45 2.69 3.21
## sigma_gamma 0.16 0.00 0.17 0.00 0.02 0.10 0.23 0.61
## lp__ -387.87 0.13 5.17 -398.68 -391.22 -387.56 -384.18 -378.50
## n_eff Rhat
## mu 1196 1
## alpha[1] 1161 1
## alpha[2] 1156 1
## alpha[3] 1149 1
## alpha[4] 1159 1
## alpha[5] 1178 1
## beta[1] 1257 1
## beta[2] 1254 1
## beta[3] 1262 1
## beta[4] 1256 1
## gamma[1,1] 4724 1
## gamma[1,2] 5409 1
## gamma[1,3] 5401 1
## gamma[1,4] 5242 1
## gamma[2,1] 4666 1
## gamma[2,2] 5058 1
## gamma[2,3] 4708 1
## gamma[2,4] 4677 1
## gamma[3,1] 5166 1
## gamma[3,2] 4382 1
## gamma[3,3] 6297 1
## gamma[3,4] 4911 1
## gamma[4,1] 4899 1
## gamma[4,2] 4357 1
## gamma[4,3] 5379 1
## gamma[4,4] 5253 1
## gamma[5,1] 5357 1
## gamma[5,2] 5478 1
## gamma[5,3] 6638 1
## gamma[5,4] 4535 1
## sigma[1,1] 5446 1
## sigma[1,2] 4795 1
## sigma[1,3] 4907 1
## sigma[1,4] 4876 1
## sigma[2,1] 4408 1
## sigma[2,2] 4658 1
## sigma[2,3] 5212 1
## sigma[2,4] 5156 1
## sigma[3,1] 5319 1
## sigma[3,2] 4825 1
## sigma[3,3] 5145 1
## sigma[3,4] 4856 1
## sigma[4,1] 4652 1
## sigma[4,2] 4516 1
## sigma[4,3] 5362 1
## sigma[4,4] 5103 1
## sigma[5,1] 4977 1
## sigma[5,2] 5359 1
## sigma[5,3] 5076 1
## sigma[5,4] 5091 1
## sigma_gamma 2364 1
## lp__ 1666 1
##
## Samples were drawn using NUTS(diag_e) at Thu Jan 24 23:29:11 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)
samples <- extract(fit)
# let's get the posterior distribution of
# the group mean for (genotype a, treatment 1)
samples <- extract(fit)
post_a1 <- samples$mu + samples$alpha[,1] + samples$beta[,1] + samples$gamma[,1,1]
post_means <- as.vector(samples$mu) + samples$gamma
for (k in 1:ngeno) {
post_means[,k,] <- post_means[,k,] + samples$alpha[,k]
}
for (k in 1:ntreat) {
post_means[,,k] <- post_means[,,k] + samples$beta[,k]
}
dim(post_means) <- c(dim(post_means)[1], prod(dim(post_means)[2:3]))
colnames(post_means) <- outer(levels(leafs$geno), levels(leafs$treat), paste, sep=".")
post_means <- as.data.frame(post_means)
A collection of \(n\) nonnegative random numbers \((P_1, \ldots, P_n)\) that sums to 1 has a Dirichlet(\(\alpha_1, \ldots, \alpha_n\)) distribution if it has probability density \[ \frac{1}{B(\alpha)} \prod_{i=1}^n p_i^{\alpha_i - 1} . \]
Facts:
If \(U \sim \Beta(\alpha, \beta)\) then \((U, 1-U) \sim \Dirichlet(\alpha, \beta)\).
If \(Y_1, \ldots, Y_n\) are independent, Exponential with rates \(\alpha_1, \ldots, \alpha_n\) then \[ Y / \sum_{i=1}^n Y_i \sim \Dirichlet(\alpha_1, \ldots, \alpha_n) . \]
Useful? Yes, if you need a distribution on numbers that sum to one (e.g., class proportions).
Simulate from a
and describe the difference. (Hint: plot these in the \((P_1, P_2)\) plane.)
HairEyeColor package:datasets R Documentation
Hair and Eye Color of Statistics Students
Description:
Distribution of hair and eye color and sex in 592 statistics
students.
Usage:
HairEyeColor
Format:
A 3-dimensional array resulting from cross-tabulating 592
observations on 3 variables. The variables and their levels are
as follows:
No Name Levels
1 Hair Black, Brown, Red, Blond
2 Eye Brown, Blue, Hazel, Green
3 Sex Male, Female
Details:
The Hair x Eye table comes from a survey of students at the
University of Delaware reported by Snee (1974). The split by
‘Sex’ was added by Friendly (1992a) for didactic purposes.
This data set is useful for illustrating various techniques for
the analysis of contingency tables, such as the standard
chi-squared test or, more generally, log-linear modelling, and
graphical methods such as mosaic plots, sieve diagrams or
association plots.
Source:
<URL:
http://euclid.psych.yorku.ca/ftp/sas/vcd/catdata/haireye.sas>
Snee (1974) gives the two-way table aggregated over ‘Sex’. The
‘Sex’ split of the ‘Brown hair, Brown eye’ cell was changed to
agree with that used by Friendly (2000).
References:
Snee, R. D. (1974). Graphical display of two-way contingency
tables. _The American Statistician_, *28*, 9-12. doi:
10.2307/2683520 (URL: http://doi.org/10.2307/2683520).
## , , Sex = Male
##
## Eye
## Hair Brown Blue Hazel Green
## Black 32 11 10 3
## Brown 53 50 25 15
## Red 10 10 7 7
## Blond 3 30 5 8
##
## , , Sex = Female
##
## Eye
## Hair Brown Blue Hazel Green
## Black 36 9 5 2
## Brown 66 34 29 14
## Red 16 7 7 7
## Blond 4 64 5 8
(haireye <- data.frame(hair = dimnames(HairEyeColor)[[1]][slice.index(HairEyeColor, 1)],
eye = dimnames(HairEyeColor)[[2]][slice.index(HairEyeColor, 2)],
sex = dimnames(HairEyeColor)[[3]][slice.index(HairEyeColor, 3)],
number = HairEyeColor))
## hair eye sex number.Hair number.Eye number.Sex number.Freq
## 1 Black Brown Male Black Brown Male 32
## 2 Brown Brown Male Brown Brown Male 53
## 3 Red Brown Male Red Brown Male 10
## 4 Blond Brown Male Blond Brown Male 3
## 5 Black Blue Male Black Blue Male 11
## 6 Brown Blue Male Brown Blue Male 50
## 7 Red Blue Male Red Blue Male 10
## 8 Blond Blue Male Blond Blue Male 30
## 9 Black Hazel Male Black Hazel Male 10
## 10 Brown Hazel Male Brown Hazel Male 25
## 11 Red Hazel Male Red Hazel Male 7
## 12 Blond Hazel Male Blond Hazel Male 5
## 13 Black Green Male Black Green Male 3
## 14 Brown Green Male Brown Green Male 15
## 15 Red Green Male Red Green Male 7
## 16 Blond Green Male Blond Green Male 8
## 17 Black Brown Female Black Brown Female 36
## 18 Brown Brown Female Brown Brown Female 66
## 19 Red Brown Female Red Brown Female 16
## 20 Blond Brown Female Blond Brown Female 4
## 21 Black Blue Female Black Blue Female 9
## 22 Brown Blue Female Brown Blue Female 34
## 23 Red Blue Female Red Blue Female 7
## 24 Blond Blue Female Blond Blue Female 64
## 25 Black Hazel Female Black Hazel Female 5
## 26 Brown Hazel Female Brown Hazel Female 29
## 27 Red Hazel Female Red Hazel Female 7
## 28 Blond Hazel Female Blond Hazel Female 5
## 29 Black Green Female Black Green Female 2
## 30 Brown Green Female Brown Green Female 14
## 31 Red Green Female Red Green Female 7
## 32 Blond Green Female Blond Green Female 8
Questions:
If hair and eye color are independent, then probabilities of combinations are multiplicative: \[\begin{aligned} \P\{\text{black hair and blue eyes}\} \\ &\qquad = \P\{\text{black hair}\} \times \P\{\text{blue eyes}\given\text{black hair}\} \\ \end{aligned}\]
which by independence is \[\begin{aligned} &\qquad = \P\{\text{black hair}\} \times \P\{\text{blue eyes}\} \end{aligned}\]
A model of independence will have a multiplicative form: \[ p_{ab} = p_a \times p_b . \]
Set \(\lambda = \log(p)\), so that \[ \lambda_{ab} = \lambda_a + \lambda_b . \]
If we have Poisson-many things of two categories: \[\begin{aligned} A &\sim \Poisson(a) \\ B &\sim \Poisson(b) \end{aligned}\] then the total number of things is also Poisson: \[ A + B \sim \Poisson(a + b) \] and each thing chooses its type independently: \[ A \given A + B \sim \Binom\left(A+B, \frac{a}{a+b}\right) . \]
For hair color \(i\) and eye color \(j\), the number of students with that combination is \[ N_{ij} \sim \Poisson(\mu_{ij}) . \]
If hair and eye color are independent of each other and of sex, then \[\begin{aligned} \mu_{ij} &= \exp\left( \alpha_i + \beta_j \right) . \end{aligned}\]
\[\begin{aligned} \mu_{ij} &= \exp\left( \alpha_i + \beta_j + \gamma_{ij} \right) ? \end{aligned}\]
Is this identifiable?
\[\begin{aligned} \mu_{ij} &= \exp\left( \delta_{ij} \right) \\ \delta_{ij} &\sim \Normal( \alpha_i + \beta_j, \sigma) \end{aligned}\]
Pick appropriate priors.