Peter Ralph
1 December 2020 – Advanced Biological Statistics
We have counts of mutations,
from some individuals of different ages and past exposures to solar irradiation,
of two genotypes.
Model:
Counts are Poisson,
with mean that depends on age and exposure,
but effect of exposure depends on genotype.
Counts are Poisson,
with mean that depends on age and exposure,
but effect of exposure depends on genotype.
\[\begin{aligned} Z_i &\sim \Poisson(\mu_i) \\ \end{aligned}\]
Counts are Poisson,
with mean that depends on age and exposure,
but effect of exposure depends on genotype.
\[\begin{aligned} Z_i &\sim \Poisson(\mu_i) \\ \mu_i &= a + b \times \text{age}_i + c \times \text{exposure}_i \end{aligned}\]
Counts are Poisson,
with mean that depends on age and exposure,
but effect of exposure depends on genotype.
\[\begin{aligned} Z_i &\sim \Poisson(\mu_i) \\ \mu_i &= a_{g_i} + b \times \text{age}_i + c_{g_i} \times \text{exposure}_i \end{aligned}\]
Counts are Poisson,
with mean that depends on age and exposure,
but effect of exposure depends on genotype.
\[\begin{aligned} Z_i &\sim \Poisson(\mu_i) \\ \mu_i &= \exp\left( a_{g_i} + b \times \text{age}_i \right. \\ &\qquad \left. {} + c_{g_i} \times \text{exposure}_i \right) \end{aligned}\]
glm( )
\[\begin{aligned} Z_i &\sim \Poisson(\mu_i) \\ \mu_i &= \exp\left( a_{g_i} + b \times \text{age}_i + c_{g_i} \times \text{exposure}_i \right) \end{aligned}\]
Here are the data:
## genotype age exposure counts
## 1 A 11.961887 24.7839999 17
## 2 B 8.699680 2.4985780 12
## 3 B 13.044583 0.6582441 5
## 4 A 16.193124 2.5269076 3
## 5 B 5.438608 23.3953875 1
## 6 A 14.261364 8.2820704 2
Counts are Poisson,
with mean that depends on age and exposure,
but effect of exposure depends on genotype.
\[\begin{aligned} Z_i &\sim \Poisson(y_i) \\ y_i &= \exp(a_{g_i} + b \times \text{age}_i \\ &\qquad {} + c_{g_i} \times \text{exposure}_i ) \end{aligned}\]
data {
int N;
int counts[N];
vector[N] age;
vector[N] exposure;
int genotype[N];
int ngeno;
}
parameters {
real a[ngeno]; // intercepts
real b; // slope for age
real c[ngeno]; // slopes for exposure
}
model {
vector[N] y; // means
y = exp(a[genotype] + b .* age
+ c[genotype] .* exposure);
counts ~ poisson(y);
// implicitly flat priors
}
Note: scaling the data helps Stan find the right scale to move on.
## Inference for Stan model: simple_poisson.
## 3 chains, each with iter=1000; warmup=500; thin=1;
## post-warmup draws per chain=500, total post-warmup draws=1500.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## a[1] 0.31 0.00 0.08 0.16 0.26 0.32 0.36 0.46 369 1.01
## a[2] 2.23 0.00 0.07 2.09 2.18 2.23 2.27 2.36 363 1.01
## b 0.05 0.00 0.01 0.04 0.04 0.05 0.05 0.06 346 1.01
## c[1] 0.09 0.00 0.00 0.09 0.09 0.09 0.09 0.10 1133 1.00
## c[2] -0.12 0.00 0.01 -0.13 -0.12 -0.12 -0.11 -0.10 894 1.00
## lp__ 3962.45 0.06 1.56 3958.60 3961.58 3962.71 3963.64 3964.59 673 1.00
##
## Samples were drawn using NUTS(diag_e) at Thu Dec 3 09:16:03 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
Posterior distributions of the parameters:
Let’s simulate up data under this model to check for goodness of fit.
One hundred times, we’ll
mu
)mu
do_sim <- function () {
k <- sample.int(nrow(post1$a), 1)
params1 <- list(a=post1$a[k,],
b=post1$b[k],
c=post1$c[k,])
mu1 <- with(data,
exp(params1$a[as.numeric(genotype)]
+ params1$b * age
+ params1$c[as.numeric(genotype)] * exposure))
rpois(length(mu1), mu1)
}
# 100 datasets:
sim1 <- replicate(100, do_sim())
model {
vector[N] mu;
mu = exp(a[geno]
+ b * age
+ c[geno]
.* expo);
counts ~ poisson(mu);
quasipoisson
family in glm( )
\[\begin{aligned} Z_i &\sim \Poisson(y_i) \\ y_i &= \exp(\mu_i) \\ \mu_i &\sim \Normal(a_{g_i} + b \times \text{age}_i \\ &\qquad {} + c_{g_i} \times \text{exposure}_i, \sigma) \end{aligned}\]