Peter Ralph
Advanced Biological Statistics
We have counts of mutations,
from some individuals of different ages and past exposures to solar irradiation,
of two genotypes.
Model:
Counts are Poisson,
with mean that depends on age and exposure,
but effect of exposure depends on genotype.
Counts are Poisson,
with mean that depends on age and exposure,
but effect of exposure depends on genotype.
\[\begin{aligned} Z_i &\sim \Poisson(\mu_i) \\ \end{aligned}\]
Counts are Poisson,
with mean that depends on age and exposure,
but effect of exposure depends on genotype.
\[\begin{aligned} Z_i &\sim \Poisson(\mu_i) \\ \mu_i &= a + b \times \text{age}_i + c \times \text{exposure}_i \end{aligned}\]
Counts are Poisson,
with mean that depends on age and exposure,
but effect of exposure depends on genotype.
\[\begin{aligned} Z_i &\sim \Poisson(\mu_i) \\ \mu_i &= a_{g_i} + b \times \text{age}_i + c_{g_i} \times \text{exposure}_i \end{aligned}\]
Counts are Poisson,
with mean that depends on age and exposure,
but effect of exposure depends on genotype.
\[\begin{aligned} Z_i &\sim \Poisson(\mu_i) \\ \mu_i &= \exp\left( a_{g_i} + b \times \text{age}_i \right. \\ &\qquad \left. {} + c_{g_i} \times \text{exposure}_i \right) \end{aligned}\]
Here are the data:
## genotype age exposure counts
## 1 A 11.961887 24.7839999 17
## 2 B 8.699680 2.4985780 12
## 3 B 13.044583 0.6582441 5
## 4 A 16.193124 2.5269076 3
## 5 B 5.438608 23.3953875 1
## 6 A 14.261364 8.2820704 2
## prior class coef group resp dpar nlpar bound source
## (flat) b default
## (flat) b age (vectorized)
## (flat) b genotypeA (vectorized)
## (flat) b genotypeA:exposure (vectorized)
## (flat) b genotypeB (vectorized)
## (flat) b genotypeB:exposure (vectorized)
## student_t(3, 0, 4.4) sigma default
fit1 <- brm(
counts ~ 0 + genotype + age + genotype:exposure,
family=poisson(link='log'),
prior=set_prior("normal(0, 1)", class="b"),
data=countdata
)
## Compiling Stan program...
## Start sampling
## Family: poisson
## Links: mu = log
## Formula: counts ~ 0 + genotype + age + genotype:exposure
## Data: countdata (Number of observations: 500)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genotypeA 0.30 0.07 0.16 0.44 1.00 1301 1626
## genotypeB 2.21 0.06 2.09 2.34 1.00 1358 1765
## age 0.05 0.01 0.04 0.06 1.00 1334 1829
## genotypeA:exposure 0.09 0.00 0.09 0.10 1.00 2802 2534
## genotypeB:exposure -0.12 0.01 -0.13 -0.10 1.00 2685 2507
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior distributions of the parameters:
Idea: Let’s simulate up data under this model and see if it looks like the real data.
A bunch of times, we’ll
mu
)mu
… with the pp_check
function!
## Using all posterior draws for ppc type 'intervals' by default.
Use the negbinomial
family
… which introduces more randomness:
\[\begin{aligned} Z_i &\sim \Poisson(M_i) \\ M_i &\sim \Gam(\text{mean}=\mu_i, \text{shape}=r) \\ \mu_i &= \exp(y_i) \\ y_i &= a_{g_i} + b \times \text{age}_i \\ &\qquad {} + c_{g_i} \times \text{exposure}_i \end{aligned}\]
fit2 <- brm(
counts ~ 0 + genotype + age + genotype:exposure,
family=negbinomial(link='log'),
prior=set_prior("normal(0, 1)", class="b"),
data=countdata
)
## Compiling Stan program...
## Start sampling
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: counts ~ 0 + genotype + age + genotype:exposure
## Data: countdata (Number of observations: 500)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## genotypeA 0.17 0.15 -0.14 0.47 1.00 1719 1659
## genotypeB 2.05 0.15 1.76 2.33 1.00 1847 1603
## age 0.06 0.01 0.04 0.09 1.00 1753 1756
## genotypeA:exposure 0.09 0.01 0.08 0.11 1.00 3105 2326
## genotypeB:exposure -0.10 0.01 -0.13 -0.08 1.00 3703 2711
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 1.95 0.17 1.63 2.31 1.00 3243 2506
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior distributions of the parameters:
## Using all posterior draws for ppc type 'intervals' by default.