Peter Ralph
28 January 2021 – Advanced Biological Statistics
We have counts of transcript numbers,
from some individuals of different ages and past exposures to solar irradiation,
of two genotypes.
\[\begin{aligned} Z_i &\sim \Poisson(y_i) \\ y_i &= \exp(a_{g_i} + b \times \text{age}_i \\ &\qquad {} + c_{g_i} \times \text{exposure}_i ) \end{aligned}\]
fit1 <- brm(counts ~ genotype + age + exposure * genotype,
data=count_data, family=poisson(link='log'),
prior=prior("normal(0, 5)", class="b"))
## Warning: partial match of 'dpar' to 'dpars'
## Compiling Stan program...
## Start sampling
Recall that if \(X \sim \Poisson(\lambda)\) then \[ \E[X] = \var[X] = \lambda, \] and so a “\(z\)-score” is \[\begin{aligned} \E\left(\frac{X - \lambda}{\sqrt{\lambda}}\right) = 0, \qquad \qquad \text{SD}\left(\frac{X - \lambda}{\sqrt{\lambda}}\right) = 1. \end{aligned}\]
## Family: poisson
## Links: mu = log
## Formula: counts ~ genotype + age + exposure * genotype
## Data: count_data (Number of observations: 500)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.31 0.07 0.17 0.46 1.00 2693 2731
## genotypeB 1.91 0.05 1.81 2.01 1.00 1957 2800
## age 0.05 0.01 0.04 0.06 1.00 4317 2909
## exposure 0.09 0.00 0.08 0.10 1.00 2348 2852
## genotypeB:exposure -0.21 0.01 -0.22 -0.19 1.00 2227 2365
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
\[\begin{aligned} Z_i &\sim \Poisson(\exp(\mu_i)) \\ \mu_i &\sim \Normal(y_i, \sigma) \\ y_i &= a_{g_i} + b \times \text{age}_i + c_{g_i} \times \text{exposure}_i \end{aligned}\]
is equivalent to
\[\begin{aligned} Z_i &\sim \Poisson(\exp(y_i)) \\ y_i &= a_{g_i} + b \times \text{age}_i + c_{g_i} \times \text{exposure}_i + \epsilon_i \\ \epsilon_i &\sim \Normal(0, \sigma) \end{aligned}\]
\[\begin{aligned} Z_i &\sim \Poisson(\exp(y_i)) \\ y_i &= a_{g_i} + b \times \text{age}_i + c_{g_i} \times \text{exposure}_i + \epsilon_i \\ \epsilon_i &\sim \Normal(0, \sigma) \end{aligned}\]
count_data$id <- 1:nrow(count_data)
fit2 <- brm(counts ~ genotype + age + exposure * genotype + (1|id),
data=count_data, family=poisson(link='log'),
prior=prior("normal(0, 5)", class="b"))
## Warning: partial match of 'dpar' to 'dpars'
## Compiling Stan program...
## Start sampling
## Family: poisson
## Links: mu = log
## Formula: counts ~ genotype + age + exposure * genotype + (1 | id)
## Data: count_data (Number of observations: 500)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Group-Level Effects:
## ~id (Number of levels: 500)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.72 0.04 0.65 0.79 1.01 881 1486
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.01 0.16 -0.30 0.31 1.00 1374 2284
## genotypeB 1.91 0.11 1.70 2.13 1.00 1233 1693
## age 0.05 0.01 0.03 0.08 1.00 1256 1929
## exposure 0.10 0.01 0.08 0.11 1.00 813 1239
## genotypeB:exposure -0.20 0.01 -0.23 -0.17 1.00 1206 1582
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The overdispersed model fits better, but is it better?
brms_kfold <- function (K, models, xy) {
stopifnot(!is.null(names(models)))
Kfold <- sample(rep(1:K, nrow(xy)/K))
results <- data.frame(rep=1:K)
for (j in seq_along(models)) {
train <- test <- rep(NA, K)
for (k in 1:K) {
new_fit <- update(models[[j]], newdata=subset(xy, Kfold != k))
train[k] <- sqrt(mean(resid(new_fit)[,"Estimate"]^2))
test_y <- xy$y[Kfold == k]
test[k] <- sqrt(mean(
(test_y - predict(new_fit, newdata=subset(xy, Kfold==k), re_formula=NA)[,"Estimate"])^2 ))
}
results[[paste0(names(models)[j], "_train")]] <- train
results[[paste0(names(models)[j], "_test")]] <- test
}
return(results)
}
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
brms_kfold2 <- function (K, models, xy) {
stopifnot(!is.null(names(models)))
Kfold <- sample(rep(1:K, nrow(xy)/K))
results <- data.frame(rep=1:K)
for (j in seq_along(models)) {
train <- test <- rep(NA, K)
for (k in 1:K) {
new_fit <- update(models[[j]], newdata=subset(xy, Kfold != k))
train[k] <- sqrt(mean(resid(new_fit, re_formula=NA)[,"Estimate"]^2))
test_y <- xy$y[Kfold == k]
test[k] <- sqrt(mean(
(test_y - predict(new_fit, newdata=subset(xy, Kfold==k), re_formula=NA)[,"Estimate"])^2 ))
}
results[[paste0(names(models)[j], "_train")]] <- train
results[[paste0(names(models)[j], "_test")]] <- test
}
return(results)
}
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling
## Warning: partial match of 'dpar' to 'dpars'
## Start sampling