Peter Ralph
Advanced Biological Statistics
The brms
package lets you
fit hierarchical models using Stan
with mixed-model syntax!!!
# e.g.
brm(formula = z ~ x + y + (1 + y|f), data = xy,
family = poisson(link='log'))
# or
brm(formula = z ~ x + y + (1 + y|f), data = xy,
family = student(link='identity'))
brm(formula = z ~ x + y + (1 + y|f), data = xy,
family = gaussian(link='identity'))
formula
: almost just like lme4
data
: must contain all the variablesfamily
: distribution of responselink
: connects mean to linear predictorThere are several classes of parameter in a brms model:
b
: the population-level (or, fixed)
effectssd
: the standard deviations of group-level
(or, random) effectssigma
for the
GaussianExamples:
b_x
: the slope of x
:
class="b", coef="x"
sd_f
: the SD of effects for levels of f
:
class="sd", coef="f"
Pass a vector of “priors”, specified by
set_prior(prior, class="b", ...)
where prior
is some valid Stan code.
brm(formula = z ~ x + y + (1 + y|f), data = xy,
family = gaussian(link='identity'),
prior=c(set_prior("normal(0, 5)", class="b"),
set_prior("cauchy(0, 1)", class="sd", coef="f")))
Some example data:
xy <- data.frame(x = rnorm(100),
y = rexp(100),
f = factor(sample(letters[1:3], 100, replace=TRUE)))
xy$z <- xy$x + as.numeric(xy$f) * xy$y + rnorm(100, sd=0.1)
The formula:
Default:
## prior class coef group resp dpar nlpar bound source
## (flat) b default
## (flat) b x (vectorized)
## (flat) b y (vectorized)
## lkj(1) cor default
## lkj(1) cor f (vectorized)
## student_t(3, 1.6, 2.5) Intercept default
## student_t(3, 0, 2.5) sd default
## student_t(3, 0, 2.5) sd f (vectorized)
## student_t(3, 0, 2.5) sd Intercept f (vectorized)
## student_t(3, 0, 2.5) sd y f (vectorized)
## student_t(3, 0, 2.5) sigma default
## Compiling Stan program...
## Start sampling
## Warning: There were 349 divergent transitions after warmup. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: There were 29 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: There were 349 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: z ~ x + y + (1 + y | f)
## Data: xy (Number of observations: 100)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~f (Number of levels: 3)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.07 0.17 0.00 0.50 1.02 1012 1019
## sd(y) 1.18 0.41 0.56 2.10 1.02 371 1768
## cor(Intercept,y) -0.06 0.54 -0.94 0.91 1.01 1155 1517
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.02 0.11 -0.09 0.13 1.00 1683 1385
## x 1.00 0.01 0.98 1.03 1.01 256 2378
## y 1.91 0.64 0.54 3.13 1.02 484 1063
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.10 0.01 0.09 0.12 1.00 2500 2372
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Or…
launch_shinystan(the_fit)
Summaries of, or samples from, the posteriors of:
fixef( )
: “fixed” effectsranef( )
: “random” effectsfitted( )
: posterior distribution of mean
response (see posterior_epred
)predict( )
: posterior distribution of actual responses
(see posterior_predict
)hypothesis( )
: get posterior distributions of
functions of various parameters (e.g., difference between two
classes)conditional_effects( )
: effect of one predictor
conditioned on values of othersparnames( )
: list of parameter namespp_check( )
: compare response distribution to posterior
predictive simulationsloo( )
leave-one-out crossvalidation for model
comparisonbayes_R2( )
: Bayesian \(r^2\)help(brmsformula)
help(brmsfamily)
(but note can use those in help(family)
also)help(set_prior)
and also check what can have a prior with get_prior( )
stancode(the_fit)
(and standata(the_fit)
)loo( )
\[\begin{aligned} Y &\sim \text{Family}(\text{mean}=\mu) \\ \mu &= \text{link}(X \beta) . \end{aligned}\]
We can ask about:
\[\begin{aligned} Y &\sim \text{Family}(\text{mean}=\mu) \\ \mu &= \text{link}(X \beta + Z u) . \end{aligned}\]
We can ask about:
… with or without the group-level (random) effects, \(u\).
batting <- read.csv("data/BattingAveragePlus.csv", header=TRUE, stringsAsFactors=TRUE)
batting$scaled_height <- (batting$height - mean(batting$height))/sd(batting$height)
batting$scaled_weight <- (batting$weight - mean(batting$weight))/sd(batting$weight)
bb_fit <- brm(
Hits | trials(AtBats) ~ 0 + scaled_weight + scaled_height + PriPos + (1 | PriPos:Player),
data = batting,
family = "binomial",
prior = c(prior(normal(0, 5), class = b),
prior(normal(0, 5), class = sd)),
iter = 2000, chains = 3
)
## Compiling Stan program...
## Start sampling
Example: Is height associated with batting average? Which positions tend to be better batters?
Translated: What’s the posterior distribution of the effect of height on logit batting average?
Tools: (all have pars=
arguments)
summary( )
mcmc_hist( )
mcmc_intervals( )
extract( )
## Family: binomial
## Links: mu = logit
## Formula: Hits | trials(AtBats) ~ 0 + scaled_weight + scaled_height + PriPos + (1 | PriPos:Player)
## Data: batting (Number of observations: 886)
## Draws: 3 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 3000
##
## Group-Level Effects:
## ~PriPos:Player (Number of levels: 886)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.14 0.01 0.12 0.16 1.00 1056 1204
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## scaled_weight 0.02 0.01 -0.01 0.04 1.00 1006 1645
## scaled_height -0.01 0.01 -0.03 0.02 1.00 853 1233
## PriPos1stBase -1.08 0.03 -1.14 -1.04 1.00 1384 1884
## PriPos2ndBase -1.09 0.03 -1.15 -1.04 1.00 1511 1706
## PriPos3rdBase -1.06 0.02 -1.10 -1.01 1.00 1416 2031
## PriPosCatcher -1.16 0.03 -1.21 -1.10 1.00 1445 1783
## PriPosCenterField -1.05 0.03 -1.10 -0.99 1.00 1262 1607
## PriPosLeftField -1.10 0.02 -1.14 -1.05 1.00 1826 2217
## PriPosPitcher -1.91 0.04 -2.00 -1.83 1.00 3201 2209
## PriPosRightField -1.05 0.03 -1.11 -1.00 1.01 1216 1712
## PriPosShortstop -1.10 0.03 -1.16 -1.04 1.00 1339 1966
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (scaled_weight) > 0 0.02 0.01 -0.01 0.04 7.26 0.88
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (scaled_height) < 0 -0.01 0.01 -0.03 0.01 2.58 0.72
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
Example: What is the mean batting average of each position, as a function of weight?
Tools:
conditional_effects( )
: means in abstract
conditionsfitted( )
: expected values, summarized (optionally, for
new data)posterior_epred( )
: draws from the posterior mean
(optionally, for new data)conditional_effects
for average-height-and-weight:
## Setting all 'trials' variables to 1 by default if not specified otherwise.
conditional_effects
by weight at average height:
## Setting all 'trials' variables to 1 by default if not specified otherwise.
conditional_effects
for shortstops by weight that are
one SD above average height:
conditional_effects(bb_fit, effects="scaled_weight",
conditions=data.frame(PriPos="Shortstop", scaled_height=1))
## Setting all 'trials' variables to 1 by default if not specified otherwise.
The fitted( )
values: for the original data
bb_fitted <- fitted(bb_fit)
ggplot(cbind(batting, bb_fitted)) +
geom_segment(aes(x=Q2.5, xend=Q97.5, y=Hits, yend=Hits), col='red') +
geom_point(aes(x=Estimate, y=Hits)) +
geom_abline(intercept=0, slope=1) +
ylab("actual hits") + xlab("predicted hits") + coord_fixed()
posterior_epred( )
: posterior distribution of expected
values
Expected number of hits out of 50 at-bats for a shortstop of average weight that is 1.5 SD above average height?
pp <- posterior_epred(bb_fit,
newdata=data.frame(
PriPos="Shortstop",
scaled_weight=0,
scaled_height=1.5,
AtBats=50),
re_formula=NULL,
allow_new_levels=TRUE)
head(pp)
## [,1]
## [1,] 13.484300
## [2,] 11.600295
## [3,] 14.259553
## [4,] 9.274168
## [5,] 13.549492
## [6,] 12.882218
Compare: same thing but re_formula=NA
:
pp2 <- posterior_epred(bb_fit,
newdata=data.frame(
PriPos="Shortstop",
scaled_weight=0,
scaled_height=1.5,
AtBats=50),
re_formula=NA
)
head(pp2)
## [,1]
## [1,] 12.72004
## [2,] 12.51611
## [3,] 12.96435
## [4,] 12.25001
## [5,] 13.03064
## [6,] 12.54157
ggplot(data.frame(hits=c(pp, pp2), new_re=rep(c("include group effects", "no group effects"), each=length(pp)))) + geom_histogram(aes(x=hits), bins=32) + facet_wrap(~ new_re)
Suppose Miguel Cairo, and Franklin Gutierrez go up to bat 150 more times, how many hits will they get?
(mc <- subset(batting, Player %in% c("Miguel Cairo", "Franklin Gutierrez"))[, c("Player", "PriPos", "Hits", "AtBats", "scaled_height", "scaled_weight" )])
## Player PriPos Hits AtBats scaled_height scaled_weight
## 329 Franklin Gutierrez Center Field 39 150 0.2568063 -0.5613294
## 641 Miguel Cairo 1st Base 28 150 -0.1757610 0.6614721
Tools:
predict( )
: summaryposterior_predict( )
: get samples## Player PriPos Hits AtBats scaled_height scaled_weight Estimate Est.Error Q2.5 Q97.5
## 329 Franklin Gutierrez Center Field 39 150 0.2568063 -0.5613294 38.71067 6.295991 27.000 52
## 641 Miguel Cairo 1st Base 28 150 -0.1757610 0.6614721 34.73467 5.997743 23.975 47
fitted( )
vs predict( )
predict
always has greater uncertainty.
## Player PriPos Hits AtBats scaled_height scaled_weight predict.Estimate predict.Est.Error predict.Q2.5 predict.Q97.5 fitted.Estimate fitted.Est.Error fitted.Q2.5 fitted.Q97.5
## 329 Franklin Gutierrez Center Field 39 150 0.2568063 -0.5613294 38.85367 6.241306 27 52 38.86311 3.263611 32.44602 45.44312
## 641 Miguel Cairo 1st Base 28 150 -0.1757610 0.6614721 34.66067 5.987665 23 47 34.81707 3.103223 29.00026 41.07867
\[\begin{aligned} Z_i &\sim \Binom(N_i, \theta_i) \\ \theta_i &= \logit(\beta_{p_i} + u_i) \\ u_i &\sim \Normal(0, \sigma^2) \end{aligned}\]
where for the \(i^\text{th}\) player,
## Player PriPos Hits AtBats scaled_height scaled_weight
## 20 Albert Pujols 1st Base 173 607 0.6893737 1.395153
How would we use the posterior to summarize:
Mean batting average of 1st base players of average weight and height?
firstpost <- posterior_epred(
bb_fit,
newdata=data.frame(
PriPos="1st Base",
scaled_weight=0,
scaled_height=0,
AtBats=1
),
re_formula=NA
)
hist(firstpost, breaks=40, xlab="batting average")
Range of batting averages of actual 1st base players? We’ll get the posterior distributions of the min and max batting averages.
firstdf <- subset(batting, PriPos=="1st Base")
firstdf$AtBats <- 1
everyone_post <- posterior_epred(
bb_fit,
newdata=firstdf
)
minmaxpost <- data.frame(
min=rowMins(everyone_post),
max=rowMaxs(everyone_post)
)
layout(t(1:2))
hist(minmaxpost[,1], breaks=40, title='min batting avg')
## Warning in plot.window(xlim, ylim, "", ...): "title" is not a graphical parameter
## Warning in title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...): "title" is not a graphical parameter
## Warning in axis(1, ...): "title" is not a graphical parameter
## Warning in axis(2, ...): "title" is not a graphical parameter
## Warning in plot.window(xlim, ylim, "", ...): "title" is not a graphical parameter
## Warning in title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...): "title" is not a graphical parameter
## Warning in axis(1, ...): "title" is not a graphical parameter
## Warning in axis(2, ...): "title" is not a graphical parameter
Albert Pujols’ batting average?
albert <- subset(batting,
Player=="Albert Pujols"
)
albert$AtBats <- 1
albert_post <- posterior_epred(
bb_fit,
newdata=albert
)
hist(albert_post, breaks=40,
main="Albert Pujols batting avg", xlab="batting avg")
How many hits Albert Pujol would get out of 100 at bats?
albert$AtBats <- 100
albert_post <- posterior_predict(
bb_fit,
newdata=albert
)
hist(albert_post, breaks=40,
main="Albert Pujols hits out of 100", xlab="batting avg")
pp_check
Question: does our model fit the data?
Possible answer: gee, I dunno, let’s simulate from it and see?
Fit a model.
Draw a set of parameters from the posterior distribution.
With these, simulate a new data set.
Do this a few times, and compare the results to the original dataset.
brms
lets you do this with the
pp_check(brms_fit, type='xyz')
method
See the docs for options.
Let’s first go back to the pumpkin data from Week 5:
lme4
:Then, we fit a mixed model with lme4
:
lme_pumpkins <- lmer( weight ~ water * fertilizer + (1|plot:water:fertilizer), data=pumpkins)
summary(lme_pumpkins)
## Linear mixed model fit by REML ['lmerMod']
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer)
## Data: pumpkins
##
## REML criterion at convergence: 420.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.64930 -0.54801 0.03808 0.70427 1.69905
##
## Random effects:
## Groups Name Variance Std.Dev.
## plot:water:fertilizer (Intercept) 0.09576 0.3094
## Residual 1.93549 1.3912
## Number of obs: 120, groups: plot:water:fertilizer, 24
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.00577 0.34744 17.286
## waterwater 0.03078 0.49135 0.063
## fertilizerlow -1.10405 0.49135 -2.247
## fertilizermedium -0.66999 0.49135 -1.364
## waterwater:fertilizerlow 1.19187 0.69488 1.715
## waterwater:fertilizermedium 1.84760 0.69488 2.659
##
## Correlation of Fixed Effects:
## (Intr) wtrwtr frtlzrl frtlzrm wtrwtr:frtlzrl
## waterwater -0.707
## fertilizrlw -0.707 0.500
## fertilzrmdm -0.707 0.500 0.500
## wtrwtr:frtlzrl 0.500 -0.707 -0.707 -0.354
## wtrwtr:frtlzrm 0.500 -0.707 -0.354 -0.707 0.500
brms
:Here’s the “same thing” with brms
:
## Compiling Stan program...
## Start sampling
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer)
## Data: pumpkins (Number of observations: 120)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~plot:water:fertilizer (Number of levels: 24)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.35 0.22 0.02 0.83 1.01 822 1321
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 6.01 0.38 5.27 6.76 1.00 1421 1547
## waterwater 0.02 0.54 -1.02 1.08 1.00 1214 1528
## fertilizerlow -1.14 0.56 -2.23 -0.09 1.00 1413 1586
## fertilizermedium -0.68 0.55 -1.80 0.40 1.00 1420 1533
## waterwater:fertilizerlow 1.22 0.78 -0.26 2.82 1.00 1169 1109
## waterwater:fertilizermedium 1.86 0.76 0.42 3.35 1.00 1203 1631
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 1.40 0.10 1.23 1.61 1.00 3323 2678
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer)
## Data: pumpkins (Number of observations: 120)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~plot:water:fertilizer (Number of levels: 24)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.35 0.22 0.02 0.83 1.01 822 1321
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 6.01 0.38 5.27 6.76 1.00 1421 1547
## waterwater 0.02 0.54 -1.02 1.08 1.00 1214 1528
## fertilizerlow -1.14 0.56 -2.23 -0.09 1.00 1413 1586
## fertilizermedium -0.68 0.55 -1.80 0.40 1.00 1420 1533
## waterwater:fertilizerlow 1.22 0.78 -0.26 2.82 1.00 1169 1109
## waterwater:fertilizermedium 1.86 0.76 0.42 3.35 1.00 1203 1631
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 1.40 0.10 1.23 1.61 1.00 3323 2678
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
## Linear mixed model fit by REML ['lmerMod']
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer)
## Data: pumpkins
##
## REML criterion at convergence: 420.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.64930 -0.54801 0.03808 0.70427 1.69905
##
## Random effects:
## Groups Name Variance Std.Dev.
## plot:water:fertilizer (Intercept) 0.09576 0.3094
## Residual 1.93549 1.3912
## Number of obs: 120, groups: plot:water:fertilizer, 24
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.00577 0.34744 17.286
## waterwater 0.03078 0.49135 0.063
## fertilizerlow -1.10405 0.49135 -2.247
## fertilizermedium -0.66999 0.49135 -1.364
## waterwater:fertilizerlow 1.19187 0.69488 1.715
## waterwater:fertilizermedium 1.84760 0.69488 2.659
##
## Correlation of Fixed Effects:
## (Intr) wtrwtr frtlzrl frtlzrm wtrwtr:frtlzrl
## waterwater -0.707
## fertilizrlw -0.707 0.500
## fertilzrmdm -0.707 0.500 0.500
## wtrwtr:frtlzrl 0.500 -0.707 -0.707 -0.354
## wtrwtr:frtlzrm 0.500 -0.707 -0.354 -0.707 0.500
Try out:
launch_shinystan(brms_pumpkins)
conditional_effects(brms_pumpkins)
pp_check(brms_pumpkins, type='scatter')