Peter Ralph
Advanced Biological Statistics
The brms
package lets you
fit hierarchical models using Stan
with mixed-model syntax!!!
# e.g.
brm(formula = z ~ x + y + (1 + y|f), data = xy,
family = poisson(link='log'))
# or
brm(formula = z ~ x + y + (1 + y|f), data = xy,
family = student(link='identity'))
brm(formula = z ~ x + y + (1 + y|f), data = xy,
family = gaussian(link='identity'))
formula
: almost just like lme4
data
: must contain all the variablesfamily
: distribution of responselink
: connects mean to linear predictorThere are several classes of parameter in a brms model:
b
: the population-level (or, fixed)
effectssd
: the standard deviations of group-level
(or, random) effectssigma
for the
GaussianExamples:
b_x
: the slope of x
:
class="b", coef="x"
sd_f
: the SD of effects for levels of f
:
class="sd", coef="f"
Pass a vector of “priors”, specified by
set_prior(prior, class="b", ...)
where prior
is some valid Stan code.
brm(formula = z ~ x + y + (1 + y|f), data = xy,
family = gaussian(link='identity'),
prior=c(set_prior("normal(0, 5)", class="b"),
set_prior("cauchy(0, 1)", class="sd", coef="f")))
Some example data:
xy <- data.frame(x = rnorm(100),
y = rexp(100),
f = factor(sample(letters[1:3], 100, replace=TRUE)))
xy$z <- xy$x + as.numeric(xy$f) * xy$y + rnorm(100, sd=0.1)
The formula:
Default:
## prior class coef group resp dpar nlpar bound source
## (flat) b default
## (flat) b x (vectorized)
## (flat) b y (vectorized)
## lkj(1) cor default
## lkj(1) cor f (vectorized)
## student_t(3, 1.6, 2.5) Intercept default
## student_t(3, 0, 2.5) sd default
## student_t(3, 0, 2.5) sd f (vectorized)
## student_t(3, 0, 2.5) sd Intercept f (vectorized)
## student_t(3, 0, 2.5) sd y f (vectorized)
## student_t(3, 0, 2.5) sigma default
## Compiling Stan program...
## Start sampling
## Warning: There were 349 divergent transitions after warmup. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: There were 29 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: There were 349 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: z ~ x + y + (1 + y | f)
## Data: xy (Number of observations: 100)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~f (Number of levels: 3)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.07 0.17 0.00 0.50 1.02 1012 1019
## sd(y) 1.18 0.41 0.56 2.10 1.02 371 1768
## cor(Intercept,y) -0.06 0.54 -0.94 0.91 1.01 1155 1517
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.02 0.11 -0.09 0.13 1.00 1683 1385
## x 1.00 0.01 0.98 1.03 1.01 256 2378
## y 1.91 0.64 0.54 3.13 1.02 484 1063
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.10 0.01 0.09 0.12 1.00 2500 2372
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Or…
launch_shinystan(the_fit)
Summaries of, or samples from, the posteriors of:
fixef( )
: “fixed” effectsranef( )
: “random” effectsfitted( )
: posterior distribution of mean
response (see posterior_epred
)predict( )
: posterior distribution of actual responses
(see posterior_predict
)hypothesis( )
: get posterior distributions of
functions of various parameters (e.g., difference between two
classes)conditional_effects( )
: effect of one predictor
conditioned on values of othersparnames( )
: list of parameter namespp_check( )
: compare response distribution to posterior
predictive simulationsloo( )
leave-one-out crossvalidation for model
comparisonbayes_R2( )
: Bayesian \(r^2\)help(brmsformula)
help(brmsfamily)
(but note can use those in help(family)
also)help(set_prior)
and also check what can have a prior with get_prior( )
stancode(the_fit)
(and standata(the_fit)
)loo( )
\[\begin{aligned} Y &\sim \text{Family}(\text{mean}=\mu) \\ \mu &= \text{link}(X \beta) . \end{aligned}\]
We can ask about:
\[\begin{aligned} Y &\sim \text{Family}(\text{mean}=\mu) \\ \mu &= \text{link}(X \beta + Z u) . \end{aligned}\]
We can ask about:
… with or without the group-level (random) effects, \(u\).
batting <- read.csv("data/BattingAveragePlus.csv", header=TRUE, stringsAsFactors=TRUE)
batting$scaled_height <- (batting$height - mean(batting$height))/sd(batting$height)
batting$scaled_weight <- (batting$weight - mean(batting$weight))/sd(batting$weight)
bb_fit <- brm(
Hits | trials(AtBats) ~ 0 + scaled_weight + scaled_height + PriPos + (1 | PriPos:Player),
data = batting,
family = "binomial",
prior = c(prior(normal(0, 5), class = b),
prior(normal(0, 5), class = sd)),
iter = 2000, chains = 3
)
## Compiling Stan program...
## Start sampling
Example: Is height associated with batting average? Which positions tend to be better batters?
Translated: What’s the posterior distribution of the effect of height on logit batting average?
Tools: (all have pars=
arguments)
summary( )
mcmc_hist( )
mcmc_intervals( )
extract( )
## Family: binomial
## Links: mu = logit
## Formula: Hits | trials(AtBats) ~ 0 + scaled_weight + scaled_height + PriPos + (1 | PriPos:Player)
## Data: batting (Number of observations: 886)
## Draws: 3 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 3000
##
## Group-Level Effects:
## ~PriPos:Player (Number of levels: 886)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.14 0.01 0.12 0.16 1.00 1056 1204
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## scaled_weight 0.02 0.01 -0.01 0.04 1.00 1006 1645
## scaled_height -0.01 0.01 -0.03 0.02 1.00 853 1233
## PriPos1stBase -1.08 0.03 -1.14 -1.04 1.00 1384 1884
## PriPos2ndBase -1.09 0.03 -1.15 -1.04 1.00 1511 1706
## PriPos3rdBase -1.06 0.02 -1.10 -1.01 1.00 1416 2031
## PriPosCatcher -1.16 0.03 -1.21 -1.10 1.00 1445 1783
## PriPosCenterField -1.05 0.03 -1.10 -0.99 1.00 1262 1607
## PriPosLeftField -1.10 0.02 -1.14 -1.05 1.00 1826 2217
## PriPosPitcher -1.91 0.04 -2.00 -1.83 1.00 3201 2209
## PriPosRightField -1.05 0.03 -1.11 -1.00 1.01 1216 1712
## PriPosShortstop -1.10 0.03 -1.16 -1.04 1.00 1339 1966
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (scaled_weight) > 0 0.02 0.01 -0.01 0.04 7.26 0.88
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (scaled_height) < 0 -0.01 0.01 -0.03 0.01 2.58 0.72
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
Example: What is the mean batting average of each position, as a function of weight?
Tools:
conditional_effects( )
: means in abstract
conditionsfitted( )
: expected values, summarized (optionally, for
new data)posterior_epred( )
: draws from the posterior mean
(optionally, for new data)conditional_effects
for average-height-and-weight:
## Setting all 'trials' variables to 1 by default if not specified otherwise.
conditional_effects
by weight at average height:
## Setting all 'trials' variables to 1 by default if not specified otherwise.
conditional_effects
for shortstops by weight that are
one SD above average height:
conditional_effects(bb_fit, effects="scaled_weight",
conditions=data.frame(PriPos="Shortstop", scaled_height=1))
## Setting all 'trials' variables to 1 by default if not specified otherwise.
The fitted( )
values: for the original data
posterior_epred( )
: posterior distribution of expected
values
Expected number of hits out of 50 at-bats for a shortstop of average weight that is 1.5 SD above average height?
pp <- posterior_epred(bb_fit,
newdata=data.frame(
PriPos="Shortstop",
scaled_weight=0,
scaled_height=1.5,
AtBats=50),
re_formula=NULL,
allow_new_levels=TRUE)
head(pp)
## [,1]
## [1,] 13.484300
## [2,] 11.600295
## [3,] 14.259553
## [4,] 9.274168
## [5,] 13.549492
## [6,] 12.882218
Compare: same thing but re_formula=NA
:
pp2 <- posterior_epred(bb_fit,
newdata=data.frame(
PriPos="Shortstop",
scaled_weight=0,
scaled_height=1.5,
AtBats=50),
re_formula=NA
)
head(pp2)
## [,1]
## [1,] 12.72004
## [2,] 12.51611
## [3,] 12.96435
## [4,] 12.25001
## [5,] 13.03064
## [6,] 12.54157
Suppose Miguel Cairo, and Franklin Gutierrez go up to bat 150 more times, how many hits will they get?
(mc <- subset(batting, Player %in% c("Miguel Cairo", "Franklin Gutierrez"))[, c("Player", "PriPos", "Hits", "AtBats", "scaled_height", "scaled_weight" )])
## Player PriPos Hits AtBats scaled_height scaled_weight
## 329 Franklin Gutierrez Center Field 39 150 0.2568063 -0.5613294
## 641 Miguel Cairo 1st Base 28 150 -0.1757610 0.6614721
Tools:
predict( )
: summaryposterior_predict( )
: get samples## Player PriPos Hits AtBats scaled_height scaled_weight Estimate Est.Error Q2.5 Q97.5
## 329 Franklin Gutierrez Center Field 39 150 0.2568063 -0.5613294 38.71067 6.295991 27.000 52
## 641 Miguel Cairo 1st Base 28 150 -0.1757610 0.6614721 34.73467 5.997743 23.975 47
fitted( )
vs predict( )
predict
always has greater uncertainty.
## Player PriPos Hits AtBats scaled_height scaled_weight predict.Estimate predict.Est.Error predict.Q2.5 predict.Q97.5 fitted.Estimate fitted.Est.Error fitted.Q2.5 fitted.Q97.5
## 329 Franklin Gutierrez Center Field 39 150 0.2568063 -0.5613294 38.85367 6.241306 27 52 38.86311 3.263611 32.44602 45.44312
## 641 Miguel Cairo 1st Base 28 150 -0.1757610 0.6614721 34.66067 5.987665 23 47 34.81707 3.103223 29.00026 41.07867
\[\begin{aligned} Z_i &\sim \Binom(N_i, \theta_i) \\ \theta_i &= \logit(\beta_{p_i} + u_i) \\ u_i &\sim \Normal(0, \sigma^2) \end{aligned}\]
where for the \(i^\text{th}\) player,
## Player PriPos Hits AtBats scaled_height scaled_weight
## 20 Albert Pujols 1st Base 173 607 0.6893737 1.395153
How would we use the posterior to summarize:
Mean batting average of 1st base players of average weight and height?
Range of batting averages of actual 1st base players? We’ll get the posterior distributions of the min and max batting averages.
firstdf <- subset(batting, PriPos=="1st Base")
firstdf$AtBats <- 1
everyone_post <- posterior_epred(
bb_fit,
newdata=firstdf
)
minmaxpost <- data.frame(
min=rowMins(everyone_post),
max=rowMaxs(everyone_post)
)
layout(t(1:2))
hist(minmaxpost[,1], breaks=40, title='min batting avg')
## Warning in plot.window(xlim, ylim, "", ...): "title" is not a graphical parameter
## Warning in title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...): "title" is not a graphical parameter
## Warning in axis(1, ...): "title" is not a graphical parameter
## Warning in axis(2, ...): "title" is not a graphical parameter
## Warning in plot.window(xlim, ylim, "", ...): "title" is not a graphical parameter
## Warning in title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...): "title" is not a graphical parameter
## Warning in axis(1, ...): "title" is not a graphical parameter
## Warning in axis(2, ...): "title" is not a graphical parameter
Albert Pujols’ batting average?
How many hits Albert Pujol would get out of 100 at bats?
pp_check
Question: does our model fit the data?
Possible answer: gee, I dunno, let’s simulate from it and see?
Fit a model.
Draw a set of parameters from the posterior distribution.
With these, simulate a new data set.
Do this a few times, and compare the results to the original dataset.
brms
lets you do this with the
pp_check(brms_fit, type='xyz')
method
See the docs for options.
Let’s first go back to the pumpkin data from Week 5:
lme4
:Then, we fit a mixed model with lme4
:
lme_pumpkins <- lmer( weight ~ water * fertilizer + (1|plot:water:fertilizer), data=pumpkins)
summary(lme_pumpkins)
## Linear mixed model fit by REML ['lmerMod']
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer)
## Data: pumpkins
##
## REML criterion at convergence: 420.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.64930 -0.54801 0.03808 0.70427 1.69905
##
## Random effects:
## Groups Name Variance Std.Dev.
## plot:water:fertilizer (Intercept) 0.09576 0.3094
## Residual 1.93549 1.3912
## Number of obs: 120, groups: plot:water:fertilizer, 24
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.00577 0.34744 17.286
## waterwater 0.03078 0.49135 0.063
## fertilizerlow -1.10405 0.49135 -2.247
## fertilizermedium -0.66999 0.49135 -1.364
## waterwater:fertilizerlow 1.19187 0.69488 1.715
## waterwater:fertilizermedium 1.84760 0.69488 2.659
##
## Correlation of Fixed Effects:
## (Intr) wtrwtr frtlzrl frtlzrm wtrwtr:frtlzrl
## waterwater -0.707
## fertilizrlw -0.707 0.500
## fertilzrmdm -0.707 0.500 0.500
## wtrwtr:frtlzrl 0.500 -0.707 -0.707 -0.354
## wtrwtr:frtlzrm 0.500 -0.707 -0.354 -0.707 0.500
brms
:Here’s the “same thing” with brms
:
## Compiling Stan program...
## Start sampling
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer)
## Data: pumpkins (Number of observations: 120)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~plot:water:fertilizer (Number of levels: 24)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.35 0.22 0.02 0.83 1.01 822 1321
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 6.01 0.38 5.27 6.76 1.00 1421 1547
## waterwater 0.02 0.54 -1.02 1.08 1.00 1214 1528
## fertilizerlow -1.14 0.56 -2.23 -0.09 1.00 1413 1586
## fertilizermedium -0.68 0.55 -1.80 0.40 1.00 1420 1533
## waterwater:fertilizerlow 1.22 0.78 -0.26 2.82 1.00 1169 1109
## waterwater:fertilizermedium 1.86 0.76 0.42 3.35 1.00 1203 1631
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 1.40 0.10 1.23 1.61 1.00 3323 2678
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer)
## Data: pumpkins (Number of observations: 120)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~plot:water:fertilizer (Number of levels: 24)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.35 0.22 0.02 0.83 1.01 822 1321
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 6.01 0.38 5.27 6.76 1.00 1421 1547
## waterwater 0.02 0.54 -1.02 1.08 1.00 1214 1528
## fertilizerlow -1.14 0.56 -2.23 -0.09 1.00 1413 1586
## fertilizermedium -0.68 0.55 -1.80 0.40 1.00 1420 1533
## waterwater:fertilizerlow 1.22 0.78 -0.26 2.82 1.00 1169 1109
## waterwater:fertilizermedium 1.86 0.76 0.42 3.35 1.00 1203 1631
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 1.40 0.10 1.23 1.61 1.00 3323 2678
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
## Linear mixed model fit by REML ['lmerMod']
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer)
## Data: pumpkins
##
## REML criterion at convergence: 420.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.64930 -0.54801 0.03808 0.70427 1.69905
##
## Random effects:
## Groups Name Variance Std.Dev.
## plot:water:fertilizer (Intercept) 0.09576 0.3094
## Residual 1.93549 1.3912
## Number of obs: 120, groups: plot:water:fertilizer, 24
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.00577 0.34744 17.286
## waterwater 0.03078 0.49135 0.063
## fertilizerlow -1.10405 0.49135 -2.247
## fertilizermedium -0.66999 0.49135 -1.364
## waterwater:fertilizerlow 1.19187 0.69488 1.715
## waterwater:fertilizermedium 1.84760 0.69488 2.659
##
## Correlation of Fixed Effects:
## (Intr) wtrwtr frtlzrl frtlzrm wtrwtr:frtlzrl
## waterwater -0.707
## fertilizrlw -0.707 0.500
## fertilzrmdm -0.707 0.500 0.500
## wtrwtr:frtlzrl 0.500 -0.707 -0.707 -0.354
## wtrwtr:frtlzrm 0.500 -0.707 -0.354 -0.707 0.500
Try out:
launch_shinystan(brms_pumpkins)
conditional_effects(brms_pumpkins)
pp_check(brms_pumpkins, type='scatter')