Introduction to brms

Peter Ralph

4 December 2020 – Advanced Biological Statistics

Stan, but with formulas

The brms package lets you

fit hierarchical models using Stan

with mixed-model syntax!!!

# e.g.
brm(formula = z ~ x + y + (1 + y|f), data = xy,
    family = poisson(link='log'))
# or
brm(formula = z ~ x + y + (1 + y|f), data = xy,
    family = student(link='identity'))

Overview of brms

Fitting models

brm(formula = z ~ x + y + (1 + y|f), data = xy,
    family = gaussian(link='identity'))

formula: almost just like lme4
data: must contain all the variables
family: distribution of response
link: connects mean to linear predictor

Parameterization

There are several classes of parameter in a brms model:

b : the population-level (or, fixed) effects
sd : the standard deviations of group-level (or, random) effects
family-specific parameters, like sigma for the Gaussian

Examples:

b_x : the slope of x : class="b", coef="x"
sd_f : the SD of effects for levels of f : class="sd", coef="f"

Setting priors

Pass a vector of “priors”, specified by

    set_prior(prior, class="b", ...)

where prior is some valid Stan code.

brm(formula = z ~ x + y + (1 + y|f), data = xy,
    family = gaussian(link='identity'),
    prior=c(set_prior("normal(0, 5)", class="b"),
            set_prior("cauchy(0, 1)", class="sd", coef="f")))

1. Set up the formula

xy <- data.frame(x = rnorm(100),
                 y = rexp(100),
                 f = factor(sample(letters[1:3], 100, replace=TRUE)))
xy$z <- xy$x + as.numeric(xy$f) * xy$y + rnorm(100, sd=0.1)

the_formula <- brmsformula(z ~ x + y + (1 + y | f))

2. Choose priors

Default:

get_prior(the_formula, data=xy)

##                   prior     class      coef group resp dpar nlpar bound       source
##                  (flat)         b                                            default
##                  (flat)         b         x                             (vectorized)
##                  (flat)         b         y                             (vectorized)
##                  lkj(1)       cor                                            default
##                  lkj(1)       cor               f                       (vectorized)
##  student_t(3, 1.6, 2.5) Intercept                                            default
##    student_t(3, 0, 2.5)        sd                                            default
##    student_t(3, 0, 2.5)        sd               f                       (vectorized)
##    student_t(3, 0, 2.5)        sd Intercept     f                       (vectorized)
##    student_t(3, 0, 2.5)        sd         y     f                       (vectorized)
##    student_t(3, 0, 2.5)     sigma                                            default

Choose modifications:

# for example, no good reason to do this
the_priors = c(set_prior("normal(0, 5)", class = "b"),
               set_prior("normal(0, 1)", class = "sd", coef="y", group="f"))

3. Fit the model

the_fit <- brm(the_formula, data=xy, family=gaussian(), 
               prior=the_priors)

## Compiling Stan program...

## Start sampling

## Warning: There were 151 divergent transitions after warmup. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.

## Warning: There were 25 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded

## Warning: Examine the pairs() plot to diagnose sampling problems

4. Check converence

summary(the_fit)

## Warning: There were 151 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: z ~ x + y + (1 + y | f) 
##    Data: xy (Number of observations: 100) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~f (Number of levels: 3) 
##                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)        0.06      0.14     0.00     0.41 1.00      956     1439
## sd(y)                1.14      0.41     0.54     2.15 1.00     1510     1670
## cor(Intercept,y)    -0.10      0.56    -0.95     0.91 1.00     1239     1792
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.02      0.08    -0.10     0.12 1.00     2424     2075
## x             1.00      0.01     0.98     1.03 1.00     3779     2524
## y             1.91      0.64     0.57     3.17 1.00     1025      982
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.10      0.01     0.09     0.12 1.00     3383     2148
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Or…

launch_shinystan(the_fit)

4. Look at results

Summaries of, or samples from, the posteriors of:

fixef( ): “fixed” effects
ranef( ): “random” effects
fitted( ): posterior distribution of mean response (see posterior_epred)
predict( ): posterior distribution of actual responses (see posterior_predict)
hypothesis( ): get posterior distributions of functions of various parameters (e.g., difference between two classes)
conditional_effects( ): effect of one predictor conditioned on values of others

More tools:

parnames( ): list of parameter names
pp_check( ): compare response distribution to posterior predictive simulations
loo( ) leave-one-out crossvalidation for model comparison
bayes_R2( ): Bayesian \(r^2\)

More info:

formulas: help(brmsformula)
families: help(brmsfamily) (but note can use those in help(family) also)
priors: help(set_prior) and also check what can have a prior with get_prior( )
get the Stan code: stancode(the_fit) (and standata(the_fit))
compare models with loo( )
more technical notes at this tutorial

`pp_check`

Question: does our model fit the data?

Possible answer: gee, I dunno, let’s simulate from it and see?

Posterior predictive simulations

Fit a model.
Draw a set of parameters from the posterior distribution.
With these, simulate a new data set.
Do this a few times, and compare the results to the original dataset.

brms lets you do this with the pp_check(brms_fit, type='xyz') method

Example: pumpkins

Let’s first go back to the pumpkin data from Week 5:

pumpkins <- read.table("data/pumpkins.tsv", header=TRUE)
pumpkins$plot <- factor(pumpkins$plot)
pumpkins$fertilizer <- factor(pumpkins$fertilizer)
pumpkins$water <- factor(pumpkins$water)

ggplot(pumpkins) + geom_boxplot(aes(x=fertilizer:water, y=weight, fill=water))

plot of chunk r plot_pumpkins

A mixed model with `lme4`:

Then, we fit a mixed model with lme4:

lme_pumpkins <- lmer( weight ~ water * fertilizer + (1|plot:water:fertilizer), data=pumpkins)
summary(lme_pumpkins)

## Linear mixed model fit by REML ['lmerMod']
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer)
##    Data: pumpkins
## 
## REML criterion at convergence: 369.9
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.64930 -0.54801  0.03808  0.70427  1.69905 
## 
## Random effects:
##  Groups                Name        Variance Std.Dev.
##  plot:water:fertilizer (Intercept) 0.06128  0.2476  
##  Residual                          1.23871  1.1130  
## Number of obs: 120, groups:  plot:water:fertilizer, 24
## 
## Fixed effects:
##                              Estimate Std. Error t value
## (Intercept)                  20.00462    0.27795  71.972
## waterwater                    0.02462    0.39308   0.063
## fertilizerlow               -12.08324    0.39308 -30.740
## fertilizermedium             -6.13599    0.39308 -15.610
## waterwater:fertilizerlow     -1.84651    0.55590  -3.322
## waterwater:fertilizermedium   2.67808    0.55590   4.818
## 
## Correlation of Fixed Effects:
##                (Intr) wtrwtr frtlzrl frtlzrm wtrwtr:frtlzrl
## waterwater     -0.707                                      
## fertilizrlw    -0.707  0.500                               
## fertilzrmdm    -0.707  0.500  0.500                        
## wtrwtr:frtlzrl  0.500 -0.707 -0.707  -0.354                
## wtrwtr:frtlzrm  0.500 -0.707 -0.354  -0.707   0.500

… with `brms`:

Here’s the “same thing” with brms:

brms_pumpkins <- brm( weight ~ water * fertilizer + (1|plot:water:fertilizer), data=pumpkins)

## Compiling Stan program...

## Start sampling

brms_pumpkins <- brm( weight ~ water * fertilizer + (1|plot:water:fertilizer), data=pumpkins)

summary(brms_pumpkins)

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer) 
##    Data: pumpkins (Number of observations: 120) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~plot:water:fertilizer (Number of levels: 24) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.28      0.17     0.01     0.65 1.00      871      906
## 
## Population-Level Effects: 
##                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept                      20.00      0.31    19.40    20.61 1.00     1567     1676
## waterwater                      0.03      0.43    -0.80     0.90 1.00     1559     1665
## fertilizerlow                 -12.08      0.43   -12.91   -11.23 1.00     1813     1971
## fertilizermedium               -6.13      0.42    -6.96    -5.32 1.00     1677     1882
## waterwater:fertilizerlow       -1.86      0.60    -3.01    -0.72 1.00     1642     1996
## waterwater:fertilizermedium     2.67      0.59     1.57     3.83 1.00     1594     1646
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.13      0.08     0.98     1.29 1.00     3246     2862
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Quick comparison:

summary(brms_pumpkins)

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer) 
##    Data: pumpkins (Number of observations: 120) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~plot:water:fertilizer (Number of levels: 24) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.28      0.17     0.01     0.65 1.00      871      906
## 
## Population-Level Effects: 
##                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept                      20.00      0.31    19.40    20.61 1.00     1567     1676
## waterwater                      0.03      0.43    -0.80     0.90 1.00     1559     1665
## fertilizerlow                 -12.08      0.43   -12.91   -11.23 1.00     1813     1971
## fertilizermedium               -6.13      0.42    -6.96    -5.32 1.00     1677     1882
## waterwater:fertilizerlow       -1.86      0.60    -3.01    -0.72 1.00     1642     1996
## waterwater:fertilizermedium     2.67      0.59     1.57     3.83 1.00     1594     1646
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.13      0.08     0.98     1.29 1.00     3246     2862
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

summary(lme_pumpkins)

## Linear mixed model fit by REML ['lmerMod']
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer)
##    Data: pumpkins
## 
## REML criterion at convergence: 369.9
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.64930 -0.54801  0.03808  0.70427  1.69905 
## 
## Random effects:
##  Groups                Name        Variance Std.Dev.
##  plot:water:fertilizer (Intercept) 0.06128  0.2476  
##  Residual                          1.23871  1.1130  
## Number of obs: 120, groups:  plot:water:fertilizer, 24
## 
## Fixed effects:
##                              Estimate Std. Error t value
## (Intercept)                  20.00462    0.27795  71.972
## waterwater                    0.02462    0.39308   0.063
## fertilizerlow               -12.08324    0.39308 -30.740
## fertilizermedium             -6.13599    0.39308 -15.610
## waterwater:fertilizerlow     -1.84651    0.55590  -3.322
## waterwater:fertilizermedium   2.67808    0.55590   4.818
## 
## Correlation of Fixed Effects:
##                (Intr) wtrwtr frtlzrl frtlzrm wtrwtr:frtlzrl
## waterwater     -0.707                                      
## fertilizrlw    -0.707  0.500                               
## fertilzrmdm    -0.707  0.500  0.500                        
## wtrwtr:frtlzrl  0.500 -0.707 -0.707  -0.354                
## wtrwtr:frtlzrm  0.500 -0.707 -0.354  -0.707   0.500

Your turn

Try out:

launch_shinystan(brms_pumpkins)
conditional_effects(brms_pumpkins)
pp_check(brms_pumpkins, type='scatter')