Peter Ralph
Advanced Biological Statistics
The brms
package lets you
fit hierarchical models using Stan
with mixed-model syntax!!!
# e.g.
brm(formula = z ~ x + y + (1 + y|f), data = xy,
    family = poisson(link='log'))
# or
brm(formula = z ~ x + y + (1 + y|f), data = xy,
    family = student(link='identity'))
brm(formula = z ~ x + y + (1 + y|f), data = xy,
    family = gaussian(link='identity'))
formula: almost just like lme4data: must contain all the variablesfamily: distribution of responselink: connects mean to linear predictorThere are several classes of parameter in a brms model:
b : the population-level (or, fixed)
effectssd : the standard deviations of group-level
(or, random) effectssigma for the
GaussianExamples:
b_x : the slope of x :
class="b", coef="x"sd_f : the SD of effects for levels of f :
class="sd", coef="f"Pass a vector of “priors”, specified by
    set_prior(prior, class="b", ...)
where prior is some valid Stan code.
brm(formula = z ~ x + y + (1 + y|f), data = xy,
    family = gaussian(link='identity'),
    prior=c(set_prior("normal(0, 5)", class="b"),
            set_prior("cauchy(0, 1)", class="sd", coef="f")))
Some example data:
xy <- data.frame(x = rnorm(100),
                 y = rexp(100),
                 f = factor(sample(letters[1:3], 100, replace=TRUE)))
xy$z <- xy$x + as.numeric(xy$f) * xy$y + rnorm(100, sd=0.1)The formula:
Default:
##                   prior     class      coef group resp dpar nlpar bound       source
##                  (flat)         b                                            default
##                  (flat)         b         x                             (vectorized)
##                  (flat)         b         y                             (vectorized)
##                  lkj(1)       cor                                            default
##                  lkj(1)       cor               f                       (vectorized)
##  student_t(3, 1.6, 2.5) Intercept                                            default
##    student_t(3, 0, 2.5)        sd                                            default
##    student_t(3, 0, 2.5)        sd               f                       (vectorized)
##    student_t(3, 0, 2.5)        sd Intercept     f                       (vectorized)
##    student_t(3, 0, 2.5)        sd         y     f                       (vectorized)
##    student_t(3, 0, 2.5)     sigma                                            default
## Compiling Stan program...
## Start sampling
## Warning: There were 349 divergent transitions after warmup. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: There were 29 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: Examine the pairs() plot to diagnose sampling problems
## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess
## Warning: There were 349 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: z ~ x + y + (1 + y | f) 
##    Data: xy (Number of observations: 100) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~f (Number of levels: 3) 
##                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)        0.07      0.17     0.00     0.50 1.02     1012     1019
## sd(y)                1.18      0.41     0.56     2.10 1.02      371     1768
## cor(Intercept,y)    -0.06      0.54    -0.94     0.91 1.01     1155     1517
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.02      0.11    -0.09     0.13 1.00     1683     1385
## x             1.00      0.01     0.98     1.03 1.01      256     2378
## y             1.91      0.64     0.54     3.13 1.02      484     1063
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.10      0.01     0.09     0.12 1.00     2500     2372
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Or…
launch_shinystan(the_fit)
Summaries of, or samples from, the posteriors of:
fixef( ): “fixed” effectsranef( ): “random” effectsfitted( ): posterior distribution of mean
response (see posterior_epred)predict( ): posterior distribution of actual responses
(see posterior_predict)hypothesis( ): get posterior distributions of
functions of various parameters (e.g., difference between two
classes)conditional_effects( ): effect of one predictor
conditioned on values of othersparnames( ): list of parameter namespp_check( ): compare response distribution to posterior
predictive simulationsloo( ) leave-one-out crossvalidation for model
comparisonbayes_R2( ): Bayesian \(r^2\)help(brmsformula)help(brmsfamily)
(but note can use those in help(family) also)help(set_prior)
and also check what can have a prior with get_prior( )stancode(the_fit)
(and standata(the_fit))loo( )\[\begin{aligned} Y &\sim \text{Family}(\text{mean}=\mu) \\ \mu &= \text{link}(X \beta) . \end{aligned}\]
We can ask about:
\[\begin{aligned} Y &\sim \text{Family}(\text{mean}=\mu) \\ \mu &= \text{link}(X \beta + Z u) . \end{aligned}\]
We can ask about:
… with or without the group-level (random) effects, \(u\).
batting <- read.csv("data/BattingAveragePlus.csv", header=TRUE, stringsAsFactors=TRUE)
batting$scaled_height <- (batting$height - mean(batting$height))/sd(batting$height)
batting$scaled_weight <- (batting$weight - mean(batting$weight))/sd(batting$weight)
bb_fit <- brm(
      Hits  | trials(AtBats) ~ 0 + scaled_weight + scaled_height + PriPos + (1 | PriPos:Player),
      data = batting,
      family = "binomial",
      prior = c(prior(normal(0, 5), class = b),
                prior(normal(0, 5), class = sd)),
      iter = 2000, chains = 3
)## Compiling Stan program...
## Start sampling
Example: Is height associated with batting average? Which positions tend to be better batters?
Translated: What’s the posterior distribution of the effect of height on logit batting average?
Tools: (all have pars= arguments)
summary( )mcmc_hist( )mcmc_intervals( )extract( )##  Family: binomial 
##   Links: mu = logit 
## Formula: Hits | trials(AtBats) ~ 0 + scaled_weight + scaled_height + PriPos + (1 | PriPos:Player) 
##    Data: batting (Number of observations: 886) 
##   Draws: 3 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 3000
## 
## Group-Level Effects: 
## ~PriPos:Player (Number of levels: 886) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.14      0.01     0.12     0.16 1.00     1056     1204
## 
## Population-Level Effects: 
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## scaled_weight         0.02      0.01    -0.01     0.04 1.00     1006     1645
## scaled_height        -0.01      0.01    -0.03     0.02 1.00      853     1233
## PriPos1stBase        -1.08      0.03    -1.14    -1.04 1.00     1384     1884
## PriPos2ndBase        -1.09      0.03    -1.15    -1.04 1.00     1511     1706
## PriPos3rdBase        -1.06      0.02    -1.10    -1.01 1.00     1416     2031
## PriPosCatcher        -1.16      0.03    -1.21    -1.10 1.00     1445     1783
## PriPosCenterField    -1.05      0.03    -1.10    -0.99 1.00     1262     1607
## PriPosLeftField      -1.10      0.02    -1.14    -1.05 1.00     1826     2217
## PriPosPitcher        -1.91      0.04    -2.00    -1.83 1.00     3201     2209
## PriPosRightField     -1.05      0.03    -1.11    -1.00 1.01     1216     1712
## PriPosShortstop      -1.10      0.03    -1.16    -1.04 1.00     1339     1966
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
## Hypothesis Tests for class b:
##            Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (scaled_weight) > 0     0.02      0.01    -0.01     0.04       7.26      0.88     
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
## Hypothesis Tests for class b:
##            Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (scaled_height) < 0    -0.01      0.01    -0.03     0.01       2.58      0.72     
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
Example: What is the mean batting average of each position, as a function of weight?
Tools:
conditional_effects( ): means in abstract
conditionsfitted( ): expected values, summarized (optionally, for
new data)posterior_epred( ): draws from the posterior mean
(optionally, for new data)conditional_effects for average-height-and-weight:
## Setting all 'trials' variables to 1 by default if not specified otherwise.
conditional_effects by weight at average height:
## Setting all 'trials' variables to 1 by default if not specified otherwise.
conditional_effects for shortstops by weight that are
one SD above average height:
conditional_effects(bb_fit, effects="scaled_weight",
                    conditions=data.frame(PriPos="Shortstop", scaled_height=1))## Setting all 'trials' variables to 1 by default if not specified otherwise.
The fitted( ) values: for the original data
bb_fitted <- fitted(bb_fit)
ggplot(cbind(batting, bb_fitted)) +
       geom_segment(aes(x=Q2.5, xend=Q97.5, y=Hits, yend=Hits), col='red') +
       geom_point(aes(x=Estimate, y=Hits)) +
       geom_abline(intercept=0, slope=1) +
       ylab("actual hits") + xlab("predicted hits") + coord_fixed()posterior_epred( ): posterior distribution of expected
values
Expected number of hits out of 50 at-bats for a shortstop of average weight that is 1.5 SD above average height?
pp <- posterior_epred(bb_fit,
        newdata=data.frame(
            PriPos="Shortstop",
            scaled_weight=0,
            scaled_height=1.5,
            AtBats=50),
        re_formula=NULL,
        allow_new_levels=TRUE)
head(pp)##           [,1]
## [1,] 13.484300
## [2,] 11.600295
## [3,] 14.259553
## [4,]  9.274168
## [5,] 13.549492
## [6,] 12.882218
Compare: same thing but re_formula=NA:
pp2 <- posterior_epred(bb_fit,
        newdata=data.frame(
            PriPos="Shortstop",
            scaled_weight=0,
            scaled_height=1.5,
            AtBats=50),
        re_formula=NA
)
head(pp2)##          [,1]
## [1,] 12.72004
## [2,] 12.51611
## [3,] 12.96435
## [4,] 12.25001
## [5,] 13.03064
## [6,] 12.54157
ggplot(data.frame(hits=c(pp, pp2), new_re=rep(c("include group effects", "no group effects"), each=length(pp)))) + geom_histogram(aes(x=hits), bins=32) + facet_wrap(~ new_re)Suppose Miguel Cairo, and Franklin Gutierrez go up to bat 150 more times, how many hits will they get?
(mc <- subset(batting, Player %in% c("Miguel Cairo", "Franklin Gutierrez"))[, c("Player", "PriPos", "Hits", "AtBats", "scaled_height", "scaled_weight" )])##                 Player       PriPos Hits AtBats scaled_height scaled_weight
## 329 Franklin Gutierrez Center Field   39    150     0.2568063    -0.5613294
## 641       Miguel Cairo     1st Base   28    150    -0.1757610     0.6614721
Tools:
predict( ): summaryposterior_predict( ): get samples##                 Player       PriPos Hits AtBats scaled_height scaled_weight Estimate Est.Error   Q2.5 Q97.5
## 329 Franklin Gutierrez Center Field   39    150     0.2568063    -0.5613294 38.71067  6.295991 27.000    52
## 641       Miguel Cairo     1st Base   28    150    -0.1757610     0.6614721 34.73467  5.997743 23.975    47
fitted( ) vs predict( )predict always has greater uncertainty.
##                 Player       PriPos Hits AtBats scaled_height scaled_weight predict.Estimate predict.Est.Error predict.Q2.5 predict.Q97.5 fitted.Estimate fitted.Est.Error fitted.Q2.5 fitted.Q97.5
## 329 Franklin Gutierrez Center Field   39    150     0.2568063    -0.5613294         38.85367          6.241306           27            52        38.86311         3.263611    32.44602     45.44312
## 641       Miguel Cairo     1st Base   28    150    -0.1757610     0.6614721         34.66067          5.987665           23            47        34.81707         3.103223    29.00026     41.07867
\[\begin{aligned} Z_i &\sim \Binom(N_i, \theta_i) \\ \theta_i &= \logit(\beta_{p_i} + u_i) \\ u_i &\sim \Normal(0, \sigma^2) \end{aligned}\]
where for the \(i^\text{th}\) player,
##           Player   PriPos Hits AtBats scaled_height scaled_weight
## 20 Albert Pujols 1st Base  173    607     0.6893737      1.395153
How would we use the posterior to summarize:
Mean batting average of 1st base players of average weight and height?
firstpost <- posterior_epred(
    bb_fit,
    newdata=data.frame(
        PriPos="1st Base",
        scaled_weight=0,
        scaled_height=0,
        AtBats=1
    ),
    re_formula=NA
)
hist(firstpost, breaks=40, xlab="batting average")Range of batting averages of actual 1st base players? We’ll get the posterior distributions of the min and max batting averages.
firstdf <- subset(batting, PriPos=="1st Base")
firstdf$AtBats <- 1
everyone_post <- posterior_epred(
    bb_fit,
    newdata=firstdf
)
minmaxpost <- data.frame(
    min=rowMins(everyone_post),
    max=rowMaxs(everyone_post)
)
layout(t(1:2))
hist(minmaxpost[,1], breaks=40, title='min batting avg')## Warning in plot.window(xlim, ylim, "", ...): "title" is not a graphical parameter
## Warning in title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...): "title" is not a graphical parameter
## Warning in axis(1, ...): "title" is not a graphical parameter
## Warning in axis(2, ...): "title" is not a graphical parameter
## Warning in plot.window(xlim, ylim, "", ...): "title" is not a graphical parameter
## Warning in title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...): "title" is not a graphical parameter
## Warning in axis(1, ...): "title" is not a graphical parameter
## Warning in axis(2, ...): "title" is not a graphical parameter
Albert Pujols’ batting average?
albert <- subset(batting,
                   Player=="Albert Pujols"
)
albert$AtBats <- 1
albert_post <- posterior_epred(
    bb_fit,
    newdata=albert
)
hist(albert_post, breaks=40,
     main="Albert Pujols batting avg", xlab="batting avg")How many hits Albert Pujol would get out of 100 at bats?
albert$AtBats <- 100
albert_post <- posterior_predict(
    bb_fit,
    newdata=albert
)
hist(albert_post, breaks=40,
     main="Albert Pujols hits out of 100", xlab="batting avg")pp_checkQuestion: does our model fit the data?
Possible answer: gee, I dunno, let’s simulate from it and see?
Fit a model.
Draw a set of parameters from the posterior distribution.
With these, simulate a new data set.
Do this a few times, and compare the results to the original dataset.
brms lets you do this with the
pp_check(brms_fit, type='xyz') method
See the docs for options.
Let’s first go back to the pumpkin data from Week 5:
lme4:Then, we fit a mixed model with lme4:
lme_pumpkins <- lmer( weight ~ water * fertilizer + (1|plot:water:fertilizer), data=pumpkins)
summary(lme_pumpkins)## Linear mixed model fit by REML ['lmerMod']
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer)
##    Data: pumpkins
## 
## REML criterion at convergence: 420.8
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.64930 -0.54801  0.03808  0.70427  1.69905 
## 
## Random effects:
##  Groups                Name        Variance Std.Dev.
##  plot:water:fertilizer (Intercept) 0.09576  0.3094  
##  Residual                          1.93549  1.3912  
## Number of obs: 120, groups:  plot:water:fertilizer, 24
## 
## Fixed effects:
##                             Estimate Std. Error t value
## (Intercept)                  6.00577    0.34744  17.286
## waterwater                   0.03078    0.49135   0.063
## fertilizerlow               -1.10405    0.49135  -2.247
## fertilizermedium            -0.66999    0.49135  -1.364
## waterwater:fertilizerlow     1.19187    0.69488   1.715
## waterwater:fertilizermedium  1.84760    0.69488   2.659
## 
## Correlation of Fixed Effects:
##                (Intr) wtrwtr frtlzrl frtlzrm wtrwtr:frtlzrl
## waterwater     -0.707                                      
## fertilizrlw    -0.707  0.500                               
## fertilzrmdm    -0.707  0.500  0.500                        
## wtrwtr:frtlzrl  0.500 -0.707 -0.707  -0.354                
## wtrwtr:frtlzrm  0.500 -0.707 -0.354  -0.707   0.500
brms:Here’s the “same thing” with brms:
## Compiling Stan program...
## Start sampling
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer) 
##    Data: pumpkins (Number of observations: 120) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~plot:water:fertilizer (Number of levels: 24) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.35      0.22     0.02     0.83 1.01      822     1321
## 
## Population-Level Effects: 
##                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept                       6.01      0.38     5.27     6.76 1.00     1421     1547
## waterwater                      0.02      0.54    -1.02     1.08 1.00     1214     1528
## fertilizerlow                  -1.14      0.56    -2.23    -0.09 1.00     1413     1586
## fertilizermedium               -0.68      0.55    -1.80     0.40 1.00     1420     1533
## waterwater:fertilizerlow        1.22      0.78    -0.26     2.82 1.00     1169     1109
## waterwater:fertilizermedium     1.86      0.76     0.42     3.35 1.00     1203     1631
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.40      0.10     1.23     1.61 1.00     3323     2678
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer) 
##    Data: pumpkins (Number of observations: 120) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~plot:water:fertilizer (Number of levels: 24) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.35      0.22     0.02     0.83 1.01      822     1321
## 
## Population-Level Effects: 
##                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept                       6.01      0.38     5.27     6.76 1.00     1421     1547
## waterwater                      0.02      0.54    -1.02     1.08 1.00     1214     1528
## fertilizerlow                  -1.14      0.56    -2.23    -0.09 1.00     1413     1586
## fertilizermedium               -0.68      0.55    -1.80     0.40 1.00     1420     1533
## waterwater:fertilizerlow        1.22      0.78    -0.26     2.82 1.00     1169     1109
## waterwater:fertilizermedium     1.86      0.76     0.42     3.35 1.00     1203     1631
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.40      0.10     1.23     1.61 1.00     3323     2678
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
## Linear mixed model fit by REML ['lmerMod']
## Formula: weight ~ water * fertilizer + (1 | plot:water:fertilizer)
##    Data: pumpkins
## 
## REML criterion at convergence: 420.8
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.64930 -0.54801  0.03808  0.70427  1.69905 
## 
## Random effects:
##  Groups                Name        Variance Std.Dev.
##  plot:water:fertilizer (Intercept) 0.09576  0.3094  
##  Residual                          1.93549  1.3912  
## Number of obs: 120, groups:  plot:water:fertilizer, 24
## 
## Fixed effects:
##                             Estimate Std. Error t value
## (Intercept)                  6.00577    0.34744  17.286
## waterwater                   0.03078    0.49135   0.063
## fertilizerlow               -1.10405    0.49135  -2.247
## fertilizermedium            -0.66999    0.49135  -1.364
## waterwater:fertilizerlow     1.19187    0.69488   1.715
## waterwater:fertilizermedium  1.84760    0.69488   2.659
## 
## Correlation of Fixed Effects:
##                (Intr) wtrwtr frtlzrl frtlzrm wtrwtr:frtlzrl
## waterwater     -0.707                                      
## fertilizrlw    -0.707  0.500                               
## fertilzrmdm    -0.707  0.500  0.500                        
## wtrwtr:frtlzrl  0.500 -0.707 -0.707  -0.354                
## wtrwtr:frtlzrm  0.500 -0.707 -0.354  -0.707   0.500
Try out:
launch_shinystan(brms_pumpkins)
conditional_effects(brms_pumpkins)
pp_check(brms_pumpkins, type='scatter')