\[
%%
% Add your macros here; they'll be included in pdf and html output.
%%
\newcommand{\R}{\mathbb{R}}    % reals
\newcommand{\E}{\mathbb{E}}    % expectation
\renewcommand{\P}{\mathbb{P}}  % probability
\DeclareMathOperator{\logit}{logit}
\DeclareMathOperator{\logistic}{logistic}
\DeclareMathOperator{\SE}{SE}
\DeclareMathOperator{\sd}{sd}
\DeclareMathOperator{\var}{var}
\DeclareMathOperator{\cov}{cov}
\DeclareMathOperator{\cor}{cor}
\DeclareMathOperator{\Normal}{Normal}
\DeclareMathOperator{\MVN}{MVN}
\DeclareMathOperator{\LogNormal}{logNormal}
\DeclareMathOperator{\Poisson}{Poisson}
\DeclareMathOperator{\Beta}{Beta}
\DeclareMathOperator{\Binom}{Binomial}
\DeclareMathOperator{\Gam}{Gamma}
\DeclareMathOperator{\Exp}{Exponential}
\DeclareMathOperator{\Cauchy}{Cauchy}
\DeclareMathOperator{\Unif}{Unif}
\DeclareMathOperator{\Dirichlet}{Dirichlet}
\DeclareMathOperator{\Wishart}{Wishart}
\DeclareMathOperator{\StudentsT}{StudentsT}
\DeclareMathOperator{\Weibull}{Weibull}
\newcommand{\given}{\;\vert\;}
\]

Robust linear models

Peter Ralph

Advanced Biological Statistics

Robustness

Statistical robustness:

[excerpt on the meaning of statistical robustness, from Huber, Robust Statistics, p. 1]

Methods for dealing with “outliers”:

  1. (Look at them carefully and) remove them (if appropriate); a diagnostic sketch follows this list.

  2. Use methods that are robust to the presence of outliers.
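For step 1, base R's regression diagnostics are one reasonable way to “look carefully”. Here is a minimal sketch; the simulated data, the planted outlier, and the cutoffs are all illustrative assumptions, not rules:

set.seed(1)
df <- data.frame(x = rnorm(50))
df$y <- 1 + 2 * df$x + rnorm(50, sd = 0.5)
df$y[1] <- 20                             # plant one gross outlier
fit <- lm(y ~ x, data = df)
rs <- rstandard(fit)                      # standardized residuals
cd <- cooks.distance(fit)                 # per-point influence
which(abs(rs) > 3 | cd > 4 / nrow(df))    # candidates to inspect, not to auto-delete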

Fitting linear models, robustly

Standard “least-squares” fits

\[\begin{aligned} \hat y_i &= b_0 + b_1 x_i \\ y_i &\sim \Normal(\hat y_i, \sigma^2) . \end{aligned}\]

Simulate data:

truth <- list(b0=1.0, b1=2.0, sigma=0.5)
n <- 200
x <- rnorm(n, mean=0, sd=1)
y <- ( truth$b0 + truth$b1 * x 
        + rnorm(n, mean=0, sd=truth$sigma) )

[plot: the simulated data, with Normal noise]

Least-squares fit:

system.time( slr <- lm(y ~ x) )
##    user  system elapsed 
##   0.005   0.000   0.005
summary(slr)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.43120 -0.29747 -0.02625  0.31638  1.32033 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.98628    0.03459   28.51   <2e-16 ***
## x            2.03262    0.03719   54.65   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4892 on 198 degrees of freedom
## Multiple R-squared:  0.9378, Adjusted R-squared:  0.9375 
## F-statistic:  2987 on 1 and 198 DF,  p-value: < 2.2e-16

with brms:

library(brms)
system.time(
    brmslr <- brm(y ~ x, data=data.frame(x=x, y=y))
)
## Compiling Stan program...
## Start sampling
##    user  system elapsed 
##  41.596   3.932  49.135
print(brmslr)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: y ~ x 
##    Data: data.frame(x = x, y = y) (Number of observations: 200) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.99      0.03     0.92     1.06 1.00     4178     2650
## x             2.03      0.04     1.96     2.11 1.00     3816     2728
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.49      0.02     0.45     0.55 1.00     3224     2428
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
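The two fits agree closely. To put the estimates side by side (fixef() is brms's accessor for the population-level coefficients; output omitted):

coef(slr)      # least-squares point estimates
fixef(brmslr)  # posterior means, errors, and 95% intervals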

Cauchy noise?

Relative axon growth for neurons after \(x\) hours:

truth <- list(b0=1.0, b1=2.0, sigma=0.5)
n <- 200
x <- rnorm(n, mean=0, sd=1)
y <- ( truth$b0 + truth$b1 * x 
        + rcauchy(n, location=0, 
                  scale=truth$sigma) )

[plot: the simulated data, with Cauchy noise]

Compare:

[plot: the Normal-noise and Cauchy-noise datasets, side by side]

Least-squares fit:

summary(slr2 <- lm(y ~ x))
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -65.973  -0.877   0.120   1.312  66.074 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.8918     0.5740   1.554    0.122
## x             0.7230     0.6172   1.171    0.243
## 
## Residual standard error: 8.117 on 198 degrees of freedom
## Multiple R-squared:  0.006883,   Adjusted R-squared:  0.001868 
## F-statistic: 1.372 on 1 and 198 DF,  p-value: 0.2428

with brms, using the Student’s t family (recall that the Cauchy is Student’s t with df = 1):

system.time(
    brmsrr <- brm(y ~ x, data=data.frame(x=x, y=y), family=student())
)
## Compiling Stan program...
## Start sampling
##    user  system elapsed 
##  42.218   2.756  47.414
print(brmsrr)
##  Family: student 
##   Links: mu = identity; sigma = identity; nu = identity 
## Formula: y ~ x 
##    Data: data.frame(x = x, y = y) (Number of observations: 200) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     1.03      0.06     0.91     1.14 1.00     3847     2377
## x             2.03      0.07     1.89     2.15 1.00     3910     2827
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.60      0.06     0.49     0.74 1.00     3530     2682
## nu        1.09      0.08     1.00     1.29 1.00     2246     1586
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
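The estimate nu ≈ 1.1 recovers the true noise distribution, since a Student’s t with one degree of freedom is exactly a standard Cauchy. A quick numeric check:

xs <- c(-3, -1, 0, 2, 10)
all.equal(dt(xs, df = 1), dcauchy(xs))  # TRUE: the two densities coincide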

Inferred lines, with uncertainty

[plot: the data, with posterior draws of the fitted line]
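One way to draw such a plot (a sketch: as_draws_df() is re-exported by brms from the posterior package, and b_Intercept / b_x are brms's default draw names):

post <- as_draws_df(brmsrr)
plot(x, y)
for (k in sample(nrow(post), 100)) {      # overlay 100 posterior lines
    abline(post$b_Intercept[k], post$b_x[k],
           col = adjustcolor("red", 0.05))
}
abline(truth$b0, truth$b1, lwd = 2)       # the true line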

Same, zoomed in

[plot: the same, zoomed in to the bulk of the data]

Posterior distributions

library(bayesplot)
mcmc_hist(brmsrr, pars=c("b_Intercept", "b_x", "sigma", "nu"))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

[plot: posterior histograms of b_Intercept, b_x, sigma, and nu]

Why does it work?

The Cauchy distribution has “heavy tails”: extreme values are far more likely than under a Normal of comparable scale.

So, a point that falls far from the line is much less surprising, and the line with the highest likelihood is not so strongly affected by outliers.

[plot: Normal and Cauchy noise densities compared]
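To put numbers on “much more likely”, compare the chance of a noise value below −5 under each standard distribution:

pnorm(-5)    # Normal: about 2.9e-07
pcauchy(-5)  # Cauchy: about 0.063 -- over 200,000 times larger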

pp_check

How would we have known that outliers were causing a problem?

Suppose we first did a non-robust fit:

brmslm <- brm(y ~ x, data=data.frame(x=x, y=y), family=gaussian())
## Compiling Stan program...
## Start sampling
pp_check(brmslm, type='hist')
## Using 10 posterior draws for ppc type 'hist' by default.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

[plot: posterior predictive histograms for the Gaussian fit]

Now, after switching to the robust fit:

brmsrr <- brm(y ~ x, data=data.frame(x=x, y=y), family=student())
pp_check(brmsrr, type='hist')
## Using 10 posterior draws for ppc type 'hist' by default.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

[plot: posterior predictive histograms for the Student-t fit]
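Histograms are one view; a numeric variant compares a summary statistic of the data to its posterior predictive distribution (this uses bayesplot's ppc_stat, through brms's pp_check):

pp_check(brmsrr, type = "stat", stat = "max")  # observed max vs predictive maxima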