Sparsifying priors and variable selection

Peter Ralph

4 February 2020 – Advanced Biological Statistics

Variable selection

Example data

from Efron, Hastie, Johnstone, & Tibshirani

library(lars)
data(diabetes)
class(diabetes$x2) <- "matrix"

diabetes                 package:lars                  R Documentation

Blood and other measurements in diabetics

Description:

     The ‘diabetes’ data frame has 442 rows and 3 columns. These are
     the data used in the Efron et al "Least Angle Regression" paper.

Format:

     This data frame contains the following columns:

     x a matrix with 10 columns

     y a numeric vector

     x2 a matrix with 64 columns

The dataset has

442 diabetes patients
10 main variables: age, gender, body mass index, average blood pressure (map), and six blood serum measurements (tc, ldl, hdl, tch, ltg, glu)
45 interactions, e.g. age:ldl
9 quadratic effects, e.g. age^2
measure of disease progression taken one year later: y

plot of chunk show_cors

cor(cbind(diabetes$x, y=diabetes$y))

##        age    sex    bmi   map    tc   ldl    hdl   tch   ltg   glu      y
## age  1.000  0.174  0.185  0.34 0.260  0.22 -0.075  0.20  0.27  0.30  0.188
## sex  0.174  1.000  0.088  0.24 0.035  0.14 -0.379  0.33  0.15  0.21  0.043
## bmi  0.185  0.088  1.000  0.40 0.250  0.26 -0.367  0.41  0.45  0.39  0.586
## map  0.335  0.241  0.395  1.00 0.242  0.19 -0.179  0.26  0.39  0.39  0.441
## tc   0.260  0.035  0.250  0.24 1.000  0.90  0.052  0.54  0.52  0.33  0.212
## ldl  0.219  0.143  0.261  0.19 0.897  1.00 -0.196  0.66  0.32  0.29  0.174
## hdl -0.075 -0.379 -0.367 -0.18 0.052 -0.20  1.000 -0.74 -0.40 -0.27 -0.395
## tch  0.204  0.332  0.414  0.26 0.542  0.66 -0.738  1.00  0.62  0.42  0.430
## ltg  0.271  0.150  0.446  0.39 0.516  0.32 -0.399  0.62  1.00  0.46  0.566
## glu  0.302  0.208  0.389  0.39 0.326  0.29 -0.274  0.42  0.46  1.00  0.382
## y    0.188  0.043  0.586  0.44 0.212  0.17 -0.395  0.43  0.57  0.38  1.000

Crossvalidation plan

Put aside 20% of the data for testing.
Refit the model.
Predict the test data; compute \[\begin{aligned} S = \sqrt{\frac{1}{M} \sum_{k=1}^M (\hat y_i - y_i)^2} \end{aligned}\]

Repeat for the other four 20%s.
Compare.

Crossvalidation

First let’s split the data into testing and training just once:

test_indices <- (rbinom(length(diabetes$y), size=1, prob=0.2) == 1)
test_d <- cbind(data.frame(y=diabetes$y[test_indices]),
                diabetes$x2[test_indices,])
training_d <- cbind(data.frame(y=diabetes$y[!test_indices]),
                diabetes$x2[!test_indices,])

Ordinary linear regression

ols <- lm(y ~ ., data=training_d)
summary(ols)

## 
## Call:
## lm(formula = y ~ ., data = training_d)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -144.3  -32.5   -1.1   30.8  150.4 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   150.688      2.952   51.04  < 2e-16 ***
## age            84.949     76.868    1.11  0.27004    
## sex          -269.634     74.608   -3.61  0.00036 ***
## bmi           472.782     95.168    4.97  1.2e-06 ***
## map           360.868     83.425    4.33  2.1e-05 ***
## tc          -5344.136  61836.313   -0.09  0.93119    
## ldl          4723.000  54348.535    0.09  0.93081    
## hdl          1680.538  23107.326    0.07  0.94207    
## tch           -85.184    310.280   -0.27  0.78387    
## ltg          2350.565  20327.550    0.12  0.90802    
## glu            89.653     82.363    1.09  0.27730    
## `age^2`        66.327     81.718    0.81  0.41767    
## `bmi^2`       -15.807     98.076   -0.16  0.87207    
## `map^2`       -52.390     81.903   -0.64  0.52291    
## `tc^2`       4501.315   7881.979    0.57  0.56839    
## `ldl^2`      1315.348   5909.806    0.22  0.82403    
## `hdl^2`      1030.418   1782.583    0.58  0.56369    
## `tch^2`      1153.506    714.697    1.61  0.10764    
## `ltg^2`      1092.080   1792.281    0.61  0.54280    
## `glu^2`       128.336    105.649    1.21  0.22547    
## `age:sex`     148.196     90.659    1.63  0.10323    
## `age:bmi`       0.261     91.854    0.00  0.99773    
## `age:map`      20.594     92.257    0.22  0.82352    
## `age:tc`     -381.435    724.342   -0.53  0.59889    
## `age:ldl`     210.218    572.335    0.37  0.71367    
## `age:hdl`     200.905    332.316    0.60  0.54595    
## `age:tch`      61.393    261.202    0.24  0.81435    
## `age:ltg`     226.963    253.924    0.89  0.37217    
## `age:glu`     123.565     97.013    1.27  0.20381    
## `sex:bmi`     151.251     90.586    1.67  0.09608 .  
## `sex:map`      34.898     92.578    0.38  0.70649    
## `sex:tc`      710.474    742.200    0.96  0.33925    
## `sex:ldl`    -583.090    593.504   -0.98  0.32671    
## `sex:hdl`     -89.042    339.134   -0.26  0.79308    
## `sex:tch`     -61.988    232.199   -0.27  0.78969    
## `sex:ltg`    -210.102    273.395   -0.77  0.44283    
## `sex:glu`       2.214     83.695    0.03  0.97891    
## `bmi:map`     232.783    105.268    2.21  0.02781 *  
## `bmi:tc`     -449.811    783.837   -0.57  0.56652    
## `bmi:ldl`     449.714    655.611    0.69  0.49331    
## `bmi:hdl`     123.457    381.237    0.32  0.74630    
## `bmi:tch`    -132.984    266.229   -0.50  0.61781    
## `bmi:ltg`     132.106    300.483    0.44  0.66053    
## `bmi:glu`      88.775    100.372    0.88  0.37720    
## `map:tc`      164.889    829.983    0.20  0.84267    
## `map:ldl`     -35.365    692.990   -0.05  0.95934    
## `map:hdl`     -84.927    384.110   -0.22  0.82517    
## `map:tch`    -114.154    239.770   -0.48  0.63437    
## `map:ltg`       3.840    326.807    0.01  0.99063    
## `map:glu`    -244.481    107.705   -2.27  0.02396 *  
## `tc:ldl`    -4837.758  13111.399   -0.37  0.71242    
## `tc:hdl`    -2183.968   4297.196   -0.51  0.61169    
## `tc:tch`    -2109.486   1982.692   -1.06  0.28825    
## `tc:ltg`    -2127.476  13625.947   -0.16  0.87604    
## `tc:glu`      950.939    944.060    1.01  0.31465    
## `ldl:hdl`     750.673   3596.170    0.21  0.83480    
## `ldl:tch`     685.721   1687.047    0.41  0.68471    
## `ldl:ltg`    1301.031  11332.178    0.11  0.90868    
## `ldl:glu`    -997.177    828.415   -1.20  0.22970    
## `hdl:tch`    1423.428   1141.877    1.25  0.21358    
## `hdl:ltg`     579.689   4796.146    0.12  0.90388    
## `hdl:glu`    -207.188    418.504   -0.50  0.62094    
## `tch:ltg`     231.583    710.805    0.33  0.74481    
## `tch:glu`     195.247    265.217    0.74  0.46223    
## `ltg:glu`    -262.969    369.010   -0.71  0.47666    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 53 on 284 degrees of freedom
## Multiple R-squared:  0.627,  Adjusted R-squared:  0.543 
## F-statistic: 7.47 on 64 and 284 DF,  p-value: <2e-16

ols_pred <- predict(ols, newdata=test_d)
ols_mse <- sqrt(mean((ols_pred - test_d$y)^2))

With ordinary linear regression, we got a root-mean-square-prediction-error of 61.35 (on the test data), compared to a root-mean-square-error of 47.91 for the training data.

This suggests there’s some overfitting going on.

plot(training_d$y, predict(ols), xlab="true values", ylab="OLS predicted", main="training data", pch=20)
abline(0,1)
plot(test_d$y, ols_pred, xlab="true values", ylab="OLS predicted", main="test data", pch=20)
abline(0,1)

plot of chunk plot_ols

A sparsifying prior

We have a lot of predictors: 64 of them. A good guess is that only a few are really useful. So, we can put a sparsifying prior on the coefficients, i.e., \(\beta\)s in \[\begin{aligned} y = \beta_0 + \beta_1 x_1 + \cdots \beta_n x_n + \epsilon \end{aligned}\]

Crossvalidation: the effect of spurious variables

Who says we don’t do experiments?

Simulate data with y ~ a + b x[1] + c x[2], and fit a linear model.
Measure in-sample and out-of-sample prediction error.
Add spurious variables, and report the above as a function of number of variables.

Simulate data with many, weakly predictive explanatory variables.
Compare different methods for fitting linear models.

Basic data: \(y = a + b_1 x_1 + b_2 x_2 + \epsilon\).

N <- 500
df <- data.frame(x1 = rnorm(N),
                 x2 = runif(N))
params <- list(intercept = 2.0,
               x1 = 7.0,
               x2 = -3.0,
               sigma = 0.1 * 10)
pred_y <- params$intercept + params$x1 * df$x1 + params$x2 * df$x2 
df$y <- rnorm(N, mean=pred_y, sd=params$sigma)
pairs(df)

plot of chunk in_class1

Crossvalidation error function

kfold <- function (K, df) {
    N <- nrow(df)
    Kfold <- sample(rep(1:K, N/K))

    do_xval <- function (k) {
        the_lm <- lm(y ~ ., data=df, subset=(Kfold != k))
        train_error <- sqrt(mean(resid(the_lm)^2))
        test_y <- df$y[Kfold == k]
        test_error <- sqrt(mean( (test_y - predict(the_lm, newdata=subset(df, Kfold==k)))^2 ))
        return(c('test'=test_error, 'train'=train_error))
    }

    results <- sapply(1:K, do_xval)
    return(results)
}

Add noise

K <- 10
max_M <- 300
do_m <- floor(seq(from=2, to=max_M-1, length.out=40))
all_results <- matrix(NA, nrow=length(do_m) + 1, ncol=2)
first_results <- rowMeans(kfold(K, df))
all_results[1,] <- first_results
colnames(all_results) <- names(first_results)

noise_df <- matrix(rnorm(N * (N-2)), nrow=N)
colnames(noise_df) <- paste0('z', 1:ncol(noise_df))
new_df <- cbind(df, noise_df)
for (j in seq_along(do_m)) {
    m <- do_m[j]
    all_results[j,] <- rowMeans(kfold(K, new_df[,1:(m+1)]))
}

Results

matplot(c(2, do_m), all_results, type='l', xlab='number of variables', ylab='root mean square error')
legend("topleft", lty=1, col=1:2, legend=colnames(all_results))

plot of chunk in_class5

Sparseness and scale mixtures

Encouraging sparseness

Suppose we do regression with a large number of predictor variables.

The resulting coefficients are sparse if most are zero.

The idea is to “encourage” all the coefficients to be zero, unless they really want to be nonzero, in which case we let them be whatever they want.

This tends to discourage overfitting.

The idea is to “encourage” all the coefficients to be zero, unless they really want to be nonzero, in which case we let them be whatever they want.

To do this, we want a prior which is very peak-ey at zero but flat away from zero (“spike-and-slab”).

Compare the Normal

\[\begin{aligned} X \sim \Normal(0,1) \end{aligned}\]

to the “exponential scale mixture of Normals”,

\[\begin{aligned} X &\sim \Normal(0,\sigma) \\ \sigma &\sim \Exp(1) . \end{aligned}\]

plot of chunk scale_mixtures

Why use a scale mixture?

Lets the data choose the appropriate scale of variation.
Weakly encourages \(\sigma\) to be small: so, as much variation as possible is explained by signal instead of noise.
Gets you a prior that is more peaked at zero and flatter otherwise.

Implementation

Note that

\[\begin{aligned} \beta &\sim \Normal(0,\sigma) \\ \sigma &\sim \Exp(1) . \end{aligned}\]

is equivalent to

\[\begin{aligned} \beta &= \sigma \gamma \\ \gamma &\sim \Normal(0,1) \\ \sigma &\sim \Exp(1) . \end{aligned}\]

parameters {
    real beta;
    real<lower=0> sigma;
}
model {
    beta ~ normal(0, sigma);
    sigma ~ exponential(1);
}

is equivalent to

parameters {
    real gamma;
    real<lower=0> sigma;
}
transformed parameters {
    real beta;
    beta = gamma * sigma;
}
model {
    gamma ~ normal(0, 1);
    sigma ~ exponential(1);
}

The second version is better for Stan.

Why is it better?

parameters {
    real beta;
    real<lower=0> sigma;
}
model {
    beta ~ normal(0, sigma);
    sigma ~ exponential(1);
}

In the first, the optimal step size depends on sigma.

parameters {
    real gamma;
    real<lower=0> sigma;
}
transformed parameters {
    real beta;
    beta = gamma * sigma;
}
model {
    gamma ~ normal(0, 1);
    sigma ~ exponential(1);
}

plot of chunk sigma_phase

A strongly sparsifying prior

The “horseshoe”:

\[\begin{aligned} \beta_j &\sim \Normal(0, \lambda_j) \\ \lambda_j &\sim \Cauchy(0, \tau) \\ \tau &\sim \Unif(0, 1) \end{aligned}\]

parameters {
    vector[p] d_beta;
    vector[p] d_lambda;
    real<lower=0, upper=1> tau;
}
transformed parameters {
    vector[p] beta;
    beta = d_beta .* d_lambda * tau;
}
model {
    d_beta ~ normal(0, 1);
    d_lambda ~ cauchy(0, 1);
    // tau ~ uniform(0, 1); // uniform
}

The Cauchy as a scale mixture

Recall that if

\[\begin{aligned} \beta &\sim \Normal(0, 1/\sqrt{\lambda}) \\ \lambda &\sim \Gam(1/2, 1/2) \end{aligned}\]

then

\[\begin{aligned} \beta &\sim \Cauchy(0, 1). \end{aligned}\]

With brms

OLS in brms

xy <- cbind(data.frame(y=diabetes$y), diabetes$x2)
names(xy) <- gsub("[:^]", "_", names(xy))

# ols
blm <- brm(y ~ ., data=xy, family=gaussian(link='identity'))

## Compiling the C++ model

## Start sampling

## Warning: There were 4000 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded

## Warning: Examine the pairs() plot to diagnose sampling problems

## Warning: The largest R-hat is 1.32, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#r-hat

## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess

## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess

horseshoe in brms

bhs <- brm(y ~ ., data=xy, family=gaussian(link='identity'), 
           prior=c(set_prior(horseshoe(), class="b")))

## Compiling the C++ model

## Start sampling

(klm <- brms::kfold(blm, K=5))

## Fitting model 1 out of 5

## Fitting model 2 out of 5

## Fitting model 3 out of 5

## Fitting model 4 out of 5

## Fitting model 5 out of 5

## Start sampling
## Start sampling
## Start sampling
## Start sampling
## Start sampling

## 
## Based on 5-fold cross-validation
## 
##            Estimate   SE
## elpd_kfold  -2427.0 14.2
## p_kfold          NA   NA
## kfoldic      4854.1 28.4

(khs <- brms::kfold(bhs, K=5))

## Fitting model 1 out of 5

## Fitting model 2 out of 5

## Fitting model 3 out of 5

## Fitting model 4 out of 5

## Fitting model 5 out of 5

## Start sampling
## Start sampling
## Start sampling
## Start sampling
## Start sampling

## 
## Based on 5-fold cross-validation
## 
##            Estimate   SE
## elpd_kfold  -2399.6 13.6
## p_kfold          NA   NA
## kfoldic      4799.3 27.2

plot of chunk brmresults

plot of chunk brmresults2

Interlude

Estimation of infiltration rate from soil properties using regression model for cultivated land

EIR = 14,195.35 - 141.75 (sand%) - 142.10 (silt%) - 142.56 (clay%)

Use the data to try to reproduce their model:

BIR = 14,195.35 - 141.75 (sand%) - 142.10 (silt%) - 142.56 (clay%)

They’re not wrong! What’s going on?

Using the horseshoe

What’s an appropriate noise distribution?

plot of chunk show_y

Aside: quantile-quantile plots

The idea is to plot the quantiles of each distribution against each other.

If these are datasets, this means just plotting their sorted values against each other.

x <- rnorm(1e4)
y <- rbeta(1e4, 2, 2)
plot(sort(x), sort(y)); qqplot(x, y, main="qqplot"); qqnorm(y, main="qnorm")

plot of chunk qq

Regression with a horseshoe prior

Uses a reparameterization of the Cauchy as a scale mixture of normals.

horseshoe_block <- "
data {
    int N;
    int p;
    vector[N] y;
    matrix[N,p] x;
}
parameters {
    real b0;
    vector[p] d_beta;
    vector[p] d_a;
    vector<lower=0>[p] d_b;
    real<lower=0, upper=1> tau;
    real<lower=0> sigma;
}
transformed parameters {
    vector[p] beta;
    vector[N] f;
    beta = d_beta .* d_a .* sqrt(d_b) * tau;
    f = b0 + x * beta;
}
model {
    y ~ normal(f, sigma);
    // HORSESHOE PRIOR:
    d_beta ~ normal(0, 1);
    d_a ~ normal(0, 1);
    d_b ~ inv_gamma(0.5, 0.5);
    // tau ~ uniform(0, 1); // uniform
    // priors on noise distribution:
    sigma ~ normal(0, 10);
}"

Note the data have already been normalized, with the exception of \(y\):

summary(training_d)

##        y            age              sex              bmi              map               tc              ldl              hdl              tch              ltg              glu             age^2       
##  Min.   : 25   Min.   :-0.107   Min.   :-0.045   Min.   :-0.089   Min.   :-0.109   Min.   :-0.109   Min.   :-0.116   Min.   :-0.102   Min.   :-0.076   Min.   :-0.126   Min.   :-0.138   Min.   :-0.041  
##  1st Qu.: 84   1st Qu.:-0.038   1st Qu.:-0.045   1st Qu.:-0.034   1st Qu.:-0.033   1st Qu.:-0.033   1st Qu.:-0.029   1st Qu.:-0.036   1st Qu.:-0.039   1st Qu.:-0.035   1st Qu.:-0.034   1st Qu.:-0.037  
##  Median :141   Median : 0.005   Median :-0.045   Median :-0.007   Median :-0.006   Median :-0.004   Median :-0.004   Median :-0.007   Median :-0.003   Median :-0.004   Median : 0.003   Median :-0.020  
##  Mean   :152   Mean   :-0.001   Mean   : 0.000   Mean   : 0.001   Mean   :-0.001   Mean   : 0.000   Mean   : 0.001   Mean   :-0.001   Mean   : 0.002   Mean   : 0.000   Mean   : 0.001   Mean   :-0.001  
##  3rd Qu.:214   3rd Qu.: 0.034   3rd Qu.: 0.051   3rd Qu.: 0.034   3rd Qu.: 0.032   3rd Qu.: 0.027   3rd Qu.: 0.031   3rd Qu.: 0.027   3rd Qu.: 0.034   3rd Qu.: 0.034   3rd Qu.: 0.032   3rd Qu.: 0.017  
##  Max.   :341   Max.   : 0.111   Max.   : 0.051   Max.   : 0.171   Max.   : 0.132   Max.   : 0.154   Max.   : 0.199   Max.   : 0.181   Max.   : 0.185   Max.   : 0.134   Max.   : 0.136   Max.   : 0.183  
##      bmi^2           map^2             tc^2            ldl^2           hdl^2           tch^2           ltg^2            glu^2           age:sex          age:bmi          age:map           age:tc      
##  Min.   :-0.03   Min.   :-0.039   Min.   :-0.032   Min.   :-0.03   Min.   :-0.03   Min.   :-0.03   Min.   :-0.035   Min.   :-0.032   Min.   :-0.121   Min.   :-0.147   Min.   :-0.166   Min.   :-0.136  
##  1st Qu.:-0.03   1st Qu.:-0.035   1st Qu.:-0.029   1st Qu.:-0.03   1st Qu.:-0.02   1st Qu.:-0.03   1st Qu.:-0.030   1st Qu.:-0.029   1st Qu.:-0.031   1st Qu.:-0.022   1st Qu.:-0.020   1st Qu.:-0.021  
##  Median :-0.02   Median :-0.020   Median :-0.018   Median :-0.02   Median :-0.01   Median :-0.01   Median :-0.017   Median :-0.017   Median : 0.001   Median :-0.007   Median :-0.009   Median :-0.010  
##  Mean   : 0.00   Mean   :-0.002   Mean   :-0.001   Mean   : 0.00   Mean   : 0.00   Mean   : 0.00   Mean   : 0.001   Mean   : 0.001   Mean   : 0.000   Mean   : 0.000   Mean   : 0.000   Mean   :-0.003  
##  3rd Qu.: 0.02   3rd Qu.: 0.018   3rd Qu.: 0.005   3rd Qu.: 0.01   3rd Qu.: 0.01   3rd Qu.:-0.01   3rd Qu.: 0.017   3rd Qu.: 0.011   3rd Qu.: 0.035   3rd Qu.: 0.019   3rd Qu.: 0.021   3rd Qu.: 0.015  
##  Max.   : 0.39   Max.   : 0.264   Max.   : 0.303   Max.   : 0.49   Max.   : 0.37   Max.   : 0.43   Max.   : 0.241   Max.   : 0.236   Max.   : 0.112   Max.   : 0.170   Max.   : 0.228   Max.   : 0.185  
##     age:ldl          age:hdl          age:tch          age:ltg          age:glu          sex:bmi          sex:map           sex:tc          sex:ldl          sex:hdl          sex:tch          sex:ltg      
##  Min.   :-0.155   Min.   :-0.180   Min.   :-0.201   Min.   :-0.160   Min.   :-0.120   Min.   :-0.126   Min.   :-0.140   Min.   :-0.121   Min.   :-0.130   Min.   :-0.166   Min.   :-0.160   Min.   :-0.143  
##  1st Qu.:-0.021   1st Qu.:-0.025   1st Qu.:-0.018   1st Qu.:-0.022   1st Qu.:-0.021   1st Qu.:-0.036   1st Qu.:-0.033   1st Qu.:-0.032   1st Qu.:-0.032   1st Qu.:-0.034   1st Qu.:-0.020   1st Qu.:-0.034  
##  Median :-0.008   Median : 0.001   Median :-0.009   Median :-0.008   Median :-0.010   Median :-0.002   Median : 0.001   Median :-0.001   Median :-0.001   Median :-0.004   Median : 0.008   Median : 0.004  
##  Mean   :-0.002   Mean   :-0.003   Mean   : 0.000   Mean   : 0.000   Mean   : 0.002   Mean   : 0.000   Mean   :-0.001   Mean   :-0.002   Mean   :-0.001   Mean   :-0.001   Mean   : 0.001   Mean   :-0.001  
##  3rd Qu.: 0.014   3rd Qu.: 0.017   3rd Qu.: 0.021   3rd Qu.: 0.020   3rd Qu.: 0.017   3rd Qu.: 0.034   3rd Qu.: 0.035   3rd Qu.: 0.031   3rd Qu.: 0.029   3rd Qu.: 0.030   3rd Qu.: 0.022   3rd Qu.: 0.032  
##  Max.   : 0.206   Max.   : 0.180   Max.   : 0.284   Max.   : 0.187   Max.   : 0.184   Max.   : 0.179   Max.   : 0.104   Max.   : 0.162   Max.   : 0.206   Max.   : 0.162   Max.   : 0.191   Max.   : 0.136  
##     sex:glu          bmi:map           bmi:tc          bmi:ldl          bmi:hdl          bmi:tch          bmi:ltg          bmi:glu           map:tc          map:ldl          map:hdl          map:tch      
##  Min.   :-0.140   Min.   :-0.125   Min.   :-0.293   Min.   :-0.287   Min.   :-0.268   Min.   :-0.132   Min.   :-0.185   Min.   :-0.154   Min.   :-0.202   Min.   :-0.204   Min.   :-0.283   Min.   :-0.145  
##  1st Qu.:-0.033   1st Qu.:-0.021   1st Qu.:-0.019   1st Qu.:-0.022   1st Qu.:-0.019   1st Qu.:-0.024   1st Qu.:-0.024   1st Qu.:-0.024   1st Qu.:-0.020   1st Qu.:-0.020   1st Qu.:-0.020   1st Qu.:-0.018  
##  Median :-0.005   Median :-0.011   Median :-0.008   Median :-0.008   Median : 0.011   Median :-0.016   Median :-0.012   Median :-0.015   Median :-0.009   Median :-0.007   Median : 0.005   Median :-0.010  
##  Mean   :-0.002   Mean   : 0.002   Mean   : 0.000   Mean   : 0.001   Mean   : 0.000   Mean   : 0.001   Mean   : 0.000   Mean   : 0.000   Mean   :-0.002   Mean   :-0.001   Mean   :-0.001   Mean   : 0.000  
##  3rd Qu.: 0.034   3rd Qu.: 0.017   3rd Qu.: 0.020   3rd Qu.: 0.016   3rd Qu.: 0.026   3rd Qu.: 0.021   3rd Qu.: 0.022   3rd Qu.: 0.019   3rd Qu.: 0.018   3rd Qu.: 0.018   3rd Qu.: 0.020   3rd Qu.: 0.021  
##  Max.   : 0.133   Max.   : 0.228   Max.   : 0.227   Max.   : 0.232   Max.   : 0.146   Max.   : 0.276   Max.   : 0.219   Max.   : 0.225   Max.   : 0.190   Max.   : 0.192   Max.   : 0.141   Max.   : 0.229  
##     map:ltg          map:glu           tc:ldl          tc:hdl          tc:tch          tc:ltg           tc:glu          ldl:hdl          ldl:tch         ldl:ltg          ldl:glu          hdl:tch      
##  Min.   :-0.146   Min.   :-0.143   Min.   :-0.04   Min.   :-0.21   Min.   :-0.12   Min.   :-0.105   Min.   :-0.121   Min.   :-0.256   Min.   :-0.11   Min.   :-0.183   Min.   :-0.123   Min.   :-0.235  
##  1st Qu.:-0.024   1st Qu.:-0.023   1st Qu.:-0.03   1st Qu.:-0.02   1st Qu.:-0.02   1st Qu.:-0.026   1st Qu.:-0.021   1st Qu.:-0.017   1st Qu.:-0.02   1st Qu.:-0.021   1st Qu.:-0.020   1st Qu.:-0.019  
##  Median :-0.012   Median :-0.013   Median :-0.02   Median : 0.00   Median :-0.02   Median :-0.015   Median :-0.011   Median : 0.005   Median :-0.01   Median :-0.007   Median :-0.010   Median : 0.014  
##  Mean   :-0.001   Mean   :-0.001   Mean   : 0.00   Mean   : 0.00   Mean   : 0.00   Mean   : 0.000   Mean   : 0.002   Mean   :-0.002   Mean   : 0.00   Mean   : 0.001   Mean   : 0.002   Mean   :-0.002  
##  3rd Qu.: 0.024   3rd Qu.: 0.011   3rd Qu.: 0.01   3rd Qu.: 0.01   3rd Qu.: 0.01   3rd Qu.: 0.017   3rd Qu.: 0.014   3rd Qu.: 0.017   3rd Qu.: 0.01   3rd Qu.: 0.025   3rd Qu.: 0.015   3rd Qu.: 0.031  
##  Max.   : 0.187   Max.   : 0.222   Max.   : 0.40   Max.   : 0.32   Max.   : 0.47   Max.   : 0.223   Max.   : 0.237   Max.   : 0.161   Max.   : 0.56   Max.   : 0.203   Max.   : 0.299   Max.   : 0.061  
##     hdl:ltg          hdl:glu          tch:ltg         tch:glu         ltg:glu      
##  Min.   :-0.255   Min.   :-0.223   Min.   :-0.16   Min.   :-0.13   Min.   :-0.092  
##  1st Qu.:-0.019   1st Qu.:-0.015   1st Qu.:-0.03   1st Qu.:-0.02   1st Qu.:-0.023  
##  Median : 0.007   Median : 0.009   Median :-0.01   Median :-0.02   Median :-0.013  
##  Mean   :-0.002   Mean   : 0.001   Mean   : 0.00   Mean   : 0.00   Mean   : 0.000  
##  3rd Qu.: 0.021   3rd Qu.: 0.023   3rd Qu.: 0.02   3rd Qu.: 0.02   3rd Qu.: 0.014  
##  Max.   : 0.163   Max.   : 0.210   Max.   : 0.38   Max.   : 0.32   Max.   : 0.195

horseshoe_fit <- stan(model_code=horseshoe_block,
                      data=list(N=nrow(training_d),
                                p=ncol(training_d)-1,
                                y=(training_d$y 
                                   - median(training_d$y))
                                  /mad(training_d$y),
                                x=as.matrix(training_d[,-1])),
                      iter=1000,
                      control=list(adapt_delta=0.999,
                                   max_treedepth=15))

## Warning: There were 27 divergent transitions after warmup. Increasing adapt_delta above 0.999 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

## Warning: Examine the pairs() plot to diagnose sampling problems

## Warning: The largest R-hat is 1.53, indicating chains have not mixed.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#r-hat

## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess

## Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#tail-ess

(hs_summary <- rstan::summary(horseshoe_fit, pars=c("b0", "sigma", "beta"))$summary)

##              mean se_mean    sd   2.5%      25%      50%    75%  98% n_eff Rhat
## b0        0.11847 0.00054 0.031  0.058  0.09724  0.11828  0.141 0.18  3347    1
## sigma     0.59494 0.00048 0.024  0.549  0.57915  0.59438  0.610 0.64  2477    1
## beta[1]   0.18468 0.01081 0.441 -0.439 -0.04236  0.05639  0.330 1.36  1669    1
## beta[2]  -1.27036 0.03230 1.000 -3.355 -1.98366 -1.22500 -0.383 0.12   959    1
## beta[3]   5.97824 0.02349 0.928  4.142  5.36459  5.99472  6.606 7.83  1561    1
## beta[4]   3.43216 0.03138 0.990  1.345  2.83589  3.46677  4.079 5.29   996    1
## beta[5]  -0.23122 0.02070 0.669 -1.934 -0.34336 -0.04547  0.047 0.65  1045    1
## beta[6]  -0.00017 0.01544 0.560 -1.000 -0.14021 -0.00303  0.109 0.96  1317    1
## beta[7]  -1.31586 0.03931 1.197 -3.845 -2.21610 -1.14696 -0.159 0.18   928    1
## beta[8]   0.30767 0.02202 0.755 -0.590 -0.03962  0.05539  0.410 2.67  1177    1
## beta[9]   5.71389 0.02644 1.018  3.725  5.03216  5.68509  6.420 7.72  1484    1
## beta[10]  0.26253 0.01448 0.518 -0.414 -0.02317  0.07275  0.423 1.73  1278    1
## beta[11]  0.23320 0.01261 0.482 -0.416 -0.02111  0.07704  0.393 1.58  1463    1
## beta[12]  0.14319 0.01199 0.442 -0.587 -0.04693  0.03456  0.267 1.36  1361    1
## beta[13]  0.03119 0.01080 0.339 -0.672 -0.08882  0.00737  0.129 0.85   987    1
## beta[14] -0.02838 0.01354 0.486 -1.202 -0.16016 -0.00246  0.121 1.02  1288    1
## beta[15] -0.08398 0.01429 0.535 -1.337 -0.17772 -0.01292  0.084 0.91  1400    1
## beta[16]  0.03915 0.01208 0.441 -0.915 -0.09980  0.00799  0.166 1.05  1333    1
## beta[17]  0.07360 0.01198 0.432 -0.698 -0.08954  0.01054  0.178 1.18  1302    1
## beta[18] -0.28898 0.01467 0.545 -1.751 -0.46633 -0.09520  0.014 0.39  1382    1
## beta[19]  0.78015 0.02871 0.871 -0.278  0.04895  0.51543  1.368 2.74   920    1
## beta[20]  1.47129 0.02600 0.938 -0.032  0.72236  1.48715  2.159 3.26  1301    1
## beta[21]  0.20772 0.01309 0.481 -0.499 -0.03633  0.06031  0.352 1.52  1351    1
## beta[22]  0.26289 0.01393 0.512 -0.417 -0.01699  0.09774  0.455 1.67  1349    1
## beta[23]  0.00158 0.01182 0.417 -0.879 -0.11609  0.00167  0.134 0.84  1245    1
## beta[24] -0.10350 0.01260 0.462 -1.316 -0.21664 -0.01530  0.069 0.68  1344    1
## beta[25]  0.03007 0.01111 0.385 -0.725 -0.10381  0.00717  0.153 0.92  1199    1
## beta[26]  0.04793 0.01247 0.434 -0.792 -0.10275  0.00745  0.172 1.06  1213    1
## beta[27]  0.35088 0.01561 0.605 -0.384 -0.00555  0.11684  0.572 1.98  1502    1
## beta[28]  0.28770 0.01490 0.555 -0.429 -0.01956  0.09428  0.492 1.84  1388    1
## beta[29]  0.55050 0.01892 0.715 -0.249  0.00938  0.29048  0.930 2.34  1426    1
## beta[30]  0.16897 0.01170 0.444 -0.483 -0.03571  0.04440  0.289 1.40  1440    1
## beta[31]  0.07436 0.01259 0.429 -0.668 -0.09090  0.01118  0.166 1.21  1162    1
## beta[32] -0.13863 0.01324 0.446 -1.328 -0.25271 -0.03182  0.059 0.59  1134    1
## beta[33]  0.41573 0.01688 0.645 -0.315 -0.00077  0.16688  0.666 2.14  1461    1
## beta[34] -0.13366 0.01228 0.452 -1.323 -0.27521 -0.03291  0.057 0.59  1354    1
## beta[35]  0.06844 0.01060 0.375 -0.656 -0.07767  0.00983  0.189 0.99  1251    1
## beta[36]  0.00597 0.00986 0.351 -0.773 -0.11192  0.00097  0.135 0.76  1267    1
## beta[37]  0.84159 0.02197 0.878 -0.203  0.08052  0.59820  1.447 2.82  1598    1
## beta[38]  0.00356 0.01001 0.393 -0.814 -0.11802  0.00027  0.123 0.87  1539    1
## beta[39]  0.03318 0.01143 0.408 -0.838 -0.09997  0.00638  0.154 0.97  1272    1
## beta[40] -0.03956 0.01084 0.406 -0.994 -0.16742 -0.00808  0.101 0.80  1402    1
## beta[41]  0.04430 0.01212 0.428 -0.801 -0.11201  0.00518  0.164 1.06  1248    1
## beta[42]  0.03220 0.01094 0.357 -0.746 -0.09465  0.00515  0.152 0.86  1068    1
## beta[43]  0.32835 0.01665 0.595 -0.426 -0.01796  0.12077  0.522 1.98  1278    1
## beta[44]  0.08403 0.01295 0.431 -0.689 -0.08137  0.01713  0.202 1.13  1109    1
## beta[45]  0.08795 0.01288 0.438 -0.625 -0.07965  0.01571  0.176 1.20  1155    1
## beta[46]  0.20045 0.01362 0.523 -0.593 -0.03954  0.04958  0.323 1.64  1474    1
## beta[47] -0.12439 0.01312 0.468 -1.346 -0.23290 -0.02234  0.066 0.66  1275    1
## beta[48]  0.04445 0.01014 0.390 -0.703 -0.10447  0.00717  0.151 1.00  1479    1
## beta[49] -0.17629 0.01495 0.499 -1.577 -0.28276 -0.03913  0.057 0.52  1112    1
## beta[50] -0.01323 0.01286 0.501 -1.080 -0.15172 -0.00465  0.112 0.98  1518    1
## beta[51]  0.07131 0.01251 0.448 -0.667 -0.09582  0.01067  0.167 1.19  1282    1
## beta[52] -0.37967 0.02132 0.758 -2.461 -0.54911 -0.09195  0.024 0.46  1263    1
## beta[53] -0.13073 0.01312 0.498 -1.500 -0.22152 -0.01822  0.072 0.63  1438    1
## beta[54]  0.05582 0.01172 0.435 -0.805 -0.08886  0.01069  0.173 1.10  1377    1
## beta[55] -0.07641 0.01428 0.466 -1.315 -0.17775 -0.00895  0.094 0.71  1063    1
## beta[56] -0.03597 0.01354 0.468 -1.096 -0.16718 -0.00506  0.109 0.96  1196    1
## beta[57]  0.23560 0.01706 0.606 -0.483 -0.04384  0.04846  0.352 2.00  1262    1
## beta[58]  0.02036 0.01282 0.432 -0.871 -0.12112  0.00130  0.143 0.96  1136    1
## beta[59] -0.15576 0.01334 0.488 -1.542 -0.25669 -0.02821  0.057 0.61  1340    1
## beta[60]  0.10674 0.01163 0.417 -0.574 -0.06224  0.02681  0.212 1.20  1284    1
## beta[61]  0.01456 0.01035 0.364 -0.768 -0.11944  0.00029  0.129 0.86  1233    1
## beta[62] -0.14660 0.01325 0.492 -1.361 -0.26495 -0.02706  0.060 0.70  1379    1
## beta[63]  0.12335 0.01578 0.496 -0.704 -0.07143  0.02268  0.233 1.49   989    1
## beta[64]  0.15840 0.01482 0.490 -0.603 -0.05885  0.03289  0.306 1.50  1092    1

First compare the resulting regression parameters to OLS values.

plot of chunk compare_betas

The coefficient estimates from OLS are wierd.

##                  ols     stan    brms
## (Intercept)   150.69  151.697  152.12
## tc          -5344.14   -4.112  -39.32
## `tc:ldl`    -4837.76   -0.420    2.57
## ldl          4723.00   -0.274  -15.40
## `tc^2`       4501.31   -0.222    0.41
## ltg          2350.56  514.151  503.02
## `tc:hdl`    -2183.97    0.965   15.63
## `tc:ltg`    -2127.48   -1.647   -9.43
## `tc:tch`    -2109.49   -8.316  -37.94
## hdl          1680.54 -103.729 -198.07
## `hdl:tch`    1423.43   -2.551  -15.64
## `ldl^2`      1315.35   -1.168   -8.08
## `ldl:ltg`    1301.03    4.383   31.38
## `tch^2`      1153.51    0.953    9.84
## `ltg^2`      1092.08   -8.610  -16.80
## `hdl^2`      1030.42    0.723    4.75
## `ldl:glu`    -997.18    0.118   14.30
## `tc:glu`      950.94    0.967    7.18
## `ldl:hdl`     750.67   -0.809   -6.66
## `sex:tc`      710.47    1.011    1.77
## `ldl:tch`     685.72   -0.458    2.79
## `sex:ldl`    -583.09   -2.878  -14.72
## `hdl:ltg`     579.69    2.424   16.42
## bmi           472.78  542.154  510.40
## `bmi:tc`     -449.81    0.024   -6.41
## `bmi:ldl`     449.71    0.577   -2.40
## `age:tc`     -381.44    0.151   -4.40
## map           360.87  313.530  290.61
## sex          -269.63 -110.788 -160.69
## `ltg:glu`    -262.97    2.975    7.05
## `map:glu`    -244.48   -3.539  -11.83
## `bmi:map`     232.78   54.101   77.03
## `tch:ltg`     231.58   -2.447  -24.62
## `age:ltg`     226.96   10.567   32.23
## `age:ldl`     210.22   -1.384  -20.88
## `sex:ltg`    -210.10    0.889    2.05
## `hdl:glu`    -207.19    0.027    3.76
## `age:hdl`     200.91    0.648   12.25
## `tch:glu`     195.25    2.051   22.71
## `map:tc`      164.89    1.549   11.64
## `sex:bmi`     151.25   26.270   23.66
## `age:sex`     148.20  134.496  120.77
## `bmi:tch`    -132.98    0.469    6.28
## `bmi:ltg`     132.11    0.466   10.37
## `glu^2`       128.34   46.615   49.61
## `age:glu`     123.57    8.527   25.26
## `bmi:hdl`     123.46   -0.731   -2.65
## `map:tch`    -114.15   -2.020   -9.89
## glu            89.65    6.579   26.70
## `sex:hdl`     -89.04   15.092   24.71
## `bmi:glu`      88.77   10.922   16.41
## tch           -85.18    5.010   34.45
## age            84.95    5.100    7.73
## `map:hdl`     -84.93    4.484   16.25
## `age^2`        66.33    6.968   28.84
## `sex:tch`     -61.99   -2.976  -15.37
## `age:tch`      61.39    0.673    0.95
## `map^2`       -52.39    0.667    4.76
## `map:ldl`     -35.37    1.421    7.81
## `sex:map`      34.90    4.016   28.94
## `age:map`      20.59    8.839   22.15
## `bmi^2`       -15.81    3.125   29.56
## `map:ltg`       3.84    0.648    7.32
## `sex:glu`       2.21    0.088    8.23
## `age:bmi`       0.26    5.454    3.13

And, quite different than what Stan gets.

##                  ols     stan    brms
## (Intercept)   150.69  151.697  152.12
## bmi           472.78  542.154  510.40
## ltg          2350.56  514.151  503.02
## map           360.87  313.530  290.61
## `age:sex`     148.20  134.496  120.77
## sex          -269.63 -110.788 -160.69
## hdl          1680.54 -103.729 -198.07
## `bmi:map`     232.78   54.101   77.03
## `glu^2`       128.34   46.615   49.61
## `sex:bmi`     151.25   26.270   23.66
## `sex:hdl`     -89.04   15.092   24.71
## `bmi:glu`      88.77   10.922   16.41
## `age:ltg`     226.96   10.567   32.23
## `age:map`      20.59    8.839   22.15
## `ltg^2`      1092.08   -8.610  -16.80
## `age:glu`     123.57    8.527   25.26
## `tc:tch`    -2109.49   -8.316  -37.94
## `age^2`        66.33    6.968   28.84
## glu            89.65    6.579   26.70
## `age:bmi`       0.26    5.454    3.13
## age            84.95    5.100    7.73
## tch           -85.18    5.010   34.45
## `map:hdl`     -84.93    4.484   16.25
## `ldl:ltg`    1301.03    4.383   31.38
## tc          -5344.14   -4.112  -39.32
## `sex:map`      34.90    4.016   28.94
## `map:glu`    -244.48   -3.539  -11.83
## `bmi^2`       -15.81    3.125   29.56
## `sex:tch`     -61.99   -2.976  -15.37
## `ltg:glu`    -262.97    2.975    7.05
## `sex:ldl`    -583.09   -2.878  -14.72
## `hdl:tch`    1423.43   -2.551  -15.64
## `tch:ltg`     231.58   -2.447  -24.62
## `hdl:ltg`     579.69    2.424   16.42
## `tch:glu`     195.25    2.051   22.71
## `map:tch`    -114.15   -2.020   -9.89
## `tc:ltg`    -2127.48   -1.647   -9.43
## `map:tc`      164.89    1.549   11.64
## `map:ldl`     -35.37    1.421    7.81
## `age:ldl`     210.22   -1.384  -20.88
## `ldl^2`      1315.35   -1.168   -8.08
## `sex:tc`      710.47    1.011    1.77
## `tc:glu`      950.94    0.967    7.18
## `tc:hdl`    -2183.97    0.965   15.63
## `tch^2`      1153.51    0.953    9.84
## `sex:ltg`    -210.10    0.889    2.05
## `ldl:hdl`     750.67   -0.809   -6.66
## `bmi:hdl`     123.46   -0.731   -2.65
## `hdl^2`      1030.42    0.723    4.75
## `age:tch`      61.39    0.673    0.95
## `map^2`       -52.39    0.667    4.76
## `age:hdl`     200.91    0.648   12.25
## `map:ltg`       3.84    0.648    7.32
## `bmi:ldl`     449.71    0.577   -2.40
## `bmi:tch`    -132.98    0.469    6.28
## `bmi:ltg`     132.11    0.466   10.37
## `ldl:tch`     685.72   -0.458    2.79
## `tc:ldl`    -4837.76   -0.420    2.57
## ldl          4723.00   -0.274  -15.40
## `tc^2`       4501.31   -0.222    0.41
## `age:tc`     -381.44    0.151   -4.40
## `ldl:glu`    -997.18    0.118   14.30
## `sex:glu`       2.21    0.088    8.23
## `hdl:glu`    -207.19    0.027    3.76
## `bmi:tc`     -449.81    0.024   -6.41

Now let’s look at out-of-sample prediction error, using the posterior median coefficient estimates:

pred_stan <- function (x) {
    post_med_intercept + as.matrix(x) %*% post_med_slopes
}
pred_y <- pred_stan(test_d[,-1])
stan_pred_error <- sqrt(mean((test_d$y - pred_y)^2))
stan_mse_resid <- sqrt(mean((training_d$y - pred_stan(training_d[,-1]))^2))

plot(test_d$y, pred_y, xlab="true values", ylab="predicted values", main="test data")
abline(0,1)

plot of chunk pred_stan

Conclusions?

Our “sparse” model is certainly more sparse, and arguably more interpretable.
It has a root-mean-square prediction error of 52.42 on the test data, and 52.92 on the training data.
This is substantially better than ordinary linear regression, which had a root-mean-square prediction error of 61.35 on the test data, and a root-mean-square-error of 47.91 on the training data.

The sparse model is more interpretable, and more generalizable.

Midterm review

Steps in data analysis

Care, or at least think, about the data.
Look at the data.
Query the data.
Sanity check.
Communicate.

Often “statistics” focuses on querying. Doing that effectively requires all the other steps, too.

Statistics or parameters?

A statistic is: a numerical description of a dataset.

A parameter is: a numerical attribute of a model of reality.

Often, statistics are used to estimate parameters.

The two+ heads of classical statistics

estimating parameters, with uncertainty (confidence or credible intervals)

evaluating (in-)consistency with a particular situation (\(p\)-values)

prediction, with uncertainty

model adequacy and fit (simulation, crossvalidation)

Models, part 1: How do we do this?

Parameters are numbers. With units.

So, they need to mean something concrete and precise.

In other words, they need to be part of a model.

Models, part 2: Noise, and uncertainty.

To separate information from noise, we need to understand the noise.

How much of these differences are due to measurement error? or, unpredictable variation? How much is telling me something about the world?

A full model also describes the statistical noise in the data.

(It’s a full model if you can simulate from it.)

Models, part 3: how to fit them.

Fitting a model: Finding parameter values that make the data look plausible.

There may be a wide range of plausible parameter values.

Priors help get answers.

Aspects of a model

What variables?
Response distribution(s).
Relationship of predictors to the response(s).
Relationships between parameters.

Every statistical method has a model lurking somewhere behind it.

How do we fit models?

The basic tool is likelihood:

how likely are the data, using this set of parameters?

Some methods:

Hill climbing (maximum likelihood).
Hill wandering (Monte Carlo to explore posterior distributions)