
Reparameterization in Stan

Peter Ralph

26 January 2021 – Advanced Biological Statistics

Sparseness and scale mixtures

Compare the Normal

\[\begin{aligned} X \sim \Normal(0,1) \end{aligned}\]

to the “exponential scale mixture of Normals”,

\[\begin{aligned} X &\sim \Normal(0,\sigma) \\ \sigma &\sim \Exp(1) . \end{aligned}\]

(Figure: plot of chunk scale_mixtures — densities of the standard Normal and the exponential scale mixture of Normals.)
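One way to make this comparison concretely is to simulate both distributions in R and overlay their densities; this is a minimal sketch (the sample size is an arbitrary choice):

n <- 1e5
x_normal <- rnorm(n, mean = 0, sd = 1)          # X ~ Normal(0, 1)
sigma <- rexp(n, rate = 1)                      # sigma ~ Exponential(1)
x_mixture <- rnorm(n, mean = 0, sd = sigma)     # X ~ Normal(0, sigma)

# The mixture has a sharper peak at zero and heavier tails.
plot(density(x_normal), xlim = c(-5, 5), main = "Normal vs scale mixture")
lines(density(x_mixture), col = "red")
legend("topright", legend = c("Normal(0,1)", "scale mixture"),
       col = c("black", "red"), lty = 1)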

Implementation

Note that

\[\begin{aligned} \beta &\sim \Normal(0,\sigma) \\ \sigma &\sim \Exp(1) . \end{aligned}\]

is equivalent to

\[\begin{aligned} \beta &= \sigma \gamma \\ \gamma &\sim \Normal(0,1) \\ \sigma &\sim \Exp(1) . \end{aligned}\]

parameters {
    real beta;
    real<lower=0> sigma;
}
model {
    beta ~ normal(0, sigma);
    sigma ~ exponential(1);
}

is equivalent to

parameters {
    real gamma;
    real<lower=0> sigma;
}
transformed parameters {
    real beta;
    beta = gamma * sigma;
}
model {
    gamma ~ normal(0, 1);
    sigma ~ exponential(1);
}

The second version, known as the "non-centered" parameterization, is better for Stan.
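The equivalence itself is easy to check by simulation; here is one possible sketch in R (sample size arbitrary):

n <- 1e5
sigma <- rexp(n, 1)
beta_direct <- rnorm(n, 0, sigma)        # beta ~ Normal(0, sigma)
beta_reparam <- sigma * rnorm(n, 0, 1)   # beta = sigma * gamma, gamma ~ Normal(0, 1)

# Two independent samples from the same distribution:
# the QQ plot should fall along the 1:1 line.
qqplot(beta_direct, beta_reparam)
abline(0, 1, col = "red")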

Why is it better?

parameters {
    real beta;
    real<lower=0> sigma;
}
model {
    beta ~ normal(0, sigma);
    sigma ~ exponential(1);
}

In the first version, the optimal step size depends on sigma: when sigma is small, beta is squeezed toward zero and the sampler needs tiny steps, but when sigma is large it needs big ones. In the second version, below, gamma and sigma are independent under the prior, so a single step size works everywhere.

parameters {
    real gamma;
    real<lower=0> sigma;
}
transformed parameters {
    real beta;
    beta = gamma * sigma;
}
model {
    gamma ~ normal(0, 1);
    sigma ~ exponential(1);
}

(Figure: plot of chunk sigma_phase — the joint distribution of the scale parameter and the coefficient under each parameterization.)
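One way to see the geometry for yourself is to draw from the prior and plot the joint distribution in each coordinate system; a sketch along these lines:

n <- 1e4
sigma <- rexp(n, 1)
gamma <- rnorm(n, 0, 1)
beta <- sigma * gamma

par(mfrow = c(1, 2))
# Centered coordinates: beta's scale shrinks with sigma, making a funnel.
plot(beta, log(sigma), pch = 20, cex = 0.3, main = "centered: a funnel")
# Non-centered coordinates: gamma and sigma are independent.
plot(gamma, log(sigma), pch = 20, cex = 0.3, main = "non-centered")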

A strongly sparsifying prior

The “horseshoe”:

\[\begin{aligned} \beta_j &\sim \Normal(0, \lambda_j) \\ \lambda_j &\sim \Cauchy(0, \tau) \\ \tau &\sim \Unif(0, 1) \end{aligned}\]

parameters {
    vector[p] d_beta;
    vector[p] d_lambda;
    real<lower=0, upper=1> tau;
}
transformed parameters {
    vector[p] beta;
    beta = d_beta .* d_lambda * tau; // so beta[j] ~ Normal(0, |d_lambda[j]| * tau)
}
model {
    d_beta ~ normal(0, 1);
    d_lambda ~ cauchy(0, 1);
    // tau ~ uniform(0, 1); // implicit, from tau's declared bounds
}
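To get a feel for how strongly this prior shrinks, one might draw from it directly in R (a sketch; the quantiles shown are illustrative choices):

n <- 1e5
tau <- runif(n, 0, 1)
d_lambda <- rcauchy(n, 0, 1)
d_beta <- rnorm(n, 0, 1)
beta <- d_beta * d_lambda * tau   # one draw of beta per prior draw

# Most of the mass is very close to zero, but the tails are heavy:
quantile(abs(beta), c(0.5, 0.9, 0.99))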

The Cauchy as a scale mixture

Recall that if

\[\begin{aligned} \beta &\sim \Normal(0, 1/\sqrt{\lambda}) \\ \lambda &\sim \Gam(1/2, 1/2) \end{aligned}\]

then

\[\begin{aligned} \beta &\sim \Cauchy(0, 1). \end{aligned}\]
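This identity can also be checked by simulation, for instance:

n <- 1e5
lambda <- rgamma(n, shape = 0.5, rate = 0.5)
beta_mix <- rnorm(n, 0, 1 / sqrt(lambda))   # Normal(0, 1/sqrt(lambda))
beta_cauchy <- rcauchy(n, 0, 1)

# Compare central quantiles (the extreme tails are too noisy to compare):
probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
rbind(mixture = quantile(beta_mix, probs),
      cauchy  = quantile(beta_cauchy, probs))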

Using the horseshoe

Regression with a horseshoe prior

Uses a reparameterization of the Cauchy as a scale mixture of normals.

horseshoe_block <- "
data {
    int N;
    int p;
    vector[N] y;
    matrix[N,p] x;
}
parameters {
    real b0;
    vector[p] d_beta;
    vector[p] d_a;
    vector<lower=0>[p] d_b;
    real<lower=0, upper=1> tau;
    real<lower=0> sigma;
}
transformed parameters {
    vector[p] beta;
    vector[N] f;
    beta = d_beta .* d_a .* sqrt(d_b) * tau; // d_a .* sqrt(d_b) ~ Cauchy(0, 1)
    f = b0 + x * beta;
}
model {
    y ~ normal(f, sigma);
    // HORSESHOE PRIOR:
    d_beta ~ normal(0, 1);
    d_a ~ normal(0, 1);
    d_b ~ inv_gamma(0.5, 0.5);
    // tau ~ uniform(0, 1); // implicit, from tau's declared bounds
    // priors on noise distribution:
    sigma ~ normal(0, 10);
}"