Peter Ralph
26 January 2021 – Advanced Biological Statistics
Compare the Normal
\[\begin{aligned} X \sim \Normal(0,1) \end{aligned}\]
to the “exponential scale mixture of Normals”,
\[\begin{aligned} X &\sim \Normal(0,\sigma) \\ \sigma &\sim \Exp(1) . \end{aligned}\]
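To see the difference, here is a small simulation sketch in R (not from the slides): the scale mixture has the same center as the standard Normal but much heavier tails.
# Simulation sketch: standard Normal versus the exponential scale mixture of Normals.
n <- 1e6
x_normal <- rnorm(n, mean = 0, sd = 1)
sigma <- rexp(n, rate = 1)                    # sigma ~ Exp(1)
x_mixture <- rnorm(n, mean = 0, sd = sigma)   # X | sigma ~ Normal(0, sigma)
# upper tail quantiles: the mixture's tails are much heavier
quantile(x_normal, c(0.95, 0.99, 0.999))
quantile(x_mixture, c(0.95, 0.99, 0.999))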
Note that
\[\begin{aligned} \beta &\sim \Normal(0,\sigma) \\ \sigma &\sim \Exp(1) . \end{aligned}\]
is equivalent to
\[\begin{aligned} \beta &= \sigma \gamma \\ \gamma &\sim \Normal(0,1) \\ \sigma &\sim \Exp(1) . \end{aligned}\]
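A quick simulation check of this equivalence (a sketch, not from the slides): drawing sigma * gamma gives the same distribution as drawing beta directly from Normal(0, sigma).
# Both constructions give draws from the same distribution.
n <- 1e6
sigma <- rexp(n, rate = 1)
beta_direct <- rnorm(n, mean = 0, sd = sigma)      # beta ~ Normal(0, sigma)
beta_scaled <- sigma * rnorm(n, mean = 0, sd = 1)  # beta = sigma * gamma
qqplot(beta_direct, beta_scaled); abline(0, 1)     # points lie along the 1:1 line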
parameters {
    real beta;
    real<lower=0> sigma;
}
model {
    beta ~ normal(0, sigma);
    sigma ~ exponential(1);
}
is equivalent to
parameters {
    real gamma;
    real<lower=0> sigma;
}
transformed parameters {
    real beta;
    beta = gamma * sigma;
}
model {
    gamma ~ normal(0, 1);
    sigma ~ exponential(1);
}
The second version is better for Stan.
Why is it better?
parameters {
    real beta;
    real<lower=0> sigma;
}
model {
    beta ~ normal(0, sigma);
    sigma ~ exponential(1);
}
In the first, the optimal step size depends on sigma: when sigma is small, beta is tightly concentrated near zero, but when sigma is large, beta ranges widely (the "funnel"), so no single step size works well for the sampler.
parameters {
    real gamma;
    real<lower=0> sigma;
}
transformed parameters {
    real beta;
    beta = gamma * sigma;
}
model {
    gamma ~ normal(0, 1);
    sigma ~ exponential(1);
}
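In the second, gamma and sigma are independent under the prior, so the step size does not have to adapt to sigma. Here is a hedged sketch (assuming rstan is installed; the model strings simply repeat the two blocks above) that fits both versions and compares divergences and effective sample sizes:
# Sketch: compare the centered and non-centered parameterizations with rstan.
library(rstan)
centered <- "
parameters { real beta; real<lower=0> sigma; }
model { beta ~ normal(0, sigma); sigma ~ exponential(1); }
"
noncentered <- "
parameters { real gamma; real<lower=0> sigma; }
transformed parameters { real beta = gamma * sigma; }
model { gamma ~ normal(0, 1); sigma ~ exponential(1); }
"
fit_c <- stan(model_code = centered, chains = 2, iter = 2000, refresh = 0)
fit_n <- stan(model_code = noncentered, chains = 2, iter = 2000, refresh = 0)
# divergent transitions per chain: typically many for the centered version, few or none for the other
sapply(get_sampler_params(fit_c, inc_warmup = FALSE), function(x) sum(x[, "divergent__"]))
sapply(get_sampler_params(fit_n, inc_warmup = FALSE), function(x) sum(x[, "divergent__"]))
# effective sample sizes for beta
summary(fit_c, pars = "beta")$summary
summary(fit_n, pars = "beta")$summary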
The “horseshoe”:
\[\begin{aligned} \beta_j &\sim \Normal(0, \lambda_j) \\ \lambda_j &\sim \Cauchy(0, \tau) \\ \tau &\sim \Unif(0, 1) \end{aligned}\]
parameters {
    vector[p] d_beta;
    vector[p] d_lambda;
    real<lower=0, upper=1> tau;
}
transformed parameters {
    vector[p] beta;
    beta = d_beta .* d_lambda * tau;   // beta[j] = d_beta[j] * lambda[j], with lambda[j] = tau * d_lambda[j]
}
model {
    d_beta ~ normal(0, 1);
    d_lambda ~ cauchy(0, 1);
    // tau ~ uniform(0, 1); // implied by the <lower=0, upper=1> constraint
}
Recall that if
\[\begin{aligned} \beta &\sim \Normal(0, 1/\sqrt{\lambda}) \\ \lambda &\sim \Gam(1/2, 1/2) \end{aligned}\]
then
\[\begin{aligned} \beta &\sim \Cauchy(0, 1). \end{aligned}\]
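This can also be checked by simulation (a sketch, not from the slides): mixing Normals over a Gamma(1/2, 1/2) precision reproduces a standard Cauchy.
# Check: a Normal scale mixture with Gamma(1/2, 1/2) precision is Cauchy(0, 1).
n <- 1e6
lambda <- rgamma(n, shape = 0.5, rate = 0.5)
beta_mix <- rnorm(n, mean = 0, sd = 1 / sqrt(lambda))
beta_cauchy <- rcauchy(n, location = 0, scale = 1)
# compare central quantiles (the mean is useless because of the heavy tails)
quantile(beta_mix, c(0.1, 0.25, 0.5, 0.75, 0.9))
quantile(beta_cauchy, c(0.1, 0.25, 0.5, 0.75, 0.9))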
The full model below uses this reparameterization of the Cauchy as a scale mixture of Normals.
horseshoe_block <- "
data {
int N;
int p;
vector[N] y;
matrix[N,p] x;
}
parameters {
real b0;
vector[p] d_beta;
vector[p] d_a;
vector<lower=0>[p] d_b;
real<lower=0, upper=1> tau;
real<lower=0> sigma;
}
transformed parameters {
vector[p] beta;
vector[N] f;
beta = d_beta .* d_a .* sqrt(d_b) * tau;
f = b0 + x * beta;
}
model {
y ~ normal(f, sigma);
// HORSESHOE PRIOR:
d_beta ~ normal(0, 1);
d_a ~ normal(0, 1);
d_b ~ inv_gamma(0.5, 0.5);
// tau ~ uniform(0, 1); // uniform
// priors on noise distribution:
sigma ~ normal(0, 10);
}"