Peter Ralph
Advanced Biological Statistics
The idea: choose a (parametric) distribution for the time-until-event.
Some choices:
Model:
Model:
\[\begin{aligned} T_i &\sim \Weibull(\text{shape}=k, \\ &\qquad \text{mean}=\mu_i) \\ \mu_i &= \exp(y_i) \\ y_i &= \beta_0 + \sum_j \beta_j x_{ij} \end{aligned}\]
Three ways to do parametric survival analysis:
survival::survreg
: maximum likelihoodbrms::brm
: Baysian, with Stanlung$censored <- ifelse(lung$status == 1, 'right', 'none')
brmfit <- brm(time | cens(censored) ~ sex + ph.ecog + ph.karno,
data=lung, family='weibull')
## Compiling Stan program...
## Start sampling
Call: survreg(formula = Surv(time, status) ~ sex + ph.ecog + ph.karno, data = lung, dist = "weibull") Value Std. Error z p (Intercept) 6.66434 0.65785 10.13 < 2e-16 sex 0.41141 0.12287 3.35 0.00081 ph.ecog -0.47729 0.12618 -3.78 0.00016 ph.karno -0.00906 0.00675 -1.34 0.17983 Log(scale) -0.32452 0.06207 -5.23 1.7e-07 Scale= 0.723 Weibull distribution Loglik(model)= -1126.3 Loglik(intercept only)= -1141.1 Chisq= 29.49 on 3 degrees of freedom, p= 1.8e-06 Number of Newton-Raphson Iterations: 5 n=226 (2 observations deleted due to missingness)
Family: weibull Links: mu = log; shape = identity Formula: time | cens(censored) ~ sex + ph.ecog + ph.karno Data: lung (Number of observations: 226) Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; total post-warmup draws = 4000 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS Intercept 6.61 0.65 5.34 7.89 1.00 2792 2974 sex 0.42 0.13 0.17 0.67 1.00 3328 2444 ph.ecog -0.49 0.13 -0.73 -0.24 1.00 2675 2826 ph.karno -0.01 0.01 -0.02 0.00 1.00 2699 2859 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS shape 1.36 0.09 1.20 1.53 1.00 3453 2961 Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS and Tail_ESS are effective sample size measures, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
Recall there are 19 different institutions:
##
## 1 2 3 4 5 6 7 10 11 12 13 15 16 21 22 26 32 33
## 36 5 19 4 9 14 8 4 18 23 20 6 16 13 17 6 7 2
brmfit2 <- brm(time | cens(censored) ~ sex + ph.ecog + ph.karno + (1|inst),
data=subset(lung, !is.na(inst)), family='weibull')
## Compiling Stan program...
## Start sampling
## Family: weibull
## Links: mu = log; shape = identity
## Formula: time | cens(censored) ~ sex + ph.ecog + ph.karno + (1 | inst)
## Data: subset(lung, !is.na(inst)) (Number of observations: 225)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~inst (Number of levels: 18)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.12 0.08 0.01 0.32 1.00 1409 1540
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 6.62 0.68 5.26 7.97 1.00 2918 2918
## sex 0.42 0.13 0.18 0.68 1.00 4429 3074
## ph.ecog -0.51 0.13 -0.78 -0.26 1.00 2800 2653
## ph.karno -0.01 0.01 -0.02 0.00 1.00 2962 3149
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 1.36 0.09 1.20 1.53 1.00 4190 3144
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The conditional effect of sex on mean survival time (with mean values of other variables):
The conditional effect of ECOG on mean survival time:
The conditional effect of Karnofsky score on mean survival time:
A posterior predictive check: data are dark points, posterior medians are large points, and lines give 50% and 90% posterior ranges:
Consulting vignette(“brms_families”), the survival curve is \(\exp(-(t / \lambda)^k)\) where \(\lambda = \mu / \Gamma(1 + 1/k)\). Here are 100 samples from the posterior:
Here’s the code for that last one:
post_means <- posterior_epred(brmfit2, newdata=data.frame(age=60, sex=2, ph.ecog=0:3, ph.karno=50), re_formula=NA, dpar="mu")[1:100,]
post_shape <- posterior_epred(brmfit2, newdata=data.frame(age=60, sex=2, ph.ecog=0:3, ph.karno=50), re_formula=NA, dpar="shape")[1:100,]
tvals <- seq(0, 2000, length.out=101)
Svals <- array(NA, dim=c(length(tvals), nrow(post_means), ncol(post_means)))
for (k in 1:nrow(post_means)) {
lambda <- post_means[k,] / gamma(1 + 1/post_shape[k,])
for (j in 1:ncol(post_means)) {
Svals[,k,j] <- exp(- (tvals / lambda[j])^post_shape[k,j])
}
}
layout(matrix(1:4, nrow=2, byrow=TRUE))
for (j in 1:dim(Svals)[3]) {
plot(tvals, rowMeans(Svals[,,j]), lty=1, ylim=c(0,1), type='l', xlab='time (t)', ylab='S(t)', col=j, main=paste("ECOG =", j-1))
matlines(tvals, Svals[,,j], type='l', col=adjustcolor(j, 0.25), lty=1)
abline(v=1000, lty=3)
}
Fit (and select) a proportional hazards model to the veteran
dataset. What are the most important predictors of survival?
Veterans' Administration Lung Cancer study
Description:
Randomised trial of two treatment regimens for lung cancer. This
is a standard survival analysis data set.
Usage:
veteran
Format:
trt: 1=standard 2=test
celltype: 1=squamous, 2=smallcell, 3=adeno, 4=large
time: survival time
status: censoring status
karno: Karnofsky performance score (100=good)
diagtime: months from diagnosis to randomisation
age: in years
prior: prior therapy 0=no, 10=yes
Nonparametric:
Parametric:
f1 <- survfit(Surv(time, status) ~ 1, data = lung)
y <- data.frame(time=sort(unique(lung$time[lung$status == 2])))
y$k <- sapply(y$time, function (t) sum(lung$status == 2 & lung$time == t))
y$n <- sapply(y$time, function (t) sum(lung$time > t | (lung$time == t & lung$status == 2)))
y$p <- cumprod(1 - y$k/y$n)
plot(f1)
points(p ~ time, data=y, col='red', pch=20, cex=0.5)