Peter Ralph
5 January 2021 – Advanced Biological Statistics
Goal: understand how covariates affect survival time.
Idea: modify a nonparametric hazard rate by a linear predictor: \[ h(t) = h_0(t) e^{\beta x} .\]
How does survival probability depend on the covariates?
age: Age in years
sex: Male=1 Female=2
ph.ecog: ECOG performance score as rated by the physician.
(good=0-bad=4)
ph.karno: Karnofsky performance score (bad=0-good=100) rated by physician
pat.karno: Karnofsky performance score as rated by patient
meal.cal: Calories consumed at meals
wt.loss: Weight loss in last six months
How does survival probability depend on the covariates?
age: Age in years
sex: Male=1 Female=2
ph.ecog: ECOG performance score as rated by the physician.
(good=0-bad=4)
ph.karno: Karnofsky performance score (bad=0-good=100) rated by physician
pat.karno: Karnofsky performance score as rated by patient
meal.cal: Calories consumed at meals
wt.loss: Weight loss in last six months
\[\begin{aligned} h(t) &= \text{(baseline hazard rate at $t$)} \\ y_i &= \beta_0 + \sum_j \beta_j x_{ij} \\ h_i(t) &= \text{(hazard rate for $i$ at $t$)} \\ &= h(t) \times \exp\left( y_i \right) \\ S_i(t) &= \P\left\{\text{survival of $i$ until $t$}\right\} \\ &= \exp\left( - \int_0^t h_i(s) ds \right) . \end{aligned}\]
fullcox <- coxph(Surv(time, status)
~ age + sex + ph.ecog + ph.karno + pat.karno
+ meal.cal + wt.loss,
data = lung)
summary(fullcox)
## Call:
## coxph(formula = Surv(time, status) ~ age + sex + ph.ecog + ph.karno +
## pat.karno + meal.cal + wt.loss, data = lung)
##
## n= 168, number of events= 121
## (60 observations deleted due to missingness)
##
## coef exp(coef) se(coef) z Pr(>|z|)
## age 1.065e-02 1.011e+00 1.161e-02 0.917 0.35906
## sex -5.509e-01 5.765e-01 2.008e-01 -2.743 0.00609 **
## ph.ecog 7.342e-01 2.084e+00 2.233e-01 3.288 0.00101 **
## ph.karno 2.246e-02 1.023e+00 1.124e-02 1.998 0.04574 *
## pat.karno -1.242e-02 9.877e-01 8.054e-03 -1.542 0.12316
## meal.cal 3.329e-05 1.000e+00 2.595e-04 0.128 0.89791
## wt.loss -1.433e-02 9.858e-01 7.771e-03 -1.844 0.06518 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## age 1.0107 0.9894 0.9880 1.0340
## sex 0.5765 1.7347 0.3889 0.8545
## ph.ecog 2.0838 0.4799 1.3452 3.2277
## ph.karno 1.0227 0.9778 1.0004 1.0455
## pat.karno 0.9877 1.0125 0.9722 1.0034
## meal.cal 1.0000 1.0000 0.9995 1.0005
## wt.loss 0.9858 1.0144 0.9709 1.0009
##
## Concordance= 0.651 (se = 0.029 )
## Likelihood ratio test= 28.33 on 7 df, p=2e-04
## Wald test = 27.58 on 7 df, p=3e-04
## Score (logrank) test = 28.41 on 7 df, p=2e-04
subcox <- coxph(Surv(time, status) ~ sex + ph.ecog, data = lung, subset=rowSums(is.na(lung[,2:10]))==0)
anova(fullcox, subcox)
## Analysis of Deviance Table
## Cox model: response is Surv(time, status)
## Model 1: ~ age + sex + ph.ecog + ph.karno + pat.karno + meal.cal + wt.loss
## Model 2: ~ sex + ph.ecog
## loglik Chisq Df P(>|Chi|)
## 1 -498.75
## 2 -503.14 8.7792 5 0.1182
The data:
## ph.ecog ph.karno pat.karno
## ph.ecog 1.0000000 -0.8072667 -0.5112209
## ph.karno -0.8072667 1.0000000 0.5202974
## pat.karno -0.5112209 0.5202974 1.0000000
Females tend to live longer: here are predicted survival curves for males and females with all other variables set to their mean values. There is a predicted difference in survivorship of around 20%; the difference reflects that seen in the Kaplan-Meier curves above.
mean_vals <- as.data.frame(lapply(lung[,c("age", "ph.ecog", "ph.karno", "pat.karno", "meal.cal", "wt.loss")], mean, na.rm=TRUE))
fullcox_sex <- survfit(fullcox,
newdata=cbind(mean_vals, sex=c(1, 2)))
plot(fullcox_sex, lty=1, col=1:4, main='predicted survival, mean values')
legend("topright", lty=1, col=1:4, legend=c("males", "females"))
Now, here are predicted survival curves for females, separately by ECOG score, showing a huge difference by ECOG score: survivorship ranges from about 10% for ECOG score 0 to 80% for ECOG score 3, and around 40% of ECOG score 0 patients surviving past 1000 days.
In that model, we used numerical ECOG scores, assuming that the difference between scores 1 and 2 was the same as that between 2 and 3. If we change ECOG to a factor, we can see how well this holds up: it looks like the difference between ECOG 2 and 3 is bigger than between 0 and 1, although the predictions are not wildly different than in the simpler model.
newcox <- coxph(Surv(time, status)
~ age + sex + factor(ph.ecog) + ph.karno + pat.karno
+ meal.cal + wt.loss,
data = lung)
newcox_ecog <- survfit(newcox,
newdata=cbind(expand.grid(ph.ecog=0:3, sex=2),
mean_vals[,-2]))
plot(newcox_ecog, lty=1, col=1:4, main='predicted survival, females')
legend("topright", lty=1, col=1:4, legend=paste("ECOG=", 0:3))
Indeed, using an ANOVA to compare the models, we don’t get a strong indication that allowing separate effects of each ECOG level is making the model fit much better.
## Analysis of Deviance Table
## Cox model: response is Surv(time, status)
## Model 1: ~ age + sex + ph.ecog + ph.karno + pat.karno + meal.cal + wt.loss
## Model 2: ~ age + sex + factor(ph.ecog) + ph.karno + pat.karno + meal.cal + wt.loss
## loglik Chisq Df P(>|Chi|)
## 1 -498.75
## 2 -498.52 0.463 2 0.7933