Cox’s Proportional Hazards

Peter Ralph

5 January 2021 – Advanced Biological Statistics

Cox’s proportional hazards

Proportional Hazards

Goal: understand how covariates affect survival time.

Idea: modify a nonparametric hazard rate by a linear predictor: \[ h(t) = h_0(t) e^{\beta x} .\]

plot of chunk r plotcoxph

Covariates

How does survival probability depend on the covariates?

age:        Age in years                                                        
sex:        Male=1 Female=2                                                     
ph.ecog:    ECOG performance score as rated by the physician. 
            (good=0-bad=4)
ph.karno:   Karnofsky performance score (bad=0-good=100) rated by physician     
pat.karno:  Karnofsky performance score as rated by patient                     
meal.cal:   Calories consumed at meals                                          
wt.loss:    Weight loss in last six months

Covariates

How does survival probability depend on the covariates?

age:        Age in years                                                        
sex:        Male=1 Female=2                                                     
ph.ecog:    ECOG performance score as rated by the physician. 
            (good=0-bad=4)
ph.karno:   Karnofsky performance score (bad=0-good=100) rated by physician     
pat.karno:  Karnofsky performance score as rated by patient                     
meal.cal:   Calories consumed at meals                                          
wt.loss:    Weight loss in last six months

\[\begin{aligned} h(t) &= \text{(baseline hazard rate at $t$)} \\ y_i &= \beta_0 + \sum_j \beta_j x_{ij} \\ h_i(t) &= \text{(hazard rate for $i$ at $t$)} \\ &= h(t) \times \exp\left( y_i \right) \\ S_i(t) &= \P\left\{\text{survival of $i$ until $t$}\right\} \\ &= \exp\left( - \int_0^t h_i(s) ds \right) . \end{aligned}\]

Fitting a Cox model

fullcox <- coxph(Surv(time, status)
                     ~ age + sex + ph.ecog + ph.karno + pat.karno
                     + meal.cal + wt.loss,
                 data = lung)
summary(fullcox)

## Call:
## coxph(formula = Surv(time, status) ~ age + sex + ph.ecog + ph.karno + 
##     pat.karno + meal.cal + wt.loss, data = lung)
## 
##   n= 168, number of events= 121 
##    (60 observations deleted due to missingness)
## 
##                 coef  exp(coef)   se(coef)      z Pr(>|z|)   
## age        1.065e-02  1.011e+00  1.161e-02  0.917  0.35906   
## sex       -5.509e-01  5.765e-01  2.008e-01 -2.743  0.00609 **
## ph.ecog    7.342e-01  2.084e+00  2.233e-01  3.288  0.00101 **
## ph.karno   2.246e-02  1.023e+00  1.124e-02  1.998  0.04574 * 
## pat.karno -1.242e-02  9.877e-01  8.054e-03 -1.542  0.12316   
## meal.cal   3.329e-05  1.000e+00  2.595e-04  0.128  0.89791   
## wt.loss   -1.433e-02  9.858e-01  7.771e-03 -1.844  0.06518 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##           exp(coef) exp(-coef) lower .95 upper .95
## age          1.0107     0.9894    0.9880    1.0340
## sex          0.5765     1.7347    0.3889    0.8545
## ph.ecog      2.0838     0.4799    1.3452    3.2277
## ph.karno     1.0227     0.9778    1.0004    1.0455
## pat.karno    0.9877     1.0125    0.9722    1.0034
## meal.cal     1.0000     1.0000    0.9995    1.0005
## wt.loss      0.9858     1.0144    0.9709    1.0009
## 
## Concordance= 0.651  (se = 0.029 )
## Likelihood ratio test= 28.33  on 7 df,   p=2e-04
## Wald test            = 27.58  on 7 df,   p=3e-04
## Score (logrank) test = 28.41  on 7 df,   p=2e-04

Model comparison

subcox <- coxph(Surv(time, status) ~ sex + ph.ecog, data = lung, subset=rowSums(is.na(lung[,2:10]))==0)
anova(fullcox, subcox)

## Analysis of Deviance Table
##  Cox model: response is  Surv(time, status)
##  Model 1: ~ age + sex + ph.ecog + ph.karno + pat.karno + meal.cal + wt.loss
##  Model 2: ~ sex + ph.ecog
##    loglik  Chisq Df P(>|Chi|)
## 1 -498.75                    
## 2 -503.14 8.7792  5    0.1182

Predictions

subcox_pred <- survfit(subcox, newdata=expand.grid(ph.ecog=0:3, sex=2))
plot(subcox_pred, lty=1, col=1:4, main='predicted survival, females')
legend("topright", lty=1, col=1:4, legend=paste("ECOG=", 0:3))

plot of chunk r prediction

In class: interpretation

The data:

168 patients with 121 observed survival times, over 1000 days.
(say more about the data)
Few (maybe no?) patients survived 1000 days.
Less than half survived to 50%.
More than half of males who survived to 1 year died in the following year.
Females more likely to survive longer than males: 20% higher survival rates until at least 800 days
50% of males have died by 1 year, while for females this is 1.5 years

Fitting a Cox’s PH model, sex and ECOG score are strongly predictive of survival times, while other variables have a weak effect, if at all.
ECOG and Karnofsky score are correlated, so ECOG seems to be the best predictor, with physician-reported Karnofsky score more reliable than patient-reported.

pairs(jitter(as.matrix(lung[,c("ph.ecog", "ph.karno", "pat.karno")])))

plot of chunk r correlation

cor(lung[,c("ph.ecog", "ph.karno", "pat.karno")], use='pairwise')

##              ph.ecog   ph.karno  pat.karno
## ph.ecog    1.0000000 -0.8072667 -0.5112209
## ph.karno  -0.8072667  1.0000000  0.5202974
## pat.karno -0.5112209  0.5202974  1.0000000

Females tend to live longer: here are predicted survival curves for males and females with all other variables set to their mean values. There is a predicted difference in survivorship of around 20%; the difference reflects that seen in the Kaplan-Meier curves above.

mean_vals <- as.data.frame(lapply(lung[,c("age", "ph.ecog", "ph.karno", "pat.karno", "meal.cal", "wt.loss")], mean, na.rm=TRUE))
fullcox_sex <- survfit(fullcox,
                        newdata=cbind(mean_vals, sex=c(1, 2)))
plot(fullcox_sex, lty=1, col=1:4, main='predicted survival, mean values')
legend("topright", lty=1, col=1:4, legend=c("males", "females"))

plot of chunk r prediction2

Now, here are predicted survival curves for females, separately by ECOG score, showing a huge difference by ECOG score: survivorship ranges from about 10% for ECOG score 0 to 80% for ECOG score 3, and around 40% of ECOG score 0 patients surviving past 1000 days.

fullcox_ecog <- survfit(fullcox,
                        newdata=cbind(expand.grid(ph.ecog=0:3, sex=2),
                                      mean_vals[,-2]))
plot(fullcox_ecog, lty=1, col=1:4, main='predicted survival, females')
legend("topright", lty=1, col=1:4, legend=paste("ECOG=", 0:3))

plot of chunk r prediction3

In that model, we used numerical ECOG scores, assuming that the difference between scores 1 and 2 was the same as that between 2 and 3. If we change ECOG to a factor, we can see how well this holds up: it looks like the difference between ECOG 2 and 3 is bigger than between 0 and 1, although the predictions are not wildly different than in the simpler model.

newcox <- coxph(Surv(time, status)
                     ~ age + sex + factor(ph.ecog) + ph.karno + pat.karno
                     + meal.cal + wt.loss,
                 data = lung)
newcox_ecog <- survfit(newcox,
                        newdata=cbind(expand.grid(ph.ecog=0:3, sex=2),
                                      mean_vals[,-2]))
plot(newcox_ecog, lty=1, col=1:4, main='predicted survival, females')
legend("topright", lty=1, col=1:4, legend=paste("ECOG=", 0:3))

plot of chunk r prediction4

Indeed, using an ANOVA to compare the models, we don’t get a strong indication that allowing separate effects of each ECOG level is making the model fit much better.

anova(fullcox, newcox)

## Analysis of Deviance Table
##  Cox model: response is  Surv(time, status)
##  Model 1: ~ age + sex + ph.ecog + ph.karno + pat.karno + meal.cal + wt.loss
##  Model 2: ~ age + sex + factor(ph.ecog) + ph.karno + pat.karno + meal.cal + wt.loss
##    loglik Chisq Df P(>|Chi|)
## 1 -498.75                   
## 2 -498.52 0.463  2    0.7933