\[ %% % Add your macros here; they'll be included in pdf and html output. %% \newcommand{\R}{\mathbb{R}} % reals \newcommand{\E}{\mathbb{E}} % expectation \renewcommand{\P}{\mathbb{P}} % probability \DeclareMathOperator{\logit}{logit} \DeclareMathOperator{\logistic}{logistic} \DeclareMathOperator{\sd}{sd} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} \DeclareMathOperator{\Normal}{Normal} \DeclareMathOperator{\LogNormal}{logNormal} \DeclareMathOperator{\Poisson}{Poisson} \DeclareMathOperator{\Beta}{Beta} \DeclareMathOperator{\Binom}{Binomial} \DeclareMathOperator{\Gam}{Gamma} \DeclareMathOperator{\Exp}{Exponential} \DeclareMathOperator{\Cauchy}{Cauchy} \DeclareMathOperator{\Unif}{Unif} \DeclareMathOperator{\Dirichlet}{Dirichlet} \DeclareMathOperator{\Wishart}{Wishart} \DeclareMathOperator{\StudentsT}{StudentsT} \DeclareMathOperator{\Weibull}{Weibull} \newcommand{\given}{\;\vert\;} \]

Survival Analysis

Peter Ralph

7 January 2020 – Advanced Biological Statistics

Overview

“Survival” analysis:

We are interested in how long until some particular thing happens, and how this depends on some covariates.

Example: how years until death depends on cancer grade at diagnosis and drug treatment.

Example: how day of first budburst depends on species and local temperature.

Surivival data

Ex: For each patient, date of diagnosis and cancer grade; date of death or last follow-up.

Ex: For each plant, species, date of first budburst or last survey.

Both examples are right censored: for some subjects, we don’t know how the actual time, only a lower bound on it.

Key assumption: any censoring is noninformative,

i.e., our data collection does not depend on the status of the subjects.

Examples of informative censoring: patient dropout due to worsening symptoms.

What do we want to know?

The survival curve:

\[\begin{aligned} S(t) &= \P\{\text{still 'alive' after $t$ time units}\} , \end{aligned}\]

Note: this is always decreasing.

and the hazard rate:

\[\begin{aligned} h(t) &= \text{(mean number of 'deaths' per still-alive subject,} \\ &\qquad \text{per unit time at $t$)} , \end{aligned}\]

which is

\[\begin{aligned} h(t) = - \frac{d}{dt} \log S(t) . \end{aligned}\]

Nonparametric versus parametric?

We’ll look at some methods across the nonparametric-to-parametric continuum.

  • Nonparametric: fewer assumptions.
  • Parametric: fits a full, generative probability model.

Interpret the differences between these survival curves: plot of chunk exs

More information:

Datasets:

NCCTG Lung Cancer Data

NCCTG Lung Cancer Data

Description:

     Survival in patients with advanced lung cancer from the North
     Central Cancer Treatment Group.  Performance scores rate how well
     the patient can perform usual daily activities.

Usage:

     lung
     cancer
     
Format:

       inst:       Institution code                                                    
       time:       Survival time in days                                               
       status:     censoring status 1=censored, 2=dead                                 
       age:        Age in years                                                        
       sex:        Male=1 Female=2                                                     
       ph.ecog:    ECOG performance score as rated by the physician. 
                   0=asymptomatic, 1= symptomatic but completely ambulatory, 2= in bed 
                   <50% of the day, 3= in bed > 50% of the day but not bedbound, 4 = 
                   bedbound 
       ph.karno:   Karnofsky performance score (bad=0-good=100) rated by physician     
       pat.karno:  Karnofsky performance score as rated by patient                     
       meal.cal:   Calories consumed at meals                                          
       wt.loss:    Weight loss in last six months                                      
      
Note:

     The use of 1/2 for alive/dead instead of the usual 0/1 is a
     historical footnote.  For data contained on punch cards, IBM 360
     Fortran treated blank as a zero, which led to a policy within the
     section of Biostatistics to never use "0" as a data value since
     one could not distinguish it from a missing value.  The policy
     became a habit, as is often the case; and the 1/2 coding endured
     long beyond the demise of punch cards and Fortran.

Source:

     Terry Therneau

References:

     Loprinzi CL. Laurie JA. Wieand HS. Krook JE. Novotny PJ.  Kugler
     JW. Bartel J. Law M. Bateman M. Klatt NE. et al.  Prospective
     evaluation of prognostic variables from patient-completed
     questionnaires. North Central Cancer Treatment Group.  Journal of
     Clinical Oncology. 12(3):601-7, 1994.

Simulation: Constant hazard rate

Time until arrival of a high-energy neutrino in each of many detectors:

Simulation: Increasing hazard rate

Time until failure of lightbulbs, that wear out as time goes on:

Kaplan-Meier curves

Estimating probabilities

Suppose that:

  • we start with 100 chickens
  • in the first year, 25 of them die
  • in the second year, 25 of them die
  • at the end of the second year, we give 25 of the survivors away
  • in the third year, 15 of them die

Questions:

  • What percent survived each year? (of those that began that year alive)
  • Find the probability of surviving to 1, 2, 3, and 4 years,
  • and draw the survival curve.
  • Find a confidence interval for each probability.

The Kaplan-Meier survival curve is a purely empirical, nonparametric estimate of survival probability:

  • one interval per unique event time
  • estimated probability of surviving an interval is the proportion that did
  • and so the probability of surviving until \(t\) is the product of all the survival probabilities before \(t\)

Neutrinos:

plot of chunk plot_neutrinos

Light bulbs:

plot of chunk plot_lbs

Compared

plot of chunk plot_both

Lung cancer survival:

plot of chunk plot_km

Hazard rate is slope of survival on a log scale:

plot of chunk plot_km1

Survival, by sex:

plot of chunk plot_km2

Interpretation interlude:

Survival analysis is used to estimate the chance of surviving a given type of cancer for a given number of years.

Question: How accurate is that?

plot of chunk plot_km3

Cox’s proportional hazards

Cox’s orange pippin (credit: Wikimedia)
Cox’s orange pippin (credit: Wikimedia)

Proportional Hazards

Goal: understand how covariates affect survival time.

Idea: modify a nonparametric hazard rate by a linear predictor: \[ h(t) = h_0(t) e^{\beta x} .\]

plot of chunk plotcoxph

Covariates

How does survival probability depend on the covariates?

age:        Age in years                                                        
sex:        Male=1 Female=2                                                     
ph.ecog:    ECOG performance score as rated by the physician. 
            (good=0-bad=4)
ph.karno:   Karnofsky performance score (bad=0-good=100) rated by physician     
pat.karno:  Karnofsky performance score as rated by patient                     
meal.cal:   Calories consumed at meals                                          
wt.loss:    Weight loss in last six months     

Covariates

How does survival probability depend on the covariates?

age:        Age in years                                                        
sex:        Male=1 Female=2                                                     
ph.ecog:    ECOG performance score as rated by the physician. 
            (good=0-bad=4)
ph.karno:   Karnofsky performance score (bad=0-good=100) rated by physician     
pat.karno:  Karnofsky performance score as rated by patient                     
meal.cal:   Calories consumed at meals                                          
wt.loss:    Weight loss in last six months     

\[\begin{aligned} h(t) &= \text{(baseline hazard rate at $t$)} \\ y_i &= \beta_0 + \sum_j \beta_j x_{ij} \\ h_i(t) &= \text{(hazard rate for $i$ at $t$)} \\ &= h(t) \times \exp\left( y_i \right) \\ S_i(t) &= \P\left\{\text{survival of $i$ until $t$}\right\} \\ &= \exp\left( - \int_0^t h_i(s) ds \right) . \end{aligned}\]

Fitting a Cox model

## Call:
## coxph(formula = Surv(time, status) ~ age + sex + ph.ecog + ph.karno + 
##     pat.karno + meal.cal + wt.loss, data = lung)
## 
##   n= 168, number of events= 121 
##    (60 observations deleted due to missingness)
## 
##                 coef  exp(coef)   se(coef)      z Pr(>|z|)   
## age        1.065e-02  1.011e+00  1.161e-02  0.917  0.35906   
## sex       -5.509e-01  5.765e-01  2.008e-01 -2.743  0.00609 **
## ph.ecog    7.342e-01  2.084e+00  2.233e-01  3.288  0.00101 **
## ph.karno   2.246e-02  1.023e+00  1.124e-02  1.998  0.04574 * 
## pat.karno -1.242e-02  9.877e-01  8.054e-03 -1.542  0.12316   
## meal.cal   3.329e-05  1.000e+00  2.595e-04  0.128  0.89791   
## wt.loss   -1.433e-02  9.858e-01  7.771e-03 -1.844  0.06518 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##           exp(coef) exp(-coef) lower .95 upper .95
## age          1.0107     0.9894    0.9880    1.0340
## sex          0.5765     1.7347    0.3889    0.8545
## ph.ecog      2.0838     0.4799    1.3452    3.2277
## ph.karno     1.0227     0.9778    1.0004    1.0455
## pat.karno    0.9877     1.0125    0.9722    1.0034
## meal.cal     1.0000     1.0000    0.9995    1.0005
## wt.loss      0.9858     1.0144    0.9709    1.0009
## 
## Concordance= 0.651  (se = 0.029 )
## Likelihood ratio test= 28.33  on 7 df,   p=2e-04
## Wald test            = 27.58  on 7 df,   p=3e-04
## Score (logrank) test = 28.41  on 7 df,   p=2e-04

Model comparison

## Analysis of Deviance Table
##  Cox model: response is  Surv(time, status)
##  Model 1: ~ age + sex + ph.ecog + ph.karno + pat.karno + meal.cal + wt.loss
##  Model 2: ~ sex + ph.ecog
##    loglik  Chisq Df P(>|Chi|)
## 1 -498.75                    
## 2 -503.14 8.7792  5    0.1182

Predictions

plot of chunk prediction

In class: interpretation

  • physicians evaluation matters
    • lowest to highest ECOG: 20% to 80% survival at one year
  • sex and ECOG have large, significant effects
    • males have lower survival
    • 50% lower survival rate for males
  • higher weight loss => higher survival rate, strangely
    • but not significant
  • age, caloric intake, patient assessment not significant
    • specifics?
  • institutional differences
    • specifics?

Stochastic minute: Weibull

The Weibull distribution

If \(T\) has a Weibull distribution, with scale \(\lambda\) and shape \(k\), then

  • \(T \ge 0\)
  • \(\P\{ T > t \} = \exp\left(- (t/\lambda)^k \right)\)
  • the mean is proportional to the scale: \(\E[T] = \lambda \times \Gamma(1 + 1/k)\)

It is mostly used in survival analysis, because its hazard rate is: \[\begin{aligned} h(t) = k \frac{1}{\lambda} \left(\frac{t}{\lambda}\right)^{k-1} . \end{aligned}\] which allows rates to go down (\(k<1\)), up (\(k>1\)), or stay constant (\(k=1\)) over time.

plot of chunk plot_wei

Fully parametric models

Modeling

The idea: choose a (parametric) distribution for the time-until-event.

Some choices:

  • Exponential (constant rate)
  • Weibull
  • Gompertz
  • logNormal

Model:

  • lifetimes are Weibull distributed
  • mean lifetime depends on covariates

Model:

  • lifetimes are Weibull distributed
  • mean lifetime depends on covariates

\[\begin{aligned} T_i &\sim \Weibull(\text{shape}=k, \text{mean}=\mu_i) \\ \mu_i &= \exp(y_i) \\ y_i &= \beta_0 + \sum_j \beta_j x_{ij} \end{aligned}\]

Three ways to do parametric survival analysis:

  • survival::survreg : maximum likelihood
  • brms::brm : Baysian, with Stan
  • MC Stan

Maximum likelihood: survreg

Bayesian: brms

## Compiling the C++ model
## Start sampling


Call:
survreg(formula = Surv(time, status) ~ sex + ph.ecog + ph.karno, 
    data = lung, dist = "weibull")
               Value Std. Error     z       p
(Intercept)  6.66434    0.65785 10.13 < 2e-16
sex          0.41141    0.12287  3.35 0.00081
ph.ecog     -0.47729    0.12618 -3.78 0.00016
ph.karno    -0.00906    0.00675 -1.34 0.17983
Log(scale)  -0.32452    0.06207 -5.23 1.7e-07

Scale= 0.723 

Weibull distribution
Loglik(model)= -1126.3   Loglik(intercept only)= -1141.1
    Chisq= 29.49 on 3 degrees of freedom, p= 1.8e-06 
Number of Newton-Raphson Iterations: 5 
n=226 (2 observations deleted due to missingness)
 Family: weibull 
  Links: mu = log; shape = identity 
Formula: time | cens(censored) ~ sex + ph.ecog + ph.karno 
   Data: lung (Number of observations: 226) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     6.61      0.67     5.36     7.99 1.00     2739     2666
sex           0.42      0.12     0.18     0.67 1.00     3502     2945
ph.ecog      -0.49      0.13    -0.74    -0.24 1.00     2531     2818
ph.karno     -0.01      0.01    -0.02     0.00 1.00     2644     2799

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     1.36      0.08     1.20     1.53 1.00     3708     3039

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

Random effects, with brms

## Compiling the C++ model
## Start sampling

##  Family: weibull 
##   Links: mu = log; shape = identity 
## Formula: time | cens(censored) ~ sex + ph.ecog + ph.karno + (1 | inst) 
##    Data: lung (Number of observations: 226) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~inst (Number of levels: 19) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.12      0.08     0.01     0.31 1.00     1639     1858
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     6.63      0.67     5.36     7.95 1.00     3514     3133
## sex           0.42      0.13     0.18     0.67 1.00     5396     2613
## ph.ecog      -0.50      0.13    -0.76    -0.25 1.00     3342     3061
## ph.karno     -0.01      0.01    -0.02     0.00 1.00     3398     3148
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     1.37      0.09     1.20     1.54 1.00     4942     2869
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

## $inst
## , , Intercept
## 
##         Estimate Est.Error        Q2.5     Q97.5
## 1   -0.077081047 0.1121902 -0.34453070 0.1052658
## 2   -0.051925091 0.1448818 -0.40060675 0.1929411
## 3   -0.003140926 0.1052136 -0.22609347 0.2210849
## 4    0.017833583 0.1226034 -0.22299699 0.3069706
## 5   -0.038681462 0.1353150 -0.35932986 0.2186037
## 6   -0.042325779 0.1144701 -0.31423089 0.1645995
## 7    0.025663604 0.1249884 -0.21553215 0.3238278
## 10  -0.046230231 0.1427945 -0.37326527 0.2203388
## 11   0.039240593 0.1136011 -0.17209673 0.3056447
## 12  -0.053260802 0.1130651 -0.31811782 0.1354898
## 13   0.047404908 0.1136277 -0.15599395 0.3150589
## 15   0.007206870 0.1268800 -0.25446182 0.2833138
## 16   0.061820871 0.1195758 -0.13391314 0.3491895
## 21  -0.066473590 0.1286672 -0.37756584 0.1363838
## 22   0.083172155 0.1223941 -0.09947608 0.3754763
## 26   0.039314096 0.1300442 -0.19835246 0.3580281
## 32   0.007893967 0.1329952 -0.27541795 0.3088573
## 33   0.006010918 0.1419936 -0.30783839 0.3107077
## 100  0.014532219 0.1370704 -0.26147744 0.3371779

Application

Fit (and select) a proportional hazards model to the veteran dataset. What are the most important predictors of survival?

Veterans' Administration Lung Cancer study

Description:

     Randomised trial of two treatment regimens for lung cancer.  This
     is a standard survival analysis data set.

Usage:

     veteran
     
Format:

       trt:       1=standard 2=test                            
       celltype:  1=squamous,  2=smallcell,  3=adeno,  4=large 
       time:      survival time                                
       status:    censoring status                             
       karno:     Karnofsky performance score (100=good)       
       diagtime:  months from diagnosis to randomisation       
       age:       in years                                     
       prior:     prior therapy 0=no, 10=yes      

Overview of brms

Fitting models

brm(formula = z ~ x + y + (1 + y|f), data = xy,
    family = gaussian(link='identity'))
  • formula: almost just like lme4
  • data: must contain all the variables
  • family: distribution of response
  • link: connects mean to linear predictor

Parameterization

There are several classes of parameter in a brms model:

  • b : the population-level (or, fixed) effects
  • sd : the standard deviations of group-level (or, random) effects
  • family-specific parameters, like sigma for the Gaussian

Examples:

  • b_x : the slope of x : class="b", coef="x"
  • sd_f : the SD of effects for levels of f : class="sd", coef="f"

Setting priors

Pass a vector of “priors”, specified by

    set_prior(prior, class="b", ...)

where prior is some valid Stan code.

brm(formula = z ~ x + y + (1 + y|f), data = xy,
    family = gaussian(link='identity'),
    prior=c(set_prior("normal(0, 5)", class="b"),
            set_prior("cauchy(0, 1)", class="sd", coef="f")))

1. Set up the formula

2. Choose priors

Default:

##                  prior     class      coef group resp dpar nlpar bound
## 1                              b                                      
## 2                              b         x                            
## 3                              b         y                            
## 4               lkj(1)       cor                                      
## 5                            cor               f                      
## 6  student_t(3, 1, 10) Intercept                                      
## 7  student_t(3, 0, 10)        sd                                      
## 8                             sd               f                      
## 9                             sd Intercept     f                      
## 10                            sd         y     f                      
## 11 student_t(3, 0, 10)     sigma

3. Fit the model

## Compiling the C++ model
## Start sampling

4. Check converence

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: z ~ x + y + (1 + y | f) 
##    Data: xy (Number of observations: 100) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~f (Number of levels: 3) 
##                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)        0.07      0.19     0.00     0.43 1.00      978     1165
## sd(y)                1.44      0.66     0.59     3.11 1.01      803     1031
## cor(Intercept,y)    -0.12      0.56    -0.95     0.90 1.00     1051     1277
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -0.02      0.10    -0.14     0.08 1.00     1842     1471
## x             1.00      0.01     0.98     1.01 1.00     2405     2051
## y             1.93      0.87     0.05     3.64 1.01      599      455
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.09      0.01     0.08     0.10 1.00     2735     2410
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Or…

launch_shinystan(the_fit)

4. Look at results

Summaries of, or samples from, the posteriors of:

  • fixef( ): “fixed” effects
  • ranef( ): “random” effects
  • fitted( ): mean of responses
  • predict( ): actual responses
  • hypothesis( ): functions of various parameters (e.g., difference between two classes)
  • marginal_effects( ): effect of one predictor conditioned on values of others

More info:

  • formulas: help(brmsformula)
  • families: help(brmsfamily) (but note can use those in help(family) also)
  • priors: help(set_prior) and also check what can have a prior with get_prior( )
  • get the Stan code: stancode(the_fit) (and standata(the_fit))
  • compare models with loo( )
  • more technical notes at this tutorial

Review

Nonparametric:

  • simple
  • does not depend on model fit

Parametric:

  • makes more sense: real hazard rates are not piecewise constant
  • better power, if the model fits
  • better extrapolation (but beware!)
  • can simulate

The Kaplan-Meier curve, by hand:

plot of chunk byhand