Introduction to the landmaRk package

Overview

The landmaRk package provides a framework for landmarking analysis of time-to-event and longitudinal data. It allows users to perform dynamic risk prediction for time-to-event outcomes whilst taking into account longitudinal measurements (e.g.Β biomarkers measured over time). Landmarking consists of a two-step framework. First, fitting models to the time-varying covariates, and then using these predictions in survival models.

Given a time-to-event outcome \(T_i\), a landmark time \(s\) and a time horizon \(s + w\), the goal of a landmarking analysis is to estimate \[ \pi_i(s+w \mid s) = P(T_i > s+w \mid T_i \ge s, 𝒀_i(s)), \] where \(𝒀_i(s)\) denotes a vector of covariates which may include time-varying covariates.

A landmarking analysis of time-to-event data has two components:

The landmaRk package allows users to use the following method for the first component:

For the second component, at present the landmaRk package supports Cox proportional hazard models as implemented in the survivalpackage.

Additionally, the landmaRk package provides a modular system allowing making it possible to incorporate additional models both for the longitudinal and the survival components

Diagram of the landmaRk package pipeline

Diagram of the landmaRk package pipeline

Setup

In addition to the landmaRk package, we will also use tidyverse.

set.seed(123)
library(landmaRk)
library(lcmm)
#> Registered S3 method overwritten by 'lcmm':
#>   method      from  
#>   plot.cuminc cmprsk
library(tidyverse)
library(prodlim)

Example: aids data

In this vignette, we use the dataset aids to perform landmarking analysis of time-to-event data with time-varying covariates. Here is the structure of the dataset.

library(JMbayes2)
data("aids")
aids$patient <- as.numeric(aids$patient)
str(aids)
#> 'data.frame':    1405 obs. of  12 variables:
#>  $ patient: num  1 1 1 2 2 2 2 3 3 3 ...
#>  $ Time   : num  17 17 17 19 19 ...
#>  $ death  : int  0 0 0 0 0 0 0 1 1 1 ...
#>  $ CD4    : num  10.68 8.43 9.43 6.32 8.12 ...
#>  $ obstime: int  0 6 12 0 6 12 18 0 2 6 ...
#>  $ drug   : Factor w/ 2 levels "ddC","ddI": 1 1 1 2 2 2 2 2 2 2 ...
#>  $ gender : Factor w/ 2 levels "female","male": 2 2 2 2 2 2 2 1 1 1 ...
#>  $ prevOI : Factor w/ 2 levels "noAIDS","AIDS": 2 2 2 1 1 1 1 2 2 2 ...
#>  $ AZT    : Factor w/ 2 levels "intolerance",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ start  : int  0 6 12 0 6 12 18 0 2 6 ...
#>  $ stop   : num  6 12 17 6 12 ...
#>  $ event  : num  0 0 0 0 0 0 0 0 0 1 ...

The dataset contains the following variables:

Initialising the landmarking analysis

First, we split the dataset into two, one containing static covariates, event time and indicator of event/censoring, and another one containing dynamic covariates. To that end, we use the function split_wide_df. That function returns a named list with the following elements:

The above split reduces data storage requirements, particularly for large datasets with a large number of individuals or longitudinal measurements. This is because static covariate values are stored only once per individual, rather than repeatedly for each longitudinal measurement.

# DF with Static covariates
aids_dfs <- split_wide_df(
  aids,
  ids = "patient",
  times = "obstime",
  static = c("Time", "death", "drug", "gender", "prevOI"),
  dynamic = c("CD4"),
  measurement_name = "value"
)
static <- aids_dfs$df_static
head(static)
#>    patient  Time death drug gender prevOI
#> 1        1 16.97     0  ddC   male   AIDS
#> 4        2 19.00     0  ddI   male noAIDS
#> 8        3 18.53     1  ddI female   AIDS
#> 11       4 12.70     0  ddC   male   AIDS
#> 15       5 15.13     0  ddI   male   AIDS
#> 19       6  1.90     1  ddC female   AIDS
# DF with Dynamic covariates
dynamic <- aids_dfs$df_dynamic
head(dynamic[["CD4"]])
#>   patient obstime     value
#> 1       1       0 10.677078
#> 2       1       6  8.426150
#> 3       1      12  9.433981
#> 4       2       0  6.324555
#> 5       2       6  8.124038
#> 6       2      12  4.582576

We can now create an object of class LandmarkAnalysis, using the helper function of the same name.

landmarking_object <- LandmarkAnalysis(
  data_static = static,
  data_dynamic = dynamic,
  event_indicator = "death",
  ids = "patient",
  event_time = "Time",
  times = "obstime",
  measurements = "value"
)

Arguments to the helper function are the following:

Baseline survival analysis (without time-varying covariates)

First, we perform a survival without time-varying covariates. We can use this as a baseline to evaluate the performance of a subsequent landmark analysis with such covariates. First step is to establish the landmark times, and to work out the risk sets at each of those landmark times.

landmarking_object <- landmarking_object |>
  compute_risk_sets(landmarks = c(6, 8))

landmarking_object
#> Summary of LandmarkAnalysis Object:
#>   Landmarks: 6 8 
#>   Number of observations: 467 
#>   Event indicator: death 
#>   Event time: Time 
#>   Risk sets: 
#>     Landmark 6: 404 subjects
#>     Landmark 8: 379 subjects

Now we use the function fit_survival to fit the survival model. We specify the following arguments:

Then, we can make predictions using the function predict_survival. Arguments method, landmarks and horizonsare as above.

landmarking_object <- landmarking_object |>
  fit_survival(
    formula = Surv(event_time, event_status) ~ drug,
    landmarks = c(6, 8),
    horizons = 12 + c(6, 8),
    method = "coxph",
    dynamic_covariates = c()
  ) |>
  predict_survival(
    landmarks = c(6, 8),
    horizons = 12 + c(6, 8)
  )

To display the results, one can use the method summary, specifying type = "survival", in addition to a landmark and a horizon.

summary(landmarking_object, type = "survival", landmark = 6, horizon = 18)
#> Call:
#> survival::coxph(formula = formula, data = x@survival_datasets[[paste0(landmarks, 
#>     "-", horizons)]], model = TRUE, x = TRUE)
#> 
#>           coef exp(coef) se(coef)    z      p
#> drugddI 0.3123    1.3665   0.1785 1.75 0.0802
#> 
#> Likelihood ratio test=3.08  on 1 df, p=0.07918
#> n= 404, number of events= 127

Now the performance_metrics function can be used to calculate (for now, in-sample) performance metrics.

performance_metrics(
  landmarking_object,
  landmarks = c(6, 8),
  horizons = c(18, 20),
  auc_t = TRUE, c_index = FALSE,
  h_times = c(3, 6, 12)
)
#>      landmark horizon   Brier(9) Brier(12) Brier(18)    AUC(3)    AUC(6)
#> 6-18        6      18 0.07900671 0.1651251 0.2342406 0.5437755 0.5539229
#> 8-20        8      20 0.10044734 0.1701394 0.2442496 0.5357143 0.5626498
#>        AUC(12)
#> 6-18 0.5022860
#> 8-20 0.4582525

Landmarking analysis with lme4 + coxph

Now we use the package lme4 to fit a linear mixed model of the time-varying covariate, CD4. This first step is followed by fitting a Cox PH sub-model using the longitudinal predictions as covariates.

landmarking_object <- LandmarkAnalysis(
  data_static = static,
  data_dynamic = dynamic,
  event_indicator = "death",
  ids = "patient",
  event_time = "Time",
  times = "obstime",
  measurements = "value"
)
landmarking_object <- landmarking_object |>
  compute_risk_sets(landmarks = c(6, 8))

Provided with the risk sets, now the pipeline has the following four steps:

landmarking_object <- landmarking_object |>
  fit_longitudinal(
    landmarks = c(6, 8),
    method = "lme4",
    formula = value ~ prevOI + obstime + (obstime | patient),
    dynamic_covariates = c("CD4")
  ) |>
  predict_longitudinal(
    landmarks = c(6, 8),
    method = "lme4",
    dynamic_covariates = c("CD4")
  ) |>
  fit_survival(
    formula = Surv(event_time, event_status) ~ drug,
    landmarks = c(6, 8),
    horizons = 12 + c(6, 8),
    method = "coxph",
    dynamic_covariates = c("CD4")
  ) |>
  predict_survival(
    landmarks = c(6, 8),
    horizons = 12 + c(6, 8)
  )

As before, one can also use the function summary to display the results.

summary(landmarking_object,
        type = "longitudinal",
        landmark = 6,
        dynamic_covariate = "CD4")
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: value ~ prevOI + obstime + (obstime | patient)
#>    Data: dataframe
#> REML criterion at convergence: 5308.102
#> Random effects:
#>  Groups   Name        Std.Dev. Corr
#>  patient  (Intercept) 3.9911       
#>           obstime     0.2094   0.00
#>  Residual             1.6903       
#> Number of obs: 1060, groups:  patient, 404
#> Fixed Effects:
#> (Intercept)   prevOIAIDS      obstime  
#>     10.4443      -4.5307      -0.1785  
#> optimizer (nloptwrap) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings
summary(landmarking_object, type = "survival", landmark = 6, horizon = 18)
#> Call:
#> survival::coxph(formula = formula, data = x@survival_datasets[[paste0(landmarks, 
#>     "-", horizons)]], model = TRUE, x = TRUE)
#> 
#>           coef exp(coef) se(coef)    z      p
#> drugddI 0.3123    1.3665   0.1785 1.75 0.0802
#> 
#> Likelihood ratio test=3.08  on 1 df, p=0.07918
#> n= 404, number of events= 127

Here are the performance metrics:

performance_metrics(
  landmarking_object,
  landmarks = c(6, 8),
  horizons = c(18, 20),
  auc_t = TRUE, c_index = FALSE,
  h_times = c(3, 6, 12)
)
#>      landmark horizon   Brier(9) Brier(12) Brier(18)    AUC(3)    AUC(6)
#> 6-18        6      18 0.07900671 0.1651251 0.2342406 0.5437755 0.5539229
#> 8-20        8      20 0.10044734 0.1701394 0.2442496 0.5357143 0.5626498
#>        AUC(12)
#> 6-18 0.5022860
#> 8-20 0.4582525

Landmarking analysis with lcmm + coxph

Now we use the lcmm package to fit a latent class mixed model of the time-varying covariate, CD4. This first step is followed by fitting a Cox PH sub-model using the longitudinal predictions as covariates.

The pipeline is identical to that of the lme4+coxph model. However, this time one has to specify method = "lcmm" when calling fit_longitudinal, in addition to the arguments that will be passed on to lcmm::hlme(), namely

landmarking_object <- LandmarkAnalysis(
  data_static = static,
  data_dynamic = dynamic,
  event_indicator = "death",
  ids = "patient",
  event_time = "Time",
  times = "obstime",
  measurements = "value"
)
landmarking_object <- landmarking_object |>
  compute_risk_sets(landmarks = c(6, 8)) |>
  fit_longitudinal(
    landmarks = c(6, 8),
    method = "lcmm",
    formula = value ~ obstime + prevOI,
    mixture = ~ obstime + prevOI,
    random = ~ obstime,
    subject = "patient",
    ng = 2,
    dynamic_covariates = c("CD4"),
    maxiter = 5000, rep = 25, nwg = TRUE
  ) |>
  predict_longitudinal(
    landmarks = c(6, 8),
    method = "lcmm",
    avg = TRUE,
    include_clusters = FALSE,
    var.time = "obstime",
    dynamic_covariates = c("CD4")
  ) |>
  fit_survival(
    formula = Surv(event_time, event_status) ~ drug,
    landmarks = c(6, 8),
    horizons = 12 + c(6, 8),
    method = "coxph",
    dynamic_covariates = c("CD4"),
    include_clusters = FALSE
  ) |>
  predict_survival(
    landmarks = c(6, 8),
    horizons = 12 + c(6, 8),
    dynamic_covariates = c("CD4"),
    include_clusters = FALSE
  )
summary(landmarking_object,
        type = "longitudinal",
        landmark = 6,
        dynamic_covariate = "CD4")
#> Heterogenous linear mixed model 
#>      fitted by maximum likelihood method 
#>  
#> hlme(fixed = value ~ obstime + prevOI, mixture = ~obstime + prevOI, 
#>     random = ~obstime, subject = "patient", classmb = ~1, ng = 2, 
#>     nwg = TRUE, maxiter = 24000, returndata = TRUE)
#>  
#> Statistical Model: 
#>      Dataset: NULL 
#>      Number of subjects: 404 
#>      Number of observations: 1060 
#>      Number of latent classes: 2 
#>      Number of parameters: 12  
#>  
#> Iteration process: 
#>      Convergence criteria satisfied 
#>      Number of iterations:  1 
#>      Convergence criteria: parameters= 7.5e-11 
#>                          : likelihood= 3.2e-10 
#>                          : second derivatives= 5.2e-11 
#>  
#> Goodness-of-fit statistics: 
#>      maximum log-likelihood: -2578.12  
#>      AIC: 5180.24  
#>      BIC: 5228.26  
#>  
#>  
#> Maximum Likelihood Estimates: 
#>  
#> Fixed effects in the class-membership model:
#> (the class of reference is the last class) 
#> 
#>                      coef      Se   Wald p-value
#> intercept class1  0.02129 0.17518  0.122 0.90325
#> 
#> Fixed effects in the longitudinal model:
#> 
#>                       coef      Se   Wald p-value
#> intercept class1   5.38287 0.31902 16.873 0.00000
#> intercept class2  13.51658 0.49321 27.406 0.00000
#> obstime class1    -0.16531 0.03393 -4.873 0.00000
#> obstime class2    -0.19030 0.04637 -4.104 0.00004
#> prevOIAIDS class1 -1.44437 0.31863 -4.533 0.00001
#> prevOIAIDS class2 -4.82749 0.68208 -7.078 0.00000
#> 
#> 
#> Variance-covariance matrix of the random-effects:
#>           intercept obstime
#> intercept  13.60172        
#> obstime    -0.24981 0.16909
#> 
#>                                     coef      Se
#> Proportional coefficient class1  0.33917 0.04171
#> Residual standard error:         1.54912 0.05467
summary(landmarking_object, type = "survival", landmark = 6, horizon = 18)
#> Call:
#> survival::coxph(formula = formula, data = x@survival_datasets[[paste0(landmarks, 
#>     "-", horizons)]], model = TRUE, x = TRUE)
#> 
#>           coef exp(coef) se(coef)    z      p
#> drugddI 0.3123    1.3665   0.1785 1.75 0.0802
#> 
#> Likelihood ratio test=3.08  on 1 df, p=0.07918
#> n= 404, number of events= 127
performance_metrics(
  landmarking_object,
  landmarks = c(6, 8),
  horizons = c(18, 20),
  auc_t = TRUE, c_index = FALSE,
  h_times = c(3, 6, 12)
)
#>      landmark horizon   Brier(9) Brier(12) Brier(18)    AUC(3)    AUC(6)
#> 6-18        6      18 0.07900671 0.1651251 0.2342406 0.5437755 0.5539229
#> 8-20        8      20 0.10044734 0.1701394 0.2442496 0.5357143 0.5626498
#>        AUC(12)
#> 6-18 0.5022860
#> 8-20 0.4582525