The ICH E9(R1) addendum proposed a framework for the formulation of scientific questions and treatment effects in clinical trials. The framework emphasized the importance of formulating precisely the scientific questions of interest in clinical trials in the form of an estimand. An estimand describes the treatment effect of interest via five attributes, which include treatment, population, variable (endpoint), intercurrent event and population-level summary.
Recent FDA guidance (2023) on Adjusting for Covariates in Randomized Clinical Trials for Drugs and Biological Products for Industry discussed various considerations when prospectively specifying covariate-adjusted analysis, including the distinction between unconditional vs. conditional average treatment effects.
This vignette provides
In general, an average treatment effect is the effect of moving a population from untreated (control treatment) to treated (experimental treatment). An average treatment effect can be estimated either with or without covariate adjustment.
Different flavours of average treatment effect exist, depending on the specific population that is moved from untreated to treated. For an overall superpopulation (denoted \(S\)), and for a given sample of patients in the trial (who are assumed to be randomly sampled from the superpopulation), we define the following potential estimands.
In the FDA guidance, and in other recent discussions on covariate adjustment, average treatment effects are often classified as either unconditional or conditional. (Sometimes marginal is used interchangeably with unconditional). In this context, what is usually meant by “conditional” is \(\mathrm{CATE}^S(x)\). That is, a subgroup specific average treatment effect. Often, an assumption is made that subgroup specific average treatment effects are identical across subgroups. If this assumption is wrong, then interpretation can be challenging. What is usually meant by “unconditional” (or “marginal”) is an average of subgroup-specific average treatment effects. This second averaging step gives corresponding estimation procedures a level of robustness to model misspecification. Hence the following distinction in the FDA guidance:
Under the coarse classification of average treatment effects as either “unconditional / marginal” or “conditional”, the \(\mathrm{PATE}^S\), \(\mathrm{CPATE}^S\) and \(\mathrm{SATE}\) can be viewed as “unconditional / marginal”, while the \(\mathrm{CATE}^S(x)\) can be viewed as “conditional”.
For a binary endpoint, an average treatment effect, whether it be a \(\mathrm{PATE}^S\), \(\mathrm{CPATE}^S\) or \(\mathrm{SATE}\), can be estimated from an adjusted logistic regression using the g-computation method. That is, by estimating the average counterfactual outcomes on each arm based on predictions from the working model. Based on these average counterfactual outcomes, different summary measure such as risk difference, relative risk or odds ratio can be evaluated.
Regardless of whether the estimand is \(\mathrm{PATE}^S\), \(\mathrm{CPATE}^S\) or \(\mathrm{SATE}\), the same point estimator can be used.
To avoid ambiguity regarding how the variance of the g-computation estimator should be calculated, it is necessary to be precise about whether one is estimating the \(\mathrm{PATE}^S\), \(\mathrm{CPATE}^S\) or \(\mathrm{SATE}\).
{beeca} implements variance estimation for \(\mathrm{PATE}^S\) based on the methods of Ye et al. (2023), and \(\mathrm{CPATE}^S\) based on Ge et al. (2011).
See Magirr et al. (2024) for discussion of the estimands \(\mathrm{PATE}^S\) and \(\mathrm{CPATE}^S\), and the corresponding methodology for variance estimation.
In the following example, we will use an example CDISC clinical trial
dataset in ADaM format (?trial02_cdisc
) with three
treatment arms. Below we show how to describe and perform an adjusted
analysis targeting marginal risk difference comparing the two active
arms against placebo.
A logistic regression model will be used to estimate the average treatment effect for Xanomeline High Dose vs Placebo and Xanomeline Low Dose vs Placebo. The model will adjust for baseline values of patient sex, race and age. Difference in marginal response proportions with p-value and corresponding 95% confidence interval will be estimated from the logistic regression model using the methodology described in Ge et al. 2011 with sandwich variance estimator (Liu et al. 2023). The sandwich estimator provides robustness to model-misspecification.
## Prepare the dataset for input
dat <- trial02_cdisc %>%
## Treatment variable must be coded as a factor
dplyr::mutate(TRTP = factor(TRTP))
## Fit the logistic regression model adjusting for SEX, RACE and AGE
fit <- glm(AVAL ~ TRTP + SEX + RACE + AGE, family = "binomial", data = dat)
## Calculate the marginal treatment effect estimate and associated variance for a difference contrast
## using the Ge et al. method with robust HC0 sandwich variance
marginal_fit <- get_marginal_effect(fit,
trt = "TRTP",
method = "Ge",
type = "HC0",
contrast = "diff",
reference = "Placebo"
)
## Warning in meatHC(x, type = type, omega = omega): HC0 covariances become (close
## to) singular if hat values are (close to) 1 as for observation(s) 24
The results are summarised in ARD format (Analysis Results Dataset) for ease of subsequent reporting.
## # A tibble: 19 × 8
## TRTVAR TRTVAL PARAM ANALTYP1 STAT STATVAL ANALMETH ANALDESC
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 TRTP Placebo AVAL DESCRIP… N 86 count Compute…
## 2 TRTP Placebo AVAL DESCRIP… n 29 count Compute…
## 3 TRTP Placebo AVAL DESCRIP… % 33.7 percent… Compute…
## 4 TRTP Placebo AVAL INFEREN… risk 0.343 g-compu… Compute…
## 5 TRTP Placebo AVAL INFEREN… risk… 0.0518 Ge - HC0 Compute…
## 6 TRTP Xanomeline High Dose AVAL DESCRIP… N 84 count Compute…
## 7 TRTP Xanomeline High Dose AVAL DESCRIP… n 61 count Compute…
## 8 TRTP Xanomeline High Dose AVAL DESCRIP… % 72.6 percent… Compute…
## 9 TRTP Xanomeline High Dose AVAL INFEREN… risk 0.716 g-compu… Compute…
## 10 TRTP Xanomeline High Dose AVAL INFEREN… risk… 0.0494 Ge - HC0 Compute…
## 11 TRTP Xanomeline Low Dose AVAL DESCRIP… N 84 count Compute…
## 12 TRTP Xanomeline Low Dose AVAL DESCRIP… n 62 count Compute…
## 13 TRTP Xanomeline Low Dose AVAL DESCRIP… % 73.8 percent… Compute…
## 14 TRTP Xanomeline Low Dose AVAL INFEREN… risk 0.743 g-compu… Compute…
## 15 TRTP Xanomeline Low Dose AVAL INFEREN… risk… 0.0460 Ge - HC0 Compute…
## 16 TRTP diff: Xanomeline High … AVAL INFEREN… diff 0.374 g-compu… Compute…
## 17 TRTP diff: Xanomeline High … AVAL INFEREN… diff… 0.0717 Ge - HC0 Compute…
## 18 TRTP diff: Xanomeline Low D… AVAL INFEREN… diff 0.400 g-compu… Compute…
## 19 TRTP diff: Xanomeline Low D… AVAL INFEREN… diff… 0.0694 Ge - HC0 Compute…
The results can then be passed on to the prespecified testing strategy as required. Below we provide a simple example where normal approximation is applied:
## Extract results
marginal_results <- marginal_fit$marginal_results
diff_est <- marginal_results[marginal_results$STAT == "diff", "STATVAL"][[1]]
diff_se <- marginal_results[marginal_results$STAT == "diff_se", "STATVAL"][[1]]
## 95% confidence interval
ci_l <- diff_est - (qnorm(0.975) * diff_se)
ci_u <- diff_est + (qnorm(0.975) * diff_se)
## Two-sided p-value
z_score <- diff_est / diff_se
p_value <- 2 * (1 - pnorm(abs(z_score)))
tibble(
Contrast = marginal_results[marginal_results$STAT == "diff", ]$TRTVAL,
`Risk difference` = round(diff_est, 2),
`95% CI` = sprintf("(%s - %s)", round(ci_l, 2), round(ci_u, 2)),
`p-value` = formatC(p_value, format = "e", digits = 2)
)
## # A tibble: 2 × 4
## Contrast `Risk difference` `95% CI` `p-value`
## <chr> <dbl> <chr> <chr>
## 1 diff: Xanomeline High Dose-Placebo 0.37 (0.23 - 0.51) 1.86e-07
## 2 diff: Xanomeline Low Dose-Placebo 0.4 (0.26 - 0.54) 8.02e-09
To illustrate the usage of {beeca} and for purposes of results validation, we compare {beeca} with other available implementations of Ge et al. (2011) and Ye et al. (2023) methods in R and SAS. We demonstrate equivalence of results while highlighting the simple user interface of {beeca}. Our lightweight package has been developed with a focus on facilitating quick industry adoption including compliance with GxP validation requirements with minimal dependencies.
Throughout the comparisons, we use the dataset trial01
included in the {beeca} package and focus on the risk difference
contrast.
# pre-process trial01 dataset to convert treatment arm to a factor and handle missing value
data01 <- trial01 |>
transform(trtp = as.factor(trtp)) |>
dplyr::filter(!is.na(aval))
fit <- glm(aval ~ trtp + bl_cov, family = "binomial", data = data01)
beeca_ge <- get_marginal_effect(object = fit, trt = "trtp", method = "Ge",
contrast = "diff", reference = "0", type = "model-based")
cat("Point estimate", beeca_ge$marginal_est, "\nStandard error estimate", beeca_ge$marginal_se)
## Point estimate -0.06836399
## Standard error estimate 0.06071641
Alternative version 1: code provided in Ge et al. (2011) paper. Note the corresponding SAS code is also available in the paper.
ge_var_paper <- function(glmfit, trt) {
pder <- function(ahat, vc, x) {
#### ahat: logistic regression parameters
#### vc: variance-covariance matrix of ahat
#### x: full model matrix of the logistic regression
#### return mean of phat on x and its se
phat <- plogis(x %*% ahat)
pbar <- mean(phat)
pderiv <- t(phat * (1 - phat)) %*% x / nrow(x)
sepbar <- sqrt(pderiv %*% vc %*% t(pderiv))
return(list(pbar = pbar, sepbar = sepbar, pderiv = pderiv))
}
difP <- function(glmfit) {
#### estimate the proportion difference and its standard error
df <- glmfit$model
vc <- vcov(glmfit)
df[, trt] <- 1
mat <- model.matrix(glmfit$formula, data = df)
pderT <- pder(coef(glmfit), vc, mat)
df[, trt] <- 0
mat <- model.matrix(glmfit$formula, data = df)
pderC <- pder(coef(glmfit), vc, mat)
difb <- pderT$pbar - pderC$pbar
sedif <- sqrt((pderT$pderiv - pderC$pderiv) %*% vc %*% t(pderT$pderiv - pderC$pderiv))
return(list(
pT = pderT$pbar,
pC = pderC$pbar,
dif = difb,
sedif = sedif,
var = sedif**2
))
}
return(list(est = difP(glmfit)$dif, se = difP(glmfit)$sedif[[1]]))
}
paper_ge <- ge_var_paper(fit, "trtp")
cat("Point estimate", paper_ge$est, "\nStandard error estimate", paper_ge$se)
## Point estimate -0.06836399
## Standard error estimate 0.06071641
Version 2: using {margins} package
if (requireNamespace("margins", quietly = T)) {
margins_ge <- margins::margins(model = fit, variables = "trtp", vcov = vcov(fit))
cat("Point estimate", summary(margins_ge)$AME, "\nStandard error estimate", summary(margins_ge)$SE)
}
## Point estimate -0.06836399
## Standard error estimate 0.06071641
Version 3: using {marginaleffects} package
if (requireNamespace("marginaleffects", quietly = T)) {
marginaleffects_ge <- marginaleffects::avg_comparisons(fit, variables = "trtp")
cat("Point estimate", marginaleffects_ge$estimate, "\nStandard error estimate", marginaleffects_ge$std.error)
}
Version 4: using SAS %Margins macro
%Margins(data = myWork.trial01,
class = trtp,
classgref = first, /*Set reference to first level*/
response = avaln,
roptions = event='1',
dist = binomial,
model = trtp bl_cov,
margins = trtp,
options = cl diff reverse,
link = logit);
** Store output data sets ;
data myWork.margins_trt_estimates;
set work._MARGINS;
run;
data myWork.margins_trt_diffs;
set work._DIFFSPM;
run;
cat("Point estimate", margins_trial01$Estimate, "\nStandard error estimate", margins_trial01$StdErr)
## Point estimate -0.06836399
## Standard error estimate 0.06071641
beeca_ye <- get_marginal_effect(object = fit, trt = "trtp", method = "Ye",
contrast = "diff", reference = "0")
cat("Point estimate", beeca_ye$marginal_est, "\nStandard error estimate", beeca_ye$marginal_se)
## Point estimate -0.06836399
## Standard error estimate 0.0608836
Version 1: from RobinCar
package
if (requireNamespace("RobinCar", quietly = T)) {
robincar_ye <- RobinCar::robincar_glm(data.frame(fit$data), response_col = as.character(fit$formula[2]),
treat_col = "trtp", formula = fit$formula, g_family = fit$family,
contrast_h = "diff")$contrast$result
cat("Point estimate", robincar_ye$estimate, "\nStandard error estimate", robincar_ye$se)
}
## Point estimate -0.06836399
## Standard error estimate 0.0608836
Arel-Bundock V (2023). marginaleffects: Predictions, Comparisons, Slopes, Marginal Means, and Hypothesis Tests. R package version 0.14.0, https://marginaleffects.com/.
Bannick M, Ye T, Yi Y, Bian F (2024). RobinCar: Robust Estimation and Inference in Covariate-Adaptive Randomization. R package version 0.2.0, https://CRAN.R-project.org/package=RobinCar.
European Medicines Agency. 2020. “ICH E9 (R1) addendum on estimands and sensitivity analysis in clinical trials to the guideline on statistical principles for clinical trials.” https://www.ema.europa.eu/en/documents/scientific-guideline/ich-e9-r1-addendum-estimands-and-sensitivity-analysis-clinical-trials-guideline-statistical-principles-clinical-trials-step-5_en.pdf
FDA. 2023. “Adjusting for Covariates in Randomized Clinical Trials for Drugs and Biological Products. Final Guidance for Industry.” https://www.fda.gov/regulatory-information/search-fda-guidance-documents/adjusting-covariates-randomized-clinical-trials-drugs-and-biological-products
Ge, Miaomiao, L Kathryn Durham, R Daniel Meyer, Wangang Xie, and Neal Thomas. 2011. “Covariate-Adjusted Difference in Proportions from Clinical Trials Using Logistic Regression and Weighted Risk Differences.” Drug Information Journal: DIJ/Drug Information Association 45: 481–93.
Magirr, Dominic, Mark Baillie, Craig Wang, and Alexander Przybylski. 2024. “Estimating the Variance of Covariate-Adjusted Estimators of Average Treatment Effects in Clinical Trials with Binary Endpoints.” OSF. May 16. osf.io/9mp58.
SAS Institute Inc. “Predictive margins and average marginal effects.” https://support.sas.com/kb/63/038.html (Last Published: 13 Dec 2023)
Thomas J. Leeper (2021). margins: Marginal Effects for Model Objects. R package version 0.3.26.
Ye, Ting, Marlena Bannick, Yanyao Yi, and Jun Shao. 2023. Robust Variance Estimation for Covariate-Adjusted Unconditional Treatment Effect in Randomized Clinical Trials with Binary Outcomes.” Statistical Theory and Related Fields 7 (2): 159–63.