| Title: | Regression with Interval-Censored Covariates |
| Version: | 0.2.0 |
| Description: | Provides functions to simulate and analyze data for a regression model with an interval censored covariate, as described in Morrison et al. (2021) <doi:10.1111/biom.13472>. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| Imports: | biglm, dplyr (≥ 1.1.0), lubridate, stats, lobstr, arm, ggplot2, scales, plotly, rlang, ggrepel |
| Suggests: | spelling, rmarkdown, knitr, testthat (≥ 3.0.0), markdown, pander, vdiffr, withr, badger |
| Language: | en-US |
| URL: | https://d-morrison.github.io/rwicc/, https://github.com/d-morrison/rwicc |
| BugReports: | https://github.com/d-morrison/rwicc/issues |
| Depends: | R (≥ 4.1.0) |
| Config/roxygen2/version: | 8.0.0 |
| NeedsCompilation: | no |
| Packaged: | 2026-05-27 06:40:28 UTC; ezramorrison |
| Author: | Douglas Ezra Morrison
|
| Maintainer: | Douglas Ezra Morrison <demorrison@ucdavis.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-05-27 07:10:02 UTC |
rwicc: Regression with Interval-Censored Covariates
Description
Provides functions to simulate and analyze data for a regression model with an interval censored covariate, as described in Morrison et al. (2021) doi:10.1111/biom.13472.
Author(s)
Maintainer: Douglas Ezra Morrison demorrison@ucdavis.edu (ORCID) [copyright holder]
Authors:
Douglas Ezra Morrison demorrison@ucdavis.edu (ORCID) [copyright holder]
Ron Brookmeyer
See Also
Useful links:
Report bugs at https://github.com/d-morrison/rwicc/issues
Build table of event date possibilities
Description
Build table of event date possibilities
Usage
build_event_date_possibilities_table(
participant_level_data,
bin_width = 1,
omega_hat = build_omega_table(participant_level_data, bin_width = bin_width)
)
Arguments
participant_level_data |
a data.frame with columns:
|
bin_width |
number of days per bin |
omega_hat |
a data.frame from |
Value
a data.frame with columns:
ID: participant identifier
Stratum: indicator for which population stratum the participant belongs to
S: possible seroconversion dates
Examples
library(dplyr)
simulate_interval_censoring()$pt_data |>
mutate(Stratum = 1) |>
build_event_date_possibilities_table()
Build table of dates with possible seroconversions in dataset
Description
Build table of dates with possible seroconversions in dataset
Usage
build_omega_table(participant_level_data, bin_width = 1)
Arguments
participant_level_data |
a data.frame with columns:
|
bin_width |
number of days per bin |
Value
convert a pair of simple logistic regression coefficients into P(Y|T) curve:
Description
convert a pair of simple logistic regression coefficients into P(Y|T) curve:
Usage
build_phi_function_from_coefs(coefs)
Arguments
coefs |
numeric vector of coefficients |
Value
function(t) P(Y=1|T=t)
Validate participant-level data
Description
Internal helper function that performs basic validation checks on participant-level data to ensure data integrity before model fitting.
Usage
check_pt_data(participant_level_data)
Arguments
participant_level_data |
A data.frame or tibble containing participant-level data with at least the following columns:
|
Value
NULL (invisibly). The function throws an error if validation fails.
compute mean window period duration from simple logistic regression coefficients
Description
compute mean window period duration from simple logistic regression coefficients
Usage
compute_mu(theta)
Arguments
theta |
numeric vector of coefficients |
Value
numeric scalar: mean window period duration
Filter dataset by participant IDs
Description
Internal helper function that filters all data frames within a dataset structure to include only the specified participant IDs.
Usage
filter_data_by_ID(dataset, included_IDs)
Arguments
dataset |
A list containing data frames output from
|
included_IDs |
A character vector of participant IDs to retain in the filtered dataset. |
Value
A list with the same structure as dataset, but with all
data frames filtered to include only rows where the ID is in
included_IDs.
Fit a logistic regression model with an interval-censored covariate
Description
This function fits a logistic regression model for a binary outcome Y with an interval-censored covariate T, using an EM algorithm, as described in Morrison et al (2021); doi:10.1111/biom.13472.
Usage
fit_joint_model(
participant_level_data,
obs_level_data,
model_formula = stats::formula(Y ~ T),
mu_function = compute_mu,
bin_width = 1,
denom_offset = 0.1,
EM_toler_loglik = 0.1,
EM_toler_est = 1e-04,
EM_max_iterations = Inf,
glm_tolerance = 1e-07,
glm_maxit = 20,
initial_S_estimate_location = 0.25,
coef_change_metric = "max abs rel diff coefs",
verbose = FALSE
)
Arguments
participant_level_data |
a data.frame or tibble with the following variables:
|
obs_level_data |
a data.frame or tibble with the following variables:
|
model_formula |
the functional form for the regression model for p(y|t) (as a formula() object) |
mu_function |
a function taking a vector of regression coefficient estimates as input and outputting an estimate of mu (mean duration of MAA-positive infection). |
bin_width |
the number of days between possible seroconversion dates (should be an integer) |
denom_offset |
an offset value added to the denominator of the hazard estimates to improve numerical stability |
EM_toler_loglik |
the convergence cutoff for the log-likelihood criterion ("Delta_L" in the paper) |
EM_toler_est |
the convergence cutoff for the parameter estimate criterion ("Delta_theta" in the paper) |
EM_max_iterations |
the number of EM iterations to perform before giving up if still not converged. |
glm_tolerance |
the convergence cutoff for the glm fit in the M step |
glm_maxit |
the iterations cutoff for the glm fit in the M step |
initial_S_estimate_location |
determines how seroconversion date is guessed to initialize the algorithm; can be any decimal between 0 and 1; 0.5 = midpoint imputation, 0.25 = 1st quartile, 0 = last negative, etc. |
coef_change_metric |
a string indicating the type of parameter estimate criterion to use:
|
verbose |
whether to print algorithm progress details to the console |
Value
a list with the following elements:
-
Theta: the estimated regression coefficients for the model of p(Y|T) -
Mu: the estimated mean window period (a transformation ofTheta) -
Omega: a table with the estimated parameters for the model of p(S|E). -
converged: indicator of whether the algorithm reached its cutoff criteria before reaching the specified maximum iterations. 1 = reached cutoffs, 0 = not. -
iterations: the number of EM iterations completed before the algorithm stopped. -
convergence_metrics: the four convergence metrics -
convergence_stats: a table of the log-likelihood at each iteration
References
Morrison, Laeyendecker, and Brookmeyer (2021). "Regression with interval-censored covariates: Application to cross-sectional incidence estimation". Biometrics. doi:10.1111/biom.13472.
Examples
## Not run:
# simulate data:
study_data <- simulate_interval_censoring()
# fit model:
EM_algorithm_outputs <- fit_joint_model(
obs_level_data = study_data$obs_data,
participant_level_data = study_data$pt_data
)
## End(Not run)
Fit model using midpoint imputation
Description
Fit model using midpoint imputation
Usage
fit_midpoint_model(
participant_level_data,
obs_level_data,
maxit = 1000,
tolerance = 1e-08
)
Arguments
participant_level_data |
a data.frame or tibble with the following variables:
|
obs_level_data |
a data.frame or tibble with the following variables:
|
maxit |
maximum iterations, passed to |
tolerance |
convergence criterion, passed to |
Value
a vector of logistic regression coefficient estimates
Examples
sim_data <- simulate_interval_censoring(
"theta" = c(0.986, -3.88),
"study_cohort_size" = 4500,
"preconversion_interval_length" = 365,
"hazard_alpha" = 1,
"hazard_beta" = 0.5
)
theta_est_midpoint <- fit_midpoint_model(
obs_level_data = sim_data$obs_data,
participant_level_data = sim_data$pt_data
)
Fit model using uniform imputation
Description
Fit model using uniform imputation
Usage
fit_uniform_model(
participant_level_data,
obs_level_data,
maxit = 1000,
tolerance = 1e-08,
n_imputations = 10
)
Arguments
participant_level_data |
a data.frame or tibble with the following variables:
|
obs_level_data |
a data.frame or tibble with the following variables:
|
maxit |
maximum iterations, passed to |
tolerance |
convergence criterion, passed to |
n_imputations |
number of imputed data sets to create |
Value
a vector of logistic regression coefficient estimates
Examples
sim_data <- simulate_interval_censoring(
"theta" = c(0.986, -3.88),
"study_cohort_size" = 4500,
"preconversion_interval_length" = 365,
"hazard_alpha" = 1,
"hazard_beta" = 0.5
)
theta_est_midpoint <- fit_uniform_model(
obs_level_data = sim_data$obs_data,
participant_level_data = sim_data$pt_data
)
Plot the estimated seroconversion-date distribution for one participant
Description
Plot the estimated seroconversion-date distribution for one participant
Usage
graph_S(subject_level_data_possibilities, id = 1)
Arguments
subject_level_data_possibilities |
a data.frame of per-subject seroconversion-date possibilities with their estimated probabilities |
id |
the participant ID to plot |
Value
Graph seroconversion hazard model
Description
Graph seroconversion hazard model
Usage
graph_omega(omega)
Arguments
omega |
a data.frame containing parameter values for the seroconversion hazard model |
Value
Examples
example_model <-
system.file("extdata", "example_model.rds", package = "rwicc") |>
readRDS()
omega_est_EM <- example_model$Omega
omega_est_EM |> graph_omega()
Plot simulated seroconversion density curves
Description
Plots example seroconversion-time densities for several intercept/slope hazard combinations.
Usage
graph_simulated_densities()
Value
Plot simulated seroconversion hazard curves
Description
Plots example linear seroconversion hazards for several intercept/slope combinations.
Usage
graph_simulated_hazards()
Value
Plot simulated seroconversion survival curves
Description
Plots example seroconversion-time survival curves for several intercept/slope hazard combinations.
Usage
graph_simulated_survival_curves()
Value
plot estimated and true CDFs for seroconversion date distribution
Description
plot estimated and true CDFs for seroconversion date distribution
Usage
plot_CDF(true_hazard_alpha, true_hazard_beta, omega_hat)
Arguments
true_hazard_alpha |
The data-generating hazard at the start of the study |
true_hazard_beta |
The change in data-generating hazard per calendar year |
omega_hat |
tibble of estimated discrete hazards |
Value
a ggplot
Examples
## Not run:
hazard_alpha <- 1
hazard_beta <- 0.5
study_data <- simulate_interval_censoring(
"hazard_alpha" = hazard_alpha,
"hazard_beta" = hazard_beta
)
# fit model:
EM_algorithm_outputs <- fit_joint_model(
obs_level_data = study_data$obs_data,
participant_level_data = study_data$pt_data
)
plot1 <- plot_CDF(
true_hazard_alpha = hazard_alpha,
true_hazard_beta = hazard_beta,
omega_hat = EM_algorithm_outputs$Omega
)
print(plot1)
## End(Not run)
Plot censoring data
Description
Plot censoring data
Usage
plot_censoring_data(
dataset,
included_IDs = unique(dataset$pt_data$ID),
label_size = 5,
point_size = 5,
s_vjust = 2,
labelled_IDs = included_IDs,
xmin = min(dataset$pt_data$E) - 28,
xmax = max(dataset$obs_data$O)
)
Arguments
dataset |
output from |
included_IDs |
|
label_size |
numeric:
passed to |
point_size |
numeric: passed to ggplot2::geom_point's |
s_vjust |
passed to ggrepel::geom_text_repel's |
labelled_IDs |
|
xmin |
minimum displayed value for x-axis |
xmax |
maximum displayed value for x-axis |
Value
a ggplot
Plot true and estimated curves for P(Y=1|T=t)
Description
Plot true and estimated curves for P(Y=1|T=t)
Usage
plot_phi_curves(
theta_true,
theta.hat_joint,
theta.hat_midpoint,
theta.hat_uniform
)
Arguments
theta_true |
the coefficients of the data-generating model P(Y=1|T=t) |
theta.hat_joint |
the estimated coefficients from the joint model |
theta.hat_midpoint |
the estimated coefficients from midpoint imputation |
theta.hat_uniform |
the estimated coefficients from uniform imputation |
Value
a ggplot
Examples
## Not run:
theta_true <- c(0.986, -3.88)
hazard_alpha <- 1
hazard_beta <- 0.5
sim_data <- simulate_interval_censoring(
"theta" = theta_true,
"study_cohort_size" = 4500,
"preconversion_interval_length" = 365,
"hazard_alpha" = hazard_alpha,
"hazard_beta" = hazard_beta
)
# extract the participant-level and observation-level simulated data:
sim_participant_data <- sim_data$pt_data
sim_obs_data <- sim_data$obs_data
rm(sim_data)
# joint model:
EM_algorithm_outputs <- fit_joint_model(
obs_level_data = sim_obs_data,
participant_level_data = sim_participant_data,
bin_width = 7,
verbose = FALSE
)
# midpoint imputation:
theta_est_midpoint <- fit_midpoint_model(
obs_level_data = sim_obs_data,
participant_level_data = sim_participant_data
)
# uniform imputation:
theta_est_uniform <- fit_uniform_model(
obs_level_data = sim_obs_data,
participant_level_data = sim_participant_data
)
plot2 <- plot_phi_curves(
theta_true = theta_true,
theta.hat_uniform = theta_est_uniform,
theta.hat_midpoint = theta_est_midpoint,
theta.hat_joint = EM_algorithm_outputs$Theta
)
print(plot2)
## End(Not run)
rwicc: Regression with Interval-Censored Covariates
Description
The rwicc package implements a regression model with an
interval-censored covariate using an EM algorithm, as described in
Morrison et al (2021); doi:10.1111/biom.13472.
rwicc functions
The main rwicc functions are:
Author(s)
Maintainer: Douglas Ezra Morrison demorrison@ucdavis.edu (ORCID) [copyright holder]
Authors:
Douglas Ezra Morrison demorrison@ucdavis.edu (ORCID) [copyright holder]
Ron Brookmeyer
References
Morrison, Laeyendecker, and Brookmeyer (2021). "Regression with interval-censored covariates: Application to cross-sectional incidence estimation". Biometrics. doi:10.1111/biom.13472.
See Also
Useful links:
Report bugs at https://github.com/d-morrison/rwicc/issues
Seroconversion cumulative hazard function
Description
Cumulative hazard obtained by integrating seroconversion_hazard_function().
Usage
seroconversion_cumhaz_function(t, intercept, slope, entry_time = 0)
Arguments
t |
numeric vector of times since study start (years) |
intercept |
hazard at study start |
slope |
change in hazard per year |
entry_time |
time of study entry; the hazard is zero before this |
Value
a numeric vector of cumulative hazard values
Seroconversion density function
Description
Probability density, survival * hazard.
Usage
seroconversion_density_function(...)
Arguments
... |
arguments passed to |
Value
a numeric vector of density values
Seroconversion hazard function
Description
Linear instantaneous hazard of seroconversion, intercept + slope * t,
equal to zero before entry_time.
Usage
seroconversion_hazard_function(t, intercept, slope, entry_time = 0)
Arguments
t |
numeric vector of times since study start (years) |
intercept |
hazard at study start |
slope |
change in hazard per year |
entry_time |
time of study entry; the hazard is zero before this |
Value
a numeric vector of hazard values
Inverse survival function for time-to-event variable with linear hazard function
Description
This function determines the seroconversion date corresponding to a provided probability of survival. See doi:10.1111/biom.13472, Supporting Information, Section A.4.
Usage
seroconversion_inverse_survival_function(u, e, hazard_alpha, hazard_beta)
Arguments
u |
a vector of seroconversion survival probabilities |
e |
a vector of time differences between study start and enrollment (in years) |
hazard_alpha |
the instantaneous hazard of seroconversion on the study start date |
hazard_beta |
the change in hazard per year after study start date |
Value
numeric vector of time differences between study start and seroconversion (in years)
References
Morrison, Laeyendecker, and Brookmeyer (2021). "Regression with interval-censored covariates: Application to cross-sectional incidence estimation". Biometrics, doi:10.1111/biom.13472.
Seroconversion survival function
Description
Survival probability, exp(-cumulative hazard).
Usage
seroconversion_survival_function(...)
Arguments
... |
arguments passed to |
Value
a numeric vector of survival probabilities
Simulate a dataset with interval-censored seroconversion dates
Description
simulate_interval_censoring generates a simulated data set from a
data-generating model based on the typical structure of a cohort study of HIV
biomarker progression, as described in Morrison et al (2021);
doi:10.1111/biom.13472.
Usage
simulate_interval_censoring(
study_cohort_size = 4500,
probability_of_ever_seroconverting = 0.05,
n_at_risk = stats::rbinom(n = 1, size = study_cohort_size, prob =
probability_of_ever_seroconverting),
hazard_alpha = 1,
hazard_beta = 0.5,
preconversion_interval_length = 84,
theta = c(0.986, -3.88),
years_in_study = 10,
max_scheduling_offset = 7,
days_from_study_start_to_recruitment_end = 365,
study_start_date = lubridate::ymd("2001-01-01")
)
Arguments
study_cohort_size |
the number of participants to simulate (N_0 in the paper) |
probability_of_ever_seroconverting |
the probability that each participant is at risk of HIV seroconversion |
n_at_risk |
number of participants who are at risk of infection; by
default, this number is determined stochastically from
|
hazard_alpha |
the hazard (instantaneous risk) of seroconversion at the start date of the cohort study for those participants at risk of seroconversion |
hazard_beta |
the change in hazard per calendar year |
preconversion_interval_length |
the number of days between tests for seroconversion |
theta |
the parameters of a logistic model (with linear functional from) specifying the probability of MAA-positive biomarkers as a function of time since seroconversion |
years_in_study |
the duration of follow-up for each participant |
max_scheduling_offset |
the maximum divergence of pre-seroconversion followup visits from the prescribed schedule |
days_from_study_start_to_recruitment_end |
the length of the recruitment period |
study_start_date |
the date when the study starts recruitment ("d_0" in the main text). The value of this parameter does not affect the simulation results; it is only necessary as a reference point for generating E, L, R, O, and S. |
Value
A list containing the following two tibbles:
-
pt_data: a tibble of participant-level information, with the following columns:-
ID: participant ID -
E: enrollment date -
L: date of last HIV test prior to seroconversion -
R: date of first HIV test after seroconversion
-
-
obs_data: a tibble of longitudinal observations with the following columns:-
ID: participant ID -
O: dates of biomarker sample collection -
Y: MAA classifications of biomarker samples
-
References
Morrison, Laeyendecker, and Brookmeyer (2021). "Regression with interval-censored covariates: Application to cross-sectional incidence estimation". Biometrics. doi:10.1111/biom.13472.
Examples
study_data <- simulate_interval_censoring()
participant_characteristics <- study_data$pt_data
longitudinal_observations <- study_data$obs_data
Create a ggplot with standard theme settings
Description
Internal helper function that initializes a ggplot object with consistent theme and styling settings used across multiple plotting functions in the package.
Usage
standard_ggplot(data)
Arguments
data |
A data frame to be passed to |
Value
A ggplot2::ggplot() object with standard theme settings applied,
including a black and white theme, custom axis styling, and legend
positioning at the bottom.
Update subject-level data with posterior probabilities
Description
Internal helper function that updates subject-level data by computing various conditional probabilities used in the EM algorithm for fitting the joint model. This function combines observation-level predictions with participant-level data and omega estimates to compute posterior probabilities needed for parameter estimation.
Usage
update_possible_subj_data(
obs_data_possibilities,
MAA_model,
participant_level_data,
omega_hat
)
Arguments
obs_data_possibilities |
A data.frame containing all possible observation-level data combinations, including ID, S (possible seroconversion dates), and Y (MAA classifications). |
MAA_model |
A fitted |
participant_level_data |
A data.frame containing participant-level information, including ID, Stratum, and P(S>=l|E=e) (the probability that seroconversion occurs on or after the last negative test date). |
omega_hat |
A data.frame containing estimated parameters for the seroconversion date distribution model, including S, Stratum, P(S=s|S>=s,E=e), and P(S>s|S>=s,E=e). |
Value
A data.frame containing updated subject-level data with the following columns:
-
ID: Participant identifier -
Stratum: Stratification variable -
S: Possible seroconversion date -
P(Y=y|T=t): Probability of observed MAA outcomes given time since seroconversion -
P(S=s|E=e): Marginal probability of seroconversion at time s given enrollment date -
P(S=s|e,l,r,o,y): Posterior probability of seroconversion at time s given all observed data (used to estimate omega and theta) -
P(S>=s|e,l,r,o,y): Posterior probability that seroconversion occurs on or after time s (used to estimate omega)