Department of Biostatistics, Fielding School of Public Health,
University of California, Los Angeles, CA, USA
Department of Computational Biomedicine,
Cedars-Sinai Medical Center, Los Angeles, CA, USA
The TemporalForest package provides a reproducible method for feature selection in high-dimensional longitudinal data. It combines network analysis, mixed-effects models, and stability selection to identify predictors that are robust over time. This vignette is a quick-start guide to the package.
Longitudinal ’omics studies, in which subjects are measured repeatedly
over time, pose specific challenges for feature selection: high
dimensionality, temporal dependence within subjects, and complex
correlation among features. The TemporalForest algorithm addresses
them with a multi-stage pipeline that identifies features that are both
predictive and stable across resamples.
Since the package is not yet on CRAN, you can install the development version from GitHub:
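A minimal installation sketch, assuming the remotes package and the repository path SisiShao/TemporalForest taken from the GitHub URL in the package citation:

```r
# Install 'remotes' if it is missing, then the development version from GitHub
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
# Repository path assumed from the GitHub URL in citation("TemporalForest")
remotes::install_github("SisiShao/TemporalForest")
library(TemporalForest)
```

If devtools is already installed, devtools::install_github() accepts the same repository string.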
This example walks you through a complete analysis with a small, simulated dataset.
This small demo is designed to recover all true signals quickly
(about 1–3 s). We simulate a dataset with 60 subjects, 2 time points,
and 20 candidate predictors. The outcome \(Y\) carries 3 true signals,
from predictors V1, V2, and V3. To keep the example fast and reliable
for CRAN checks, we pass a precomputed dissimilarity matrix, which
skips Stage 1 (WGCNA/TOM).
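In equation form, the simulation code that follows draws every predictor independently from \(N(0, 1)\) and generates a random-intercept outcome, where \(i\) indexes the subject and \(j\) the time point as labelled by the id and time vectors. The dissimilarity passed in place of Stage 1 is one minus the absolute correlation between features, computed over all stacked rows:

\[
Y_{ij} = 4\,X_{ij,\mathrm{V1}} + 3.5\,X_{ij,\mathrm{V2}} + 3.2\,X_{ij,\mathrm{V3}} + u_i + \varepsilon_{ij},
\qquad u_i \sim N(0,\,0.7^2), \quad \varepsilon_{ij} \sim N(0,\,0.08^2),
\]

\[
A_{kl} = 1 - \lvert \operatorname{cor}(X_k, X_l) \rvert \quad (k \neq l), \qquad A_{kk} = 0.
\]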
set.seed(11) # For reproducibility
n_subjects <- 60; n_timepoints <- 2; p <- 20
# Build X (two time points) with matching colnames
X <- replicate(n_timepoints, matrix(rnorm(n_subjects * p), n_subjects, p), simplify = FALSE)
colnames(X[[1]]) <- colnames(X[[2]]) <- paste0("V", 1:p)
# Long view and IDs
X_long <- do.call(rbind, X)
id <- rep(seq_len(n_subjects), each = n_timepoints)
time <- rep(seq_len(n_timepoints), times = n_subjects)
# Strong signal on V1, V2, V3 + modest subject random effect + small noise
u_subj <- rnorm(n_subjects, 0, 0.7)
eps <- rnorm(length(id), 0, 0.08)
Y <- 4*X_long[, "V1"] + 3.5*X_long[, "V2"] + 3.2*X_long[, "V3"] +
  rep(u_subj, each = n_timepoints) + eps
# Lightweight dissimilarity to skip Stage 1 (fast on CRAN)
A <- 1 - abs(stats::cor(X_long)); diag(A) <- 0
dimnames(A) <- list(colnames(X[[1]]), colnames(X[[1]]))
We call the main function, passing our precomputed
dissimilarity_matrix = A and asking for 3 features.
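If you replace A with a dissimilarity computed some other way, it helps to confirm that it has the same structure as A before calling temporal_forest(): square, symmetric, zero on the diagonal, values in [0, 1], and row and column names matching the predictor names. The helper check_dissimilarity() is hypothetical (it is not part of TemporalForest), and whether the package enforces each of these properties is an assumption; see ?temporal_forest for the actual requirements. The toy data are deterministic, so running this block does not change the random-number state used by the demo.

```r
# Hypothetical helper (not a TemporalForest function): structural checks that
# hold for A = 1 - |cor| as built above. Treating them as hard requirements of
# temporal_forest() is an assumption.
check_dissimilarity <- function(D, feature_names) {
  stopifnot(
    is.matrix(D), is.numeric(D),
    nrow(D) == ncol(D),                    # square
    isTRUE(all.equal(D, t(D))),            # symmetric
    all(diag(D) == 0),                     # zero self-dissimilarity
    all(D >= 0 & D <= 1),                  # 1 - |cor| lies in [0, 1]
    identical(rownames(D), feature_names), # names match the predictors
    identical(colnames(D), feature_names)
  )
  invisible(TRUE)
}

# Deterministic toy data with three features (no RNG calls)
Z <- cbind(V1 = c(1, 2, 3, 4, 5, 6),
           V2 = c(2, 1, 4, 3, 6, 5),
           V3 = c(6, 2, 5, 1, 4, 3))
D <- 1 - abs(cor(Z)); diag(D) <- 0
check_dissimilarity(D, colnames(Z))  # passes silently
```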
# Run TemporalForest with minimal settings for vignette
tf_result <- temporal_forest(
  X = X, Y = Y, id = id, time = time,
  dissimilarity_matrix = A,    # skip WGCNA/TOM (Stage 1)
  n_features_to_select = 3,
  n_boot_screen = 4,           # very low, for a quick demo
  n_boot_select = 8,           # very low, for a quick demo
  keep_fraction_screen = 1,    # permissive screening
  min_module_size = 2,
  alpha_screen = 0.5,          # permissive screening
  alpha_select = 0.6
)
#> ..cutHeight not given, setting it to 0.951 ===> 99% of the (truncated) height range in dendro.
#> ..done.
Examine the selected features and check whether the true predictors were found.
print(tf_result)
#> --- Temporal Forest Results ---
#>
#> Top 3 feature(s) selected:
#> V1
#> V3
#> V2
#>
#> 5 feature(s) were candidates in the final stage.
# Validate against ground truth
true_predictors <- c("V1", "V2", "V3")
cat("True predictors found:", sum(true_predictors %in% tf_result$top_features),
"out of", length(true_predictors), "\n")
#> True predictors found: 3 out of 3
The algorithm identified all three true predictors in this high signal-to-noise example.
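Counting hits shows recall only. Once the selected set can include noise features, false positives and precision matter too. The helper selection_metrics() below is hypothetical, and the second call uses an invented selection purely to illustrate one miss and one false positive; in a live session you would pass tf_result$top_features as the first argument.

```r
# Hypothetical helper: compare a selected feature set with the known truth
# (only possible in simulations). Assumes `selected` has no duplicates.
selection_metrics <- function(selected, truth) {
  tp <- sum(selected %in% truth)
  c(true_positives  = tp,
    false_positives = length(selected) - tp,
    recall          = tp / length(truth),
    precision       = if (length(selected) > 0) tp / length(selected) else NA_real_)
}

truth <- c("V1", "V2", "V3")
# The three features printed above: 3 hits, 0 false positives, recall = precision = 1
selection_metrics(c("V1", "V3", "V2"), truth)
# Invented selection: 2 hits, 1 false positive, recall = precision = 2/3
selection_metrics(c("V1", "V2", "V7"), truth)
```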
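TemporalForest operates in three stages:

1. Network construction: a dissimilarity between features (WGCNA/TOM by default) is used to group features into modules of at least min_module_size features. Supplying dissimilarity_matrix, as in the example above, skips the WGCNA/TOM step.
2. Screening: within each module, features are screened with trees fitted on n_boot_screen bootstrap samples (splitting level alpha_screen), and the top keep_fraction_screen of each module is passed on.
3. Selection: the pooled candidates go through a final stability-selection step with n_boot_select bootstrap samples (splitting level alpha_select), and the top n_features_to_select features are returned.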
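The selection stage rests on the idea behind stability selection: a feature chosen consistently across resamples is more trustworthy than one chosen once. The base-R sketch below is a generic illustration under simplifying assumptions, not TemporalForest's implementation. In place of the package's tree-based selector it ranks features by absolute marginal correlation, it resamples whole subjects so that repeated measures stay together, and it reports how often each feature lands in the top k. The function selection_frequency() and all data are hypothetical.

```r
# Generic illustration of bootstrap selection frequencies (the idea behind
# stability selection). NOT the TemporalForest implementation: features are
# ranked by absolute marginal correlation instead of tree models.
set.seed(1)
n_subj <- 40; n_time <- 2; n_feat <- 8
sim_id <- rep(seq_len(n_subj), each = n_time)
Xs <- matrix(rnorm(n_subj * n_time * n_feat), ncol = n_feat,
             dimnames = list(NULL, paste0("V", seq_len(n_feat))))
# Two true signals plus a subject-level random intercept and noise
ys <- 3 * Xs[, "V1"] + 2 * Xs[, "V2"] +
  rep(rnorm(n_subj, sd = 0.5), each = n_time) + rnorm(nrow(Xs), sd = 0.5)

selection_frequency <- function(X, y, id, k = 2, B = 50) {
  subjects <- unique(id)
  counts <- setNames(numeric(ncol(X)), colnames(X))
  for (b in seq_len(B)) {
    # Cluster bootstrap: resample whole subjects, keeping all of their rows
    draw <- sample(subjects, length(subjects), replace = TRUE)
    rows <- unlist(lapply(draw, function(s) which(id == s)))
    # Stand-in selector: top-k features by absolute correlation with y
    score <- abs(cor(X[rows, , drop = FALSE], y[rows]))[, 1]
    top <- names(sort(score, decreasing = TRUE))[seq_len(k)]
    counts[top] <- counts[top] + 1
  }
  counts / B  # fraction of the B resamples in which each feature was selected
}

freq <- selection_frequency(Xs, ys, sim_id)
print(round(sort(freq, decreasing = TRUE), 2))
```

The two simulated signals, V1 and V2, should appear at or near a frequency of 1, while noise features stay low. Using more resamples makes these frequencies less noisy, which is the same reasoning behind raising n_boot_select for real analyses.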
The main tuning parameters are:

- n_features_to_select: final number of features to return (default: 10).
- n_boot_screen, n_boot_select: number of bootstrap samples for the screening and selection stages (defaults: 50 and 100). Increase them for more stable results.
- keep_fraction_screen: proportion of features from each module passed to final selection (default: 0.25). Increase it if too few features are selected.
- min_module_size: minimum size of network modules (default: 4).
- alpha_screen, alpha_select: significance levels for splitting in the screening and selection trees (defaults: 0.2 and 0.05).

| Symptom | Likely cause | Solution |
|---|---|---|
| No features selected | Screening too strict | Increase keep_fraction_screen or alpha_screen |
| Too many features selected | Selection too liberal | Decrease keep_fraction_screen or alpha_select |
| Long computation time | Data too large | Reduce the numbers of bootstrap samples or pre-filter features |
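Mismatched column names across time points are a common formatting error. A quick way to find the offending time point before calling the package is to compare each matrix's column names with those of the first time point. The function mismatched_timepoints() is a hypothetical base-R helper, not a TemporalForest function; it reports positions instead of stopping with an error.

```r
# Hypothetical helper (not part of TemporalForest): indices of time points
# whose column names differ from those of the first time point.
mismatched_timepoints <- function(X_list) {
  ref <- colnames(X_list[[1]])
  which(!vapply(X_list, function(m) identical(colnames(m), ref), logical(1)))
}

x_t1 <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("A", "B")))
x_t2 <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("A", "C")))
mismatched_timepoints(list(x_t1, x_t2))  # 2: the second time point differs
mismatched_timepoints(list(x_t1, x_t1))  # integer(0): consistent
```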
The package includes checks for proper data formatting. Here’s an example of the error message for inconsistent inputs:
# This will produce a clear error message
mat1 <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("A", "B")))
mat2 <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("A", "C")))
bad_X <- list(mat1, mat2)
TemporalForest::check_temporal_consistency(bad_X)
#> Error: Inconsistent data format: The column names of the matrix for time point 2 do not match the column names of the first time point.
TemporalForest provides an end-to-end solution for reproducible
feature selection in longitudinal high-dimensional data. For detailed
information on all function parameters and advanced usage, see the
package documentation (?TemporalForest).
To cite TemporalForest in publications, please use:
citation("TemporalForest")
#> To cite package 'TemporalForest' in publications use:
#>
#> Shao S, Moore J, Ramirez C (2025). _TemporalForest: A package for
#> reproducible feature selection in high-dimensional longitudinal
#> data_. R package version 0.1.0,
#> <https://github.com/SisiShao/TemporalForest>.
#>
#> Shao S, Moore J, Ramirez C (2025). "Network-Guided TemporalForest for
#> Feature Selection in High-Dimensional Longitudinal Data." Manuscript
#> submitted for publication.,
#> <https://github.com/SisiShao/TemporalForest>.
#>
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-apple-darwin20
#> Running under: macOS Sonoma 14.2.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: America/Los_Angeles
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] TemporalForest_0.1.4
#>
#> loaded via a namespace (and not attached):
#> [1] Rdpack_2.6.4 DBI_1.2.3 gridExtra_2.3
#> [4] rlang_1.1.6 magrittr_2.0.4 matrixStats_1.5.0
#> [7] compiler_4.4.1 RSQLite_2.4.3 png_0.1-8
#> [10] vctrs_0.6.5 stringr_1.5.2 pkgconfig_2.0.3
#> [13] crayon_1.5.3 fastmap_1.2.0 backports_1.5.0
#> [16] XVector_0.44.0 inum_1.0-5 rmarkdown_2.30
#> [19] UCSC.utils_1.0.0 nloptr_2.2.1 preprocessCore_1.66.0
#> [22] bit_4.6.0 xfun_0.53 zlibbioc_1.50.0
#> [25] cachem_1.1.0 flashClust_1.01-2 GenomeInfoDb_1.40.1
#> [28] jsonlite_2.0.0 blob_1.2.4 parallel_4.4.1
#> [31] cluster_2.1.8.1 R6_2.6.1 glmertree_0.2-6
#> [34] bslib_0.9.0 stringi_1.8.7 RColorBrewer_1.1-3
#> [37] boot_1.3-32 rpart_4.1.24 jquerylib_0.1.4
#> [40] Rcpp_1.1.0 iterators_1.0.14 knitr_1.50
#> [43] WGCNA_1.73 base64enc_0.1-3 IRanges_2.38.1
#> [46] Matrix_1.7-4 splines_4.4.1 nnet_7.3-20
#> [49] tidyselect_1.2.1 rstudioapi_0.17.1 yaml_2.3.10
#> [52] partykit_1.2-24 doParallel_1.0.17 codetools_0.2-20
#> [55] lattice_0.22-7 tibble_3.3.0 Biobase_2.64.0
#> [58] KEGGREST_1.44.1 S7_0.2.0 evaluate_1.0.5
#> [61] foreign_0.8-90 survival_3.8-3 Biostrings_2.72.1
#> [64] pillar_1.11.1 checkmate_2.3.3 foreach_1.5.2
#> [67] stats4_4.4.1 reformulas_0.4.1 generics_0.1.4
#> [70] S4Vectors_0.42.1 ggplot2_4.0.0 scales_1.4.0
#> [73] minqa_1.2.8 glue_1.8.0 Hmisc_5.2-4
#> [76] tools_4.4.1 data.table_1.17.8 lme4_1.1-37
#> [79] mvtnorm_1.3-3 fastcluster_1.3.0 grid_4.4.1
#> [82] impute_1.78.0 libcoin_1.0-10 rbibutils_2.3
#> [85] AnnotationDbi_1.66.0 colorspace_2.1-2 nlme_3.1-168
#> [88] GenomeInfoDbData_1.2.12 htmlTable_2.4.3 Formula_1.2-5
#> [91] cli_3.6.5 dplyr_1.1.4 gtable_0.3.6
#> [94] dynamicTreeCut_1.63-1 sass_0.4.10 digest_0.6.37
#> [97] BiocGenerics_0.50.0 htmlwidgets_1.6.4 farver_2.1.2
#> [100] memoise_2.0.1 htmltools_0.5.8.1 lifecycle_1.0.4
#> [103] httr_1.4.7 GO.db_3.19.1 bit64_4.6.0-1
#> [106] MASS_7.3-65
# Restore the options saved earlier in old_ops
options(old_ops)