The PoSIAdjRSquared package allows users to calculate p-values and confidence intervals for regression coefficients after they have been selected by adjusted R squared in linear models. The p-values and confidence intervals are valid after model selection with the same data. This allows the user to use all data for both model selection and inference without losing control over the type I error rate. The provided tests are more powerful than data splitting, which bases inference on less data since it discards all information used for selection.
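To make the selection step concrete, here is a minimal base-R sketch of exhaustive model selection by adjusted R squared. It is an illustration only, not the package's implementation: `select_by_adj_r2()` is a hypothetical helper, the simulated data are made up for the demonstration, and the adjusted R squared used is the one reported by `summary.lm()` for a no-intercept fit, which may differ from the package's definition.

```r
# Illustrative sketch only: exhaustive selection by adjusted R squared in base R.
# select_by_adj_r2() is a hypothetical helper, not a PoSIAdjRSquared function.
select_by_adj_r2 <- function(y, X) {
  p <- ncol(X)
  best <- list(adj_r2 = -Inf, vars = integer(0))
  for (size in seq_len(p)) {
    # All subsets of the p columns with the given size
    for (vars in combn(p, size, simplify = FALSE)) {
      fit <- lm(y ~ 0 + X[, vars, drop = FALSE])  # linear model without intercept
      adj <- summary(fit)$adj.r.squared
      if (adj > best$adj_r2) best <- list(adj_r2 = adj, vars = vars)
    }
  }
  best
}

# Toy data (assumption): only the first of four predictors is active
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 4), n, 4)
y <- 2 * X[, 1] + rnorm(n)

sel <- select_by_adj_r2(y, X)
sel$vars    # indices of the selected columns
sel$adj_r2  # adjusted R squared of the selected model
```

Because the same `y` is used both to pick `sel$vars` and to test the selected coefficients, classical p-values would no longer be valid; this is the situation the package's post-selection tests are designed for.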
You can install the PoSIAdjRSquared package directly in R from GitHub with:
install.packages("devtools")
library(devtools)
install_github("pirennesarah/PoSIAdjRSquared")
This is a basic example which shows you how to calculate post-selection p-values and confidence intervals for some generated data. The code is similarly applicable to real data.
library(PoSIAdjRSquared)
# Generate data
n <- 100
Data <- datagen.norm(seed = 7, n, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
X <- Data$X
y <- Data$y
# Select model
result <- fit_all_subset_linear_models(y, X, intercept=FALSE)
phat <- result$phat
X_M_phat <- result$X_M_phat
k <- result$k
R_M_phat <- result$R_M_phat
kappa_M_phat <- result$kappa_M_phat
R_M_k <- result$R_M_k
kappa_M_k <- result$kappa_M_k
# Estimate Sigma from residuals of full model
full_model <- lm(y ~ 0 + X)
sigma_hat <- sd(resid(full_model))
Sigma <- diag(n)*(sigma_hat)^2
# Construct test statistic
Construct_test <- construct_test_statistic(j = 5, X_M_phat, y, phat, Sigma, intercept=FALSE)
a <- Construct_test$a
b <- Construct_test$b
etaj <- Construct_test$etaj
etajTy <- Construct_test$etajTy
# Solve selection event
Solve <- solve_selection_event(a,b,R_M_k,kappa_M_k,R_M_phat,kappa_M_phat,k)
z_interval <- Solve$z_interval
# Post-selection p-value for beta_j=0
tn_sigma <- sqrt((t(etaj)%*%Sigma)%*%etaj)
postselp_value_specified_interval(z_interval, etaj, etajTy, tn_mu = 0, tn_sigma)
#> [1] 0.8410427
# Post-selection 100(1-alpha)% confidence interval (95% for alpha = 0.05)
compute_ci_with_specified_interval(z_interval, etaj, etajTy, Sigma, tn_mu = 0, alpha = 0.05)
#> [1] -0.2394537 0.1111173
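The argument names `tn_mu` and `tn_sigma` suggest that the p-value is taken from a normal distribution truncated to the selection region. As a rough illustration of that idea, the base-R sketch below computes a two-sided p-value for a normal distribution truncated to a single interval. It rests on assumptions: `tn_pvalue()` is a hypothetical helper, not a package function; the package's `z_interval` may be a union of several intervals; and the package may define the two-sided p-value differently.

```r
# Illustrative sketch only: two-sided p-value from a normal distribution
# truncated to a single interval [lower, upper].
# tn_pvalue() is a hypothetical helper, not a PoSIAdjRSquared function.
tn_pvalue <- function(stat, lower, upper, mu = 0, sigma = 1) {
  stopifnot(lower < upper, stat >= lower, stat <= upper)
  # Probability mass of the untruncated normal inside the interval
  denom <- pnorm(upper, mu, sigma) - pnorm(lower, mu, sigma)
  # Truncated-normal CDF evaluated at the observed statistic
  cdf <- (pnorm(stat, mu, sigma) - pnorm(lower, mu, sigma)) / denom
  2 * min(cdf, 1 - cdf)
}

# Without truncation this reduces to the usual two-sided z-test
p_plain <- tn_pvalue(stat = 1.5, lower = -Inf, upper = Inf)

# Conditioning on stat >= 1 makes the same observation far less surprising
p_trunc <- tn_pvalue(stat = 1.5, lower = 1, upper = Inf)
```

In this toy case the truncated p-value is larger than the untruncated one, which shows why ignoring the selection event can overstate significance.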
Pirenne, S. and Claeskens, G. (2024). Exact post-selection inference for adjusted R squared selection. Statistics & Probability Letters, 211(110133):1-9. https://doi.org/10.1016/j.spl.2024.110133