This version restores the package to CRAN after it was archived on
2023-10-29 due to a dependency on the optimr package, which
was removed from CRAN at the maintainer’s request. All calls to
optimr() now use optimx::optimr(), which is
the current home of that function.
The optimr package has been removed from Imports. Its function optimr() is now imported from optimx, which absorbed it.
optimr, optimx, and shapes moved from Depends to Imports, in line with CRAN policy.
RSpectra moved from Suggests to Imports, since it is unconditionally required for large matrices in proj_LogBip().
bootBLB(): fixed a column-naming bug that caused plotBLB() to fail with “Column ‘Dim1’ not found” after bootstrap aggregation.
bootBLB(): suppressed verbose control messages printed to the console by optimx::optimr() when using the CG method.
Fixed typo Sensitivy -> Sensitivity in the confusion-matrix output of bootBLB() and pred_LB().
utils::globalVariables() declaration moved to R/zzz.R to prevent a roxygen2 block error in plotBLB.R.
Removed importFrom(stats, svd) and importFrom(stats, eigen) from NAMESPACE: both functions belong to base R.
ggplot2::aes() calls updated to use .data[[]] tidy evaluation, eliminating R CMD check NOTEs about undefined global variables.
inst/CITATION updated from the deprecated citEntry() to bibentry().
sdv_MM(): loop variable j initialised before the loop to prevent a potential “object not found” error on zero iterations.
RoxygenNote updated to 7.3.3.
Version 1.1.0 introduced a major new fitting method for the logistic biplot model, described in:
Babativa-Marquez, J. G., & Vicente-Villardon, J. L. (2022). A coordinate descent MM algorithm for logistic biplot model with missing data. In process.
All previous logistic biplot algorithms (alternating, external logistic, and the conjugate gradient / iterated SVD methods introduced in v1.0.0) share a structural limitation: each row of the data matrix has its own parameter vector theta_i = mu_i + sum_s a_is * b_s. Consequently:
The new method reformulates the logistic biplot using Pearson’s (1901) data projection idea, extended to the logistic case by Landgraf & Lee (2020). Instead of treating each row’s coordinates as independent free parameters, the row markers are expressed as a projection of the (centred) data matrix onto a low-rank subspace V:
A = (X - 1 * mu') * V
This single change has three important consequences:
The number of parameters no longer depends on n. Only the p x k matrix V (column markers) and the p-vector mu (intercepts) need to be estimated, regardless of how many rows the data matrix has.
New individuals can be projected without refitting. Given estimated V and mu, the row markers of any new observation x_new are simply: a_new = (x_new - mu') * V. No optimisation is required.
Missing data are handled natively. A weight matrix W (W_ij = 1 if x_ij is observed, 0 if missing) is incorporated into the loss function. Missing entries are imputed at each iteration using the current fitted values and a per-variable threshold that maximises the Balanced Accuracy (BACC), so that classification performance is optimised throughout fitting.
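The projection formula and the weight matrix can be illustrated with a few lines of base R. This is only a sketch under stated assumptions: the data, mu, and V below are made-up stand-ins rather than estimates produced by the package, and the names X_miss and a_new are chosen here for illustration.

```r
# Hypothetical sizes and values, for illustration only (not package output).
set.seed(1)
n <- 6; p <- 4; k <- 2
X  <- matrix(rbinom(n * p, 1, 0.5), n, p)    # binary data matrix (n x p)
mu <- colMeans(X)                             # stand-in for the estimated intercepts
V  <- qr.Q(qr(matrix(rnorm(p * k), p, k)))    # stand-in for an estimated p x k basis

# Row markers as a projection: A = (X - 1 * mu') * V
A <- sweep(X, 2, mu) %*% V                    # n x k

# Projecting a new individual needs only mu and V, no optimisation.
x_new <- c(1, 0, 1, 1)
a_new <- (x_new - mu) %*% V                   # 1 x k

# Weight matrix for missing data: W_ij = 1 if observed, 0 if missing.
X_miss <- X
X_miss[2, 3] <- NA
W <- ifelse(is.na(X_miss), 0, 1)
```

Only V (p x k) and mu (length p) appear on the right-hand side, which is why the parameter count in this formulation does not grow with the number of rows.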
The objective function – the negative log-likelihood weighted by W – is non-convex. Rather than minimising it directly, each iteration majorizes it by a quadratic surrogate (the MM step), following the approach of Babativa-Marquez & Vicente-Villardon (2021). The surrogate is then minimised using a block coordinate descent algorithm:
RSpectra::eigs_sym() for large matrices).
Because each MM step reduces the surrogate, and the surrogate upper-bounds the true loss while matching it at the current iterate, the negative log-likelihood is guaranteed to be non-increasing across iterations.
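The descent property can be checked numerically on a toy case. The sketch below is not the package's block coordinate descent: it applies the generic MM idea to a single binary observation with one natural parameter, using the standard fact that the second derivative of the logistic loss is at most 1/4. The function names loss, grad, and surrogate are illustrative.

```r
# Logistic loss for one binary observation x with natural parameter theta.
loss <- function(theta, x) log1p(exp(theta)) - x * theta
grad <- function(theta, x) plogis(theta) - x

# Quadratic majorizer at theta0: the second derivative of the loss is at
# most 1/4, so curvature 1/4 gives an upper bound that touches at theta0.
surrogate <- function(theta, theta0, x) {
  loss(theta0, x) + grad(theta0, x) * (theta - theta0) + (theta - theta0)^2 / 8
}

x <- 1
theta <- -2
path <- loss(theta, x)
for (it in 1:20) {
  theta <- theta - 4 * grad(theta, x)   # exact minimiser of the surrogate
  path <- c(path, loss(theta, x))
}

# The surrogate lies on or above the loss everywhere on a grid ...
grid <- seq(-6, 6, by = 0.1)
is_upper_bound <- all(surrogate(grid, -2, x) >= loss(grid, x) - 1e-12)
# ... and the loss never increases along the MM iterations.
is_monotone <- all(diff(path) <= 1e-12)
```

Both checks should come out TRUE: minimising a surrogate that sits above the loss and agrees with it at the current point cannot increase the loss.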
LogBip() – updated
The main fitting function now accepts method = "PDLB" (Projection-based logistic biplot with block coordinate Descent) in addition to the existing "MM", "CG", and "BFGS" methods.
When method = "PDLB":
x may contain NA values.
impute_x component: the completed binary matrix with missing entries replaced by the model’s fitted values.
"PDLB".
# Complete data -- coordinate descent MM algorithm (fast, no missing values)
res_MM <- LogBip(x = Methylation, method = "MM", maxit = 1000)
# Matrix with missing data -- projection-based block coordinate descent
set.seed(12345)
n <- nrow(Methylation); p <- ncol(Methylation)
miss <- matrix(rbinom(n * p, 1, 0.2), n, p)
miss <- ifelse(miss == 1, NA, miss)
x_miss <- Methylation + miss
res_PDLB <- LogBip(x = x_miss, method = "PDLB", maxit = 1000)
imputed_data <- res_PDLB$impute_x # completed matrix
proj_LogBip() – new
Low-level function that implements the projection-based block coordinate descent algorithm directly. It is called internally by LogBip(method = "PDLB") but is also exported for advanced users who need direct control over the algorithm.
out <- proj_LogBip(x = x_miss, k = 2, max_iters = 1000, epsilon = 1e-5)
# out$mu -- estimated intercept vector (length p)
# out$A -- row-marker matrix (n x k)
# out$B -- column-marker matrix (p x k)
# out$x_est -- imputed binary matrix
# out$iter -- number of iterations
# out$loss_funct -- loss function values per iteration
cv_LogBip() – updated
Cross-validation now supports method = "PDLB", allowing selection of the optimal number of dimensions k for datasets with missing values.
cv_result <- cv_LogBip(data = x_miss, k = 0:5, method = "PDLB", maxit = 1000)
Babativa-Marquez, J. G., & Vicente-Villardon, J. L. (2021). Logistic biplot by conjugate gradient algorithms and iterated SVD. Mathematics, 9(16), 2015. https://doi.org/10.3390/math9162015
Landgraf, A. J., & Lee, Y. (2020). Dimensionality reduction for binary data through the projection of natural parameters. Journal of Multivariate Analysis, 180, 104668. https://doi.org/10.1016/j.jmva.2020.104668
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11), 559-572.
Vicente-Villardon, J. L., & Galindo, M. P. (2006). Logistic biplots. In M. Greenacre & J. Blasius (Eds.), Multiple Correspondence Analysis and Related Methods (pp. 503-521). Chapman & Hall.
First release of BiplotML, providing methods for fitting logistic biplot models to multivariate binary data.
LogBip() – fit a logistic biplot using conjugate gradient ("CG") or BFGS ("BFGS") optimization methods.
sdv_MM() – fit a logistic biplot via the coordinate descent Majorization-Minimization algorithm (method = "MM" in LogBip()).
bootBLB() – bootstrap logistic biplot with optional confidence ellipses for row markers.
plotBLB() – ggplot2-based biplot visualization with directed variable vectors and optional confidence ellipses.
pred_LB() – predict binary responses and compute per-variable optimal classification thresholds minimising the Balanced Error Rate.
fitted_LB() – extract fitted values on the logit or probability scale.
cv_LogBip() – k-fold cross-validation for selecting the number of dimensions.
performanceBLB() – compare convergence speed and accuracy across multiple optimization algorithms.
gradientDesc() – fit a logistic biplot via simple gradient descent (pedagogical / benchmarking use).
simBin() – simulate a binary data matrix from a latent variable model for benchmarking and cross-validation studies.
Methylation – a binary matrix of DNA methylation data (50 individuals, 13 CpG sites) used in examples throughout the package.
Babativa-Marquez, J. G., & Vicente-Villardon, J. L. (2021). Logistic biplot by conjugate gradient algorithms and iterated SVD. Mathematics, 9(16), 2015. https://doi.org/10.3390/math9162015