| Type: | Package |
| Title: | Self-Organizing Maps for Mixed-Attribute Data Using Gower Distance |
| Version: | 0.1.0 |
| Description: | Implements a variant of the Self-Organizing Map (SOM) algorithm designed for mixed-attribute datasets. Similarity between observations is computed using the Gower distance, and categorical prototypes are updated via heuristic strategies (weighted mode and multinomial sampling). Provides functions for model fitting, mapping, visualization (U-Matrix and component planes), and evaluation, making SOM applicable to heterogeneous real-world data. For methodological details see Sáez and Salas (2026) <doi:10.1007/s41060-025-00941-6>. |
| License: | GPL-2 |
| Encoding: | UTF-8 |
| Depends: | R (≥ 4.3.0) |
| Imports: | StatMatch, dplyr, gower, ggplot2, cluster, reshape2, grid, utils, stats, cli |
| Suggests: | knitr, rmarkdown |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | yes |
| Maintainer: | Patricio Salas <patricioasalas@udec.cl> |
| Packaged: | 2026-01-22 11:44:32 UTC; Patricio Salas |
| Author: | Patricio Salas |
| Repository: | CRAN |
| Date/Publication: | 2026-01-27 08:50:02 UTC |
Map observations to BMUs (Best Matching Units) using Gower distance
Description
Computes, for each observation, the index of the best-matching neuron (BMU) in a trained Gower-SOM codebook and the corresponding Gower distance. Also converts BMU indices into grid coordinates (row, col).
Usage
get_bmu_gower(data, codebook, n_rows, n_cols)
Arguments
data |
A |
codebook |
A |
n_rows, n_cols |
Integers, the SOM grid dimensions. |
Value
A data.frame with the following columns:
- bmu
Integer BMU index (1 .. n_rows * n_cols).
- distance
Numeric, the Gower distance to the BMU.
- row
Integer, BMU grid row coordinate.
- col
Integer, BMU grid column coordinate.
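The BMU search described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper names gower_to_codebook and bmu_sketch are hypothetical, the Gower distance is the textbook equal-weight form (range-scaled absolute difference for numeric variables, 0/1 mismatch for categorical ones), and the conversion from BMU index to grid coordinates assumes row-major neuron numbering, which the package may or may not use.

```r
# Gower distance from one observation to every codebook row.
# `ranges` holds the range of each numeric variable (assumed non-zero).
gower_to_codebook <- function(obs, codebook, ranges) {
  per_var <- vapply(names(codebook), function(v) {
    if (is.numeric(codebook[[v]])) {
      # Numeric variable: range-scaled absolute difference
      abs(codebook[[v]] - obs[[v]]) / ranges[[v]]
    } else {
      # Categorical variable: 0 if the levels match, 1 otherwise
      as.numeric(as.character(codebook[[v]]) != as.character(obs[[v]]))
    }
  }, numeric(nrow(codebook)))
  # Equal-weight average over variables, one value per neuron
  rowMeans(matrix(per_var, nrow = nrow(codebook)))
}

# BMU index plus grid coordinates, ASSUMING row-major neuron numbering.
bmu_sketch <- function(obs, codebook, ranges, n_rows, n_cols) {
  stopifnot(nrow(codebook) == n_rows * n_cols)
  d <- gower_to_codebook(obs, codebook, ranges)
  bmu <- which.min(d)
  data.frame(bmu = bmu, distance = d[bmu],
             row = (bmu - 1) %/% n_cols + 1,
             col = (bmu - 1) %% n_cols + 1)
}

# Toy 2 x 2 codebook with one numeric and one categorical variable
codebook <- data.frame(x = c(0, 1, 0, 1),
                       g = factor(c("a", "a", "b", "b")))
obs <- data.frame(x = 0.9, g = factor("b", levels = c("a", "b")))
res <- bmu_sketch(obs, codebook, ranges = c(x = 1), n_rows = 2, n_cols = 2)
print(res)
```

In this toy case the fourth neuron (x = 1, g = "b") is closest, so the sketch reports bmu = 4 at grid position (2, 2).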
Author(s)
Patricio Sáez <patricsaez@udec.cl>; Patricio Salas <patricioasalas@udec.cl>
References
Sáez, P., Salas, P. Gower-SOM: a self-organizing map for mixed data with gower distance and heuristic adaptation for data analytics. Int J Data Sci Anal 22, 26 (2026). https://doi.org/10.1007/s41060-025-00941-6
Examples
set.seed(1)
df <- data.frame(
  x1 = rnorm(10),
  x2 = rnorm(10),
  g = factor(sample(letters[1:3], 10, TRUE))
)
fit <- gsom_Training(df, grid_rows = 3, grid_cols = 3,
                     num_iterations = 5, batch_size = 5)
res <- get_bmu_gower(df, codebook = fit$weights,
                     n_rows = 3, n_cols = 3)
head(res)
Train a Gower-SOM on mixed-attribute data
Description
Trains a Self-Organizing Map (SOM) on datasets with mixed attributes (numeric and categorical), using the Gower distance to find the BMU and heuristics to update categorical prototypes.
Usage
gsom_Training(data, grid_rows = 5, grid_cols = 5,
              learning_rate = 0.1, num_iterations = 100,
              radius = NULL, batch_size = 10,
              sampling = TRUE, set_seed = 123)
Arguments
data |
|
grid_rows, grid_cols |
SOM grid dimensions (rows x cols). |
learning_rate |
Initial learning rate (decays exponentially). |
num_iterations |
Number of iterations. |
radius |
Initial neighborhood radius; default NULL, which resolves to max(grid_rows, grid_cols)/2. |
batch_size |
Mini-batch size per iteration. |
sampling |
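Logical; if TRUE, categorical prototypes are updated by a weighted random draw; if FALSE, by the weighted mode. |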
set_seed |
Integer random seed for reproducibility. |
Details
The learning rate and the neighborhood radius decay exponentially with the iteration index t:

\[
\alpha_t = \alpha_0 \exp\left(-\frac{t}{T}\right), \qquad
r_t = r_0 \exp\left(-\frac{t}{T/\log r_0}\right)
\]

where T is num_iterations, \alpha_0 is learning_rate, and r_0 is radius (default max(grid_rows, grid_cols)/2). Under this schedule the radius reaches r_T = 1 at the final iteration. For categorical variables, the prototype combines the current and input values, weighted by \alpha_t and the neighborhood kernel: if sampling = TRUE, the new level is chosen by a weighted draw; otherwise the weighted mode is used.
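The two decay formulas can be tabulated with a few lines of base R. This is only a sketch of the schedule as written in this section: the function name decay_schedule and its argument names are hypothetical, and it assumes the iteration index runs from 1 to T and that r_0 > 1 so that \log r_0 is positive.

```r
# Tabulate the learning rate and neighborhood radius per iteration.
# ASSUMPTIONS: t runs over 1..T_iter and r0 > 1.
decay_schedule <- function(alpha0 = 0.1, r0 = 2.5, T_iter = 100) {
  stopifnot(r0 > 1, T_iter >= 1)
  t <- seq_len(T_iter)
  data.frame(
    t      = t,
    alpha  = alpha0 * exp(-t / T_iter),              # alpha_t = alpha_0 exp(-t/T)
    radius = r0 * exp(-t / (T_iter / log(r0)))       # r_t = r_0 exp(-t/(T/log r_0))
  )
}

# Defaults of gsom_Training for a 5 x 5 grid: r0 = max(5, 5) / 2 = 2.5
sched <- decay_schedule(alpha0 = 0.1, r0 = 2.5, T_iter = 100)
print(head(sched, 3))
print(tail(sched, 1))
```

At the last iteration the radius equals 1 (up to floating-point error) and the learning rate equals alpha0 * exp(-1), which follows directly from the formulas.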
Value
An object of class gowersom with:
- weights: data.frame of trained neuron prototypes.
- coords: data.frame of grid coordinates per neuron.
Author(s)
Patricio Sáez <patricsaez@udec.cl>; Patricio Salas <patricioasalas@udec.cl>
References
Sáez, P., Salas, P. Gower-SOM: a self-organizing map for mixed data with gower distance and heuristic adaptation for data analytics. Int J Data Sci Anal 22, 26 (2026). https://doi.org/10.1007/s41060-025-00941-6
Examples
set.seed(1)
df <- data.frame(
  x1 = rnorm(50),
  x2 = rnorm(50),
  g = factor(sample(letters[1:3], 50, TRUE))
)
fit <- gsom_Training(df, grid_rows = 3, grid_cols = 3,
                     learning_rate = 0.1, num_iterations = 10,
                     batch_size = 8, sampling = TRUE, set_seed = 123)
str(fit)
Compute the U-Matrix for a trained Gower-SOM
Description
Calculates the U-Matrix (unified distance matrix) to visualize the topology and cluster structure of a Self-Organizing Map trained on mixed-attribute data. Each entry contains the average Gower distance between a neuron and its immediate neighbors in the rectangular grid.
Usage
gsom_Umatrix(codebook, n_rows, n_cols)
Arguments
codebook |
A data.frame or matrix containing the SOM prototypes (weights), with one row per neuron. |
n_rows |
Integer, number of rows in the SOM grid. |
n_cols |
Integer, number of columns in the SOM grid. |
Details
The function assumes a rectangular topology where each neuron has up to
four direct neighbors (up, down, left, right). For each neuron, the mean
Gower distance to its valid neighbors is computed using
daisy with metric = "gower".
Value
A numeric matrix of size n_rows x n_cols, where each cell contains
the average distance between the corresponding neuron and its neighbors.
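The neighbor-averaging step can be sketched as follows. The function umatrix_sketch is a hypothetical stand-in, not the package's code: it computes pairwise Gower dissimilarities with cluster::daisy, as the Details describe, and assumes that neurons are numbered row by row (row-major), which may differ from the ordering the package uses internally.

```r
# U-Matrix sketch: mean Gower distance from each neuron to its
# up/down/left/right neighbors on a rectangular grid.
# ASSUMPTION: codebook rows are ordered row-major over the grid.
umatrix_sketch <- function(codebook, n_rows, n_cols) {
  stopifnot(nrow(codebook) == n_rows * n_cols)
  D <- as.matrix(cluster::daisy(codebook, metric = "gower"))
  U <- matrix(NA_real_, n_rows, n_cols)
  offsets <- rbind(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))
  for (i in seq_len(n_rows)) {
    for (j in seq_len(n_cols)) {
      nb <- sweep(offsets, 2, c(i, j), "+")          # candidate neighbor cells
      ok <- nb[, 1] >= 1 & nb[, 1] <= n_rows &
            nb[, 2] >= 1 & nb[, 2] <= n_cols         # drop cells outside the grid
      nb <- nb[ok, , drop = FALSE]
      idx    <- (i - 1) * n_cols + j                 # this neuron's codebook row
      nb_idx <- (nb[, 1] - 1) * n_cols + nb[, 2]     # neighbors' codebook rows
      U[i, j] <- mean(D[idx, nb_idx])
    }
  }
  U
}

# Toy 2 x 2 codebook with one numeric and one categorical variable
codebook <- data.frame(x = c(0, 0.5, 0.5, 1),
                       g = factor(c("a", "a", "b", "b")))
U <- umatrix_sketch(codebook, n_rows = 2, n_cols = 2)
print(U)
```

Border and corner neurons simply average over fewer neighbors; on a 1 x 1 grid there are none, so the single entry would be NaN.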
Author(s)
Patricio Sáez <patricsaez@udec.cl>; Patricio Salas <patricioasalas@udec.cl>
References
Sáez, P., Salas, P. Gower-SOM: a self-organizing map for mixed data with gower distance and heuristic adaptation for data analytics. Int J Data Sci Anal 22, 26 (2026). https://doi.org/10.1007/s41060-025-00941-6
Examples
set.seed(1)
df <- data.frame(
  x1 = rnorm(20),
  x2 = rnorm(20),
  g = factor(sample(letters[1:3], 20, TRUE))
)
fit <- gsom_Training(df, grid_rows = 3, grid_cols = 3,
                     num_iterations = 5, batch_size = 4)
U <- gsom_Umatrix(fit$weights, n_rows = 3, n_cols = 3)
plot_Umatrix(U)
Predict BMUs for new data using a fitted Gower-SOM
Description
Maps new observations to their Best Matching Units (BMUs) using the
codebook and grid stored in a fitted gowersom object.
Usage
gsom_predict(object, newdata, ...)
Arguments
object |
A fitted gowersom object, as returned by gsom_Training. |
newdata |
A |
... |
Additional arguments passed to internal functions (not used). |
Details
This function is a convenience wrapper around get_bmu_gower.
It automatically extracts the grid dimensions from object$coords
and applies BMU mapping for each observation in newdata.
Value
A data.frame with the following columns:
- bmu
Integer BMU index (1 .. n_rows * n_cols).
- distance
Numeric Gower distance to the BMU.
- row
Integer, BMU grid row coordinate.
- col
Integer, BMU grid column coordinate.
Author(s)
Patricio Sáez <patricsaez@udec.cl>; Patricio Salas <patricioasalas@udec.cl>
References
Sáez, P., Salas, P. Gower-SOM: a self-organizing map for mixed data with gower distance and heuristic adaptation for data analytics. Int J Data Sci Anal 22, 26 (2026). https://doi.org/10.1007/s41060-025-00941-6
Examples
set.seed(1)
df <- data.frame(
  x1 = rnorm(20),
  x2 = rnorm(20),
  g = factor(sample(letters[1:3], 20, TRUE))
)
fit <- gsom_Training(df, grid_rows = 3, grid_cols = 3,
                     num_iterations = 5, batch_size = 4)
# Map observations to BMUs
pred <- gsom_predict(fit, df)
head(pred)
Update categorical prototype in Gower-SOM (internal)
Description
Updates the categorical prototype of a neuron given candidate factor levels and associated weights.
Usage
gsom_updateCategorical(values, weights, sampling = FALSE)
Arguments
values |
A factor vector of candidate categories. |
weights |
A numeric vector of weights, same length as values. |
sampling |
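Logical; if TRUE, one level is sampled with probability proportional to the weights; if FALSE, the weighted mode is returned. |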
Details
If sampling = FALSE, the function returns the weighted mode
(i.e., the most probable level according to weights).
If sampling = TRUE, it samples one level with probability
proportional to the normalized weights, introducing stochasticity.
Value
A factor of length 1 with the chosen level.
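The two update rules can be mimicked in base R. The sketch below is an illustration only: update_categorical_sketch is a hypothetical name, and it assumes that the weights of repeated levels are summed per level before the mode is taken or the draw is made, which the text above does not state explicitly.

```r
# Choose one level from `values` using per-level summed weights.
# ASSUMPTION: weights of identical levels are added together.
update_categorical_sketch <- function(values, weights, sampling = FALSE) {
  stopifnot(is.factor(values), length(values) == length(weights))
  totals <- tapply(weights, values, sum)   # summed weight per level
  totals[is.na(totals)] <- 0               # levels that never occur get weight 0
  if (sampling) {
    # Stochastic rule: draw one level with probability proportional to its weight
    chosen <- sample(names(totals), size = 1, prob = totals / sum(totals))
  } else {
    # Deterministic rule: weighted mode
    chosen <- names(totals)[which.max(totals)]
  }
  factor(chosen, levels = levels(values))
}

vals <- factor(c("A", "A", "B", "C"))
wts  <- c(0.2, 0.5, 0.2, 0.1)
mode_level <- update_categorical_sketch(vals, wts, sampling = FALSE)
set.seed(1)
drawn_level <- update_categorical_sketch(vals, wts, sampling = TRUE)
print(mode_level)
print(drawn_level)
```

With these inputs level "A" carries a total weight of 0.7, so the deterministic rule returns "A"; the stochastic rule returns "A" most of the time but can return "B" or "C".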
Author(s)
Patricio Sáez <patricsaez@udec.cl>; Patricio Salas <patricioasalas@udec.cl>
References
Sáez, P., Salas, P. Gower-SOM: a self-organizing map for mixed data with gower distance and heuristic adaptation for data analytics. Int J Data Sci Anal 22, 26 (2026). https://doi.org/10.1007/s41060-025-00941-6
Examples
vals <- factor(c("A","A","B","C"))
wts <- c(0.2, 0.5, 0.2, 0.1)
# Deterministic update (weighted mode)
gsom_updateCategorical(vals, wts, sampling = FALSE)
# Stochastic update (weighted sampling)
gsom_updateCategorical(vals, wts, sampling = TRUE)
Plot the U-Matrix of a Gower-SOM
Description
Visualizes the U-Matrix of a trained Gower-SOM using ggplot2. The U-Matrix reveals cluster boundaries and topological structures in the map.
Usage
plot_Umatrix(u_matrix, fill_palette = "C")
Arguments
u_matrix |
Numeric matrix as returned by gsom_Umatrix. |
fill_palette |
Character string, viridis option for the fill scale (default "C"). |
Details
The function reshapes the U-Matrix into long format and draws a raster heatmap
with geom_raster. By default, it uses perceptually uniform viridis
palettes for improved interpretability, but the palette can be changed through
fill_palette.
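The reshaping and plotting steps described above can be sketched with ggplot2. The function plot_umatrix_sketch is hypothetical and not the package's code: it builds the long format with base R rather than reshape2, and the reversed y axis (so that grid row 1 appears at the top) and the axis labels are assumptions of this sketch.

```r
library(ggplot2)

# Heatmap of a U-Matrix: long format + geom_raster + viridis fill scale.
plot_umatrix_sketch <- function(u_matrix, fill_palette = "C") {
  # One row per grid cell: its row index, column index, and U-Matrix value
  long <- data.frame(
    row   = as.vector(row(u_matrix)),
    col   = as.vector(col(u_matrix)),
    value = as.vector(u_matrix)
  )
  ggplot(long, aes(x = col, y = row, fill = value)) +
    geom_raster() +
    scale_fill_viridis_c(option = fill_palette) +
    scale_y_reverse() +            # ASSUMPTION: draw grid row 1 at the top
    coord_equal() +
    labs(x = "Column", y = "Row", fill = "Mean distance")
}

# Toy 3 x 3 U-Matrix with arbitrary values
set.seed(1)
U_toy <- matrix(runif(9), nrow = 3, ncol = 3)
p <- plot_umatrix_sketch(U_toy, fill_palette = "C")
print(p)
```

Other viridis options, such as "A" (magma) or "D" (viridis), can be passed through fill_palette in the same way.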
Value
A ggplot object displaying the U-Matrix as a heatmap.
Examples
set.seed(1)
df <- data.frame(
  x1 = rnorm(20),
  x2 = rnorm(20),
  g = factor(sample(letters[1:3], 20, TRUE))
)
fit <- gsom_Training(df, grid_rows = 3, grid_cols = 3,
                     num_iterations = 5, batch_size = 4)
U <- gsom_Umatrix(fit$weights, n_rows = 3, n_cols = 3)
plot_Umatrix(U)