Same results as k-means. A fraction of the distance computations, memory, and energy.
Geometric-k-means (Gk-means) accelerates the classic k-means algorithm by treating data as a first-class citizen. Instead of recomputing every point against every centroid each iteration, it uses simple geometry — scalar projection — to focus computation only on the points that can actually change cluster membership (the high-expressive, HE, points), and skips the rest (low-expressive, LE).
The payoff, established in the paper and reproducible from this repository:
This repository provides both a header-only C++
library and an R package
(geokmeans) that wraps it, so you get C++ speed from a
one-line R call.
A point only needs a distance computation if it is oriented toward a neighboring centroid (HE). Points oriented back toward their own centroid (LE/LHE) keep their membership and are skipped.
Only C and D are high-expressive (HE) — the angle from
the midpoint toward the neighbor centroid is < 90°. A and B point
back toward their own centroid and are treated as LE. Figure from
Sharma et al. (2026), Machine Learning (open access).
geokmeansSeven fast k-means variants behind one uniform interface —
all the heavy lifting runs in C++ via Rcpp and RcppEigen.
# From CRAN (once published)
install.packages("geokmeans")
# Development version from this repository (Option A: package lives in a subdirectory)
# install.packages("remotes")
remotes::install_github("parichit/Geometric-k-means", subdir = "geokmeans")A C++17 compiler is required (the standard R toolchains on Windows, macOS, and Linux qualify).
library(geokmeans)
set.seed(1)
X <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
matrix(rnorm(200, mean = 6), ncol = 2))
# Geometric-k-means
fit <- geo_kmeans(X, centers = 2)
fit # iterations, distance computations, centroids
fit$cluster # per-point cluster labels
fit$centroids # final cluster centres
# Pick any algorithm by name
kmeans_dc(X, centers = 2, method = "elkan")
# Supply your own starting centroids (like stats::kmeans)
geo_kmeans(X, centers = X[c(1, 101), ])Every function shares the same signature:
geo_kmeans(data, centers,
iter_max = 100, threshold = 1e-3,
init = c("random", "sequential"),
seed = NULL, with_labels = TRUE,
drop_empty = TRUE, verbose = FALSE)centers is either the number of clusters k or a
matrix of initial centroids. Random initialization uses R’s RNG, so
results are reproducible with set.seed() or the
seed argument.
| Function | Algorithm | Type |
|---|---|---|
geo_kmeans() |
Geometric-k-means | bound-free |
ball_kmeans() |
Ball k-means++ | bound-free |
lloyd_kmeans() |
Lloyd’s k-means | baseline |
elkan_kmeans() |
Elkan | bounded |
hamerly_kmeans() |
Hamerly | bounded |
annulus_kmeans() |
Annulus | bounded |
exponion_kmeans() |
Exponion | bounded |
kmeans_dc() |
dispatcher — choose via method = |
— |
Two small datasets ship with the package:
bc <- as.matrix(read.csv(
system.file("extdata", "Breastcancer.csv", package = "geokmeans"),
header = FALSE))
geo_kmeans(bc, centers = 2)The algorithms are header-only and can be included directly in your
own C++ project (see the src/ directory). For example:
#include "geokmeans.h"
// output_data result = geokmeans(dataset, num_clusters, threshold,
// num_iterations, num_cols, "random", seed);Ball k-means additionally uses Eigen; the R package obtains
Eigen through RcppEigen.
Geometric-k-means/
├── src/ # header-only C++ implementations
├── data/ # datasets used in the paper
├── images/ # figures
└── geokmeans/ # the R package (CRAN)
If you use Geometric-k-means (the algorithm, the C++ code, or the R package) in your work, please cite the paper. It helps us keep contributing to the open-source community.
@article{Sharma2026Geometrickmeans,
title = {Geometric-k-means: A Bound Free Approach to Fast and Eco-Friendly k-means},
author = {Sharma, Parichit and Malec, Marcin and Kurban, Hasan and Kulekci, Oguzhan and Dalkilic, Mehmet},
journal = {Machine Learning},
year = {2026},
volume = {115},
number = {2},
pages = {30},
doi = {10.1007/s10994-025-06891-1},
url = {https://doi.org/10.1007/s10994-025-06891-1}
}Sharma, P., Malec, M., Kurban, H., Kulekci, O., & Dalkilic, M. (2026). Geometric-k-means: A Bound Free Approach to Fast and Eco-Friendly k-means. Machine Learning, 115(2), article 30. https://doi.org/10.1007/s10994-025-06891-1
Read the paper: Machine Learning (Springer, open access) ·
Preprint: arXiv:2508.06353
Released under the GPL-3 license.
Built at the Luddy School of Informatics, Computing & Engineering, Indiana University.