Angle k-means: Accelerating Exact k-means via Angular Relationships¶

Conference: ICLR 2026
OpenReview: https://openreview.net/forum?id=BdhQWT0y9s
Code: C/C++ implementation + Python interface provided in supplemental material
Area: Clustering / Optimization Algorithms / Accelerating Exact k-means
Keywords: k-means acceleration, exact clustering, geometric pruning, angular relationships, parameter-free

TL;DR¶

This paper proposes Angle k-means, which accelerates the assignment step by precomputing distances and angles between centroids. By utilizing a geometric inequality involving only angular comparisons to prune distant candidate centroids, it runs faster than SOTA methods like Ball k-means and Exp-ns without introducing any hyperparameters or altering the final clustering results.

Background & Motivation¶

Background: Lloyd's k-means remains a pillar of clustering due to its simplicity, but the "assignment" step requires calculating the distance from every sample to all $k$ centroids, resulting in $O(nkD)$ complexity, which is expensive for large-scale, high-dimensional data. Mainstream acceleration for exact k-means follows two paths: tree structures (kd-tree, cover tree, dual-tree) and boundary-based methods (Elkan, Hamerly, Annulus, Exponion/Exp-ns, Ball k-means).
Limitations of Prior Work: (1) Tree structures suffer from the "curse of dimensionality" in high-dimensional spaces; (2) Elkan's algorithm requires $O(nk)$ memory to maintain $k$ lower bounds per sample, which explodes with the number of clusters; (3) Methods like Ball k-means, Yinyang, and Adaptive k-means, while fast, introduce additional hyperparameters (e.g., number of neighbors, number of lower bound groups), making tuning difficult and sacrificing the "out-of-the-box" simplicity of k-means.
Key Challenge: Achieving more aggressive pruning (higher speed) typically involves more complex auxiliary structures and hyperparameters. Balancing effective pruning with zero new hyperparameters is the core contradiction.
Goal: Design an acceleration algorithm that is identical in usage to standard k-means (no hyperparameters), yields identical clustering results (exact), and significantly outperforms others in distance computations and runtime.
Key Insight: [Geometric Pruning] Conventional boundary methods rely only on "inter-centroid distances" (triangle inequality). This paper introduces the "angle of samples/centroids relative to the origin," an almost free geometric quantity, to further compress candidate centroids into a small region defined by an angular criterion.

Method¶

Overall Architecture¶

Angle k-means follows the Lloyd iteration of "update centroids → re-assign." The only modification is in the assignment step: each round first $O(k^2 D)$ precomputes the inter-centroid distance matrix $D$, distances from centroids to the origin $v$, and an angle matrix $\Theta$. For each sample $x_i$ (currently in cluster $g$ with radius $r_i=\|x_i-c_g\|_2$), an angular-only comparison criterion is used to filter the candidate centroid set $B_i$. Exact distance comparisons are only performed within $B_i$. This filtering is theoretically guaranteed: any pruned centroid is guaranteed to have a distance to $x_i \ge r_i$, ensuring results are identical to brute-force Lloyd.

flowchart LR
    A[Randomly initialize labels y] --> B[Update centroids C]
    B --> C["Precompute geometric quantities<br/>D: Centroid distances<br/>v: Origin distances<br/>Θ: Angle matrix<br/>β,r: Sample angle/radius"]
    C --> D["For each sample xi<br/>Traverse centroids by proximity to cg"]
    D --> E{"Angular criterion<br/>cos(βi-θgl) > dgl/(2ri) ?"}
    E -->|Yes| F[Add to candidate set Bi]
    E -->|No| G[Prune, skip distance calc]
    F --> H["Update yi with nearest<br/>centroid in Bi"]
    H --> I{Converged?}
    I -->|No| B
    I -->|Yes| J[Output labels]

Key Designs¶

1. Angle-Distance Lower Bound Theorem: Reducing a "Two-Point Distance" problem to 1D angles. The theoretical foundation is Theorem 2: Given reference points $a, b$, if the four distances from points $p, q$ to $a, b$ are fixed, the minimum possible distance between $p, q$ has a closed-form solution $d(p,q)=\sqrt{e_2^2+e_4^2-2e_2e_4\cos(\theta_1-\theta_2)}$, where $\theta_1, \theta_2$ are angles relative to the baseline $\overrightarrow{ab}$. The authors instantiate this for clustering: setting $a$ to the origin $o$ and $b$ to the current centroid $c_g$, the lower bound of the squared distance from sample $x_i$ to any centroid $c_j$ is: $$\|x_i-c_j\|_2^2 = r_i^2 + \|c_g-c_j\|_2^2 - 2r_i\|c_g-c_j\|_2\cos(\beta_i-\theta_{gj})$$ Here $\beta_i=\angle o c_g x_i$ and $\theta_{gj}=\angle o c_g c_j$. Note that $\beta_i-\theta_{gj}$ is not $\angle x_i c_g c_j$; this formula derives from the theorem rather than the simple Law of Cosines, allowing sample-side angles $\beta_i$ and centroid-side angles $\theta_{gj}$ to be precomputed independently.

2. Two special cases provide pruning intuition. Before the formal criterion, the authors use two verifiable cases: (Case 1) If $\|c_g-c_j\|\ge r_i$ and $|\beta_i-\theta_{gj}|\ge\pi/3$, then $\|x_i-c_j\|\ge r_i$; (Case 2) If $\|c_g-c_j\|\le r_i$ and $|\beta_i-\theta_{gj}|\ge\pi/2$, the same holds. Intuition: When the angular difference is large enough (the centroid deviates from the sample's direction), the negative contribution of $\cos(\beta_i-\theta_{gj})$ cannot push the lower bound below $r_i$.

3. Unified Angular Criterion (Core Pruning Rule). Generalizing these cases into a continuous threshold gives Corollary 2.1: Let $t=\frac{\|c_g-c_j\|_2}{2r_i}$. If $\cos(\beta_i-\theta_{gj})\le t$, then $\|x_i-c_j\|_2\ge r_i$. Conversely, the candidate set $B_i$ is defined as: $$B_i=\Big\{\, l \;\Big|\; \cos(\beta_i-\theta_{gl}) > \frac{d_{gl}}{2r_i},\; l=1,\dots,k \,\Big\}$$ Only centroids whose cosine exceeds this distance ratio "might" be closer than the current one and merit a real distance calculation. This couples distance and angle into a single scalar comparison, providing tighter pruning than distance-only methods like Ball k-means.

4. Early termination via distance sorting. In implementation (Algorithm 1), the algorithm avoids scanning all $k$ centroids for $x_i$. Instead, it pre-sorts each row of the distance matrix $D$ to get index matrix $H$ (neighbors of $c_g$ by proximity). It visits centroids in increasing order of $d_{gl}$: once $d_{gl} \ge 2r_i$, it Breaks—by the triangle inequality, if $\|x_i-c_j\| < r_i$, then $\|c_g-c_j\| < 2r_i$ must hold. This reduces the candidate traversal from $O(k)$ to a few neighbors on average. The single-round complexity is $O(k^2 D\log k + npD + nk)$ with $O(k^2)$ space, where $p$ is the average candidates per sample. Angles are nearly free—$\Theta$ is derived from precalculated distances via the Law of Cosines.

Key Experimental Results¶

Main Results¶

Datasets: 14 real-world datasets, 10 medium-scale (Digits, Corel5k, Coil100, CNBC, Isolet, RaFD, USPS, PINS, CPLFW, EYaleB) + 4 large-scale (L-CAS, L-CLBA, L-YTF, L-EDS). $N$ from 4,000 to 621,126, $D$ from 256 to 1,024, $k$ from 10 to 10,177.
Baselines (all are parameter-free exact methods): Lloyd, Elkan, Annulus, Exp-ns, Ball k-means (ring/no-ring). Metrics: Average distance computations and runtime per round.

Dataset (k)	Dist. Computations (% of Lloyd)	Runtime (% of Lloyd)
RaFD (k=200)	10.15%	32.13%
EYaleB (k=200)	6.7%	39.28%
L-YTF (k=2000)	15.5%	38.39%
L-CLBA (k=10000)	14.81%	52.16%

Complexity Comparison¶

Algorithm	Setup	Time	Space
Lloyd	$O(1)$	$O(nkD)$	$O(k+n)$
Elkan	$O(1)$	$O(k^2D+knD)$	$O(k^2+kn)$
Exp-ns	$O(1)$	$O(k^2D\log k+nD\log\log k+k^2D+knD)$	$O(k^2+kn)$
Ball k-means	$O(k^2D+knD)$	$O(k^2D+kmD\log m+mn'D+nD)$	$O(k^2+kn)$
Angle k-means	$O(nD)$	$O(k^2D\log k+npD+nk)$	$O(k^2)$

Key Findings¶

In most datasets, Angle k-means uses the fewest distance computations. Since distance calculation dominates runtime, it is also usually the fastest.
Advantage increases with $k$: On datasets with large $k$ (Corel5k, PINS, L-CLBA, L-CAS), the downward trend in runtime and computations is more significant, fitting real-world large-scale clustering scenarios.
Space complexity is only $O(k^2)$, significantly better than $O(k^2+kn)$ methods like Elkan/Ball, making it friendly for large $n$.

Highlights & Insights¶

"Free angles" is the cleverest aspect: The angle matrix $\Theta$ is derived directly from already computed centroid distances via the Law of Cosines, adding no extra distance calculations while slicing the candidate region from a "ball neighborhood" into an "angular wedge."
Zero new hyperparameters is a strong engineering selling point: Unlike Ball k-means, Yinyang, or Adaptive k-means that require tuning neighbor counts or lower bound groups, Angle k-means is used exactly like standard k-means.
Exact, not approximate: All pruning is theoretically guaranteed. It avoids the hidden trade-off between speed and accuracy.
Theorem 2 reduces the "minimum distance between two points under constraints" to a 1D closed-form solution. This lemma could potentially migrate to other nearest neighbor or retrieval problems.

Limitations & Future Work¶

The origin $o$ as a fixed reference is convenient but not necessarily the optimal anchor. Future work suggests using multiple reference points or dynamically selected anchors to further tighten filtering.
Precomputing $\Theta$ and sorting $D$ incurs an $O(k^2 D\log k)$ overhead per round. When $k$ is small and $n$ is not large, this overhead may offset the acceleration gains.
The experiments only compare against other "parameter-free" exact methods; it is unclear how it performs relative to hyperparameter-tuned SOTA (e.g., optimized Yinyang).
The angular criterion still relies on the radius $r_i$ to the current centroid; the inherent sensitivity of k-means to initialization is not addressed.

Boundary Method Lineage: Elkan ($k$ bounds, $O(nk)$ memory) → Hamerly (single bound, $O(n)$ space) → Annulus (pruning within rings using $\|\,\|x\|-\|c\|\,|\le\|x-c\|$) → Exponion/Exp-ns (ball neighborhoods + tighter bounds) → Ball k-means. Angle k-means represents the first systematic introduction of the "angular" dimension to this lineage, upgrading the "norm-difference" idea of Annulus with angles.
Insight: In any task requiring repeated distance calculations between samples and anchors (ANN search, vector retrieval, sparse attention), the paradigm of "precomputing anchor-anchor geometric relationships + 1D pruning criteria" is a low-cost, high-reward pattern.

Rating¶

Novelty: ⭐⭐⭐⭐ — Introduces "angular relationships" as a free secondary pruning criterion in a mature field.
Experimental Thoroughness: ⭐⭐⭐⭐ — Covers 14 datasets across scales and $k$ up to 10,000.
Writing Quality: ⭐⭐⭐⭐ — Clear progression from theorem to algorithm; the 3-panel figure provides excellent intuition.
Value: ⭐⭐⭐⭐ — Parameter-free, exact, $O(k^2)$ space, significant acceleration for large $k$; highly practical for engineering.

Algorithm	Setup	Time	Space
Lloyd	\(O(1)\)	\(O(nkD)\)	\(O(k+n)\)
Elkan	\(O(1)\)	\(O(k^2D+knD)\)	\(O(k^2+kn)\)
Exp-ns	\(O(1)\)	\(O(k^2D\log k+nD\log\log k+k^2D+knD)\)	\(O(k^2+kn)\)
Ball k-means	\(O(k^2D+knD)\)	\(O(k^2D+kmD\log m+mn'D+nD)\)	\(O(k^2+kn)\)
Angle k-means	\(O(nD)\)	\(O(k^2D\log k+npD+nk)\)	\(O(k^2)\)