
SpEx: A Spectral Approach to Explainable Clustering

Conference: NeurIPS 2025
arXiv: 2511.00885
Code: GitHub
Area: Explainable Clustering, Spectral Methods, Graph Partitioning
Keywords: Explainable clustering, spectral graph partitioning, coordinate cuts, decision trees, Cheeger inequality

TL;DR

This paper proposes SpEx, a general framework for explainable clustering based on spectral graph partitioning. It can "round" any reference clustering (no centroids required) into an explainable clustering via a coordinate-cut decision tree, or cluster reference-free by operating directly on a kNN graph.

Background & Motivation

  • Explainable clustering (Moshkovitz et al., 2020): a clustering described by a binary decision tree in which each internal node applies a single-coordinate threshold cut.
  • Limitations of prior work:
    • IMM/EMN: require the reference clustering to have centroids (k-means); cannot handle kernel k-means or spectral clustering.
    • Kernel IMM: supports only specific kernel functions, with high complexity and poor scalability.
    • CART: not designed for clustering; maximizes Gini impurity reduction, potentially selecting cuts incompatible with the clustering structure.
  • No truly general explainable clustering method exists to date — one that is unconstrained by reference clustering type and robust on real-world data.

Method

Overall Architecture

SpEx leverages a generalization of the Cheeger inequality to relate coordinate cuts to the conductance of graph cuts.

Key Designs

Core Theorem (Theorem 2.2)

For a point set \(X \subset \mathbb{R}^d\) and a graph \(G(X, E, w)\), there exists a coordinate cut \(S_{j,\tau}\) such that

\[
\Psi_G(S_{j,\tau}(X)) \;\leq\; \sqrt{\frac{\mathbb{E}_{\mathrm{adj}}\|x-y\|_2^2}{\mathbb{E}_{\mathrm{all}}\|x-y\|_2^2}}
\]

  • Numerator: expected squared distance between pairs adjacent in the graph.
  • Denominator: expected squared distance between all pairs.
  • When the graph captures meaningful geometric structure (neighbors are closer than random pairs), the conductance upper bound is tighter.
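
As a quick numerical illustration (not from the paper), the sketch below builds a small kNN graph, sweeps every coordinate cut, and compares the best conductance found to the right-hand side of the bound. It assumes the standard conductance \(\Psi_G(S) = w(S,\bar S)/\min(\mathrm{vol}(S), \mathrm{vol}(\bar S))\); the paper's exact normalization may differ.

```python
# Minimal sketch: compare the best coordinate-cut conductance on a kNN graph
# against the sqrt(E_adj / E_all) bound of Theorem 2.2.
# Assumes conductance = cut weight / min(volume of either side).
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs in R^2.
X = np.vstack([rng.normal(0.0, 0.3, (100, 2)), rng.normal(3.0, 0.3, (100, 2))])

W = kneighbors_graph(X, n_neighbors=10, mode="connectivity")
W = ((W + W.T) > 0).astype(float).toarray()  # symmetrize, unit edge weights
deg = W.sum(axis=1)

def conductance(mask):
    cut = W[mask][:, ~mask].sum()
    vol = min(deg[mask].sum(), deg[~mask].sum())
    return cut / vol if vol > 0 else np.inf

# Left-hand side: sweep every threshold tau on every coordinate j.
lhs = min(
    conductance(X[:, j] <= tau)
    for j in range(X.shape[1])
    for tau in np.unique(X[:, j])[:-1]  # drop the max so both sides stay non-empty
)

# Right-hand side: sqrt(E_adj ||x-y||^2 / E_all ||x-y||^2).
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
n = len(X)
e_adj = (W * D2).sum() / W.sum()
e_all = D2.sum() / (n * (n - 1))  # all ordered pairs, excluding self-pairs
print(f"best coordinate cut: {lhs:.3f}  <=  bound: {np.sqrt(e_adj / e_all):.3f}")
```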

SpEx-Clique (Reference Clustering Mode)

  • Given any reference clustering, constructs a clique graph: edges connect points within the same cluster.
  • Iteratively selects minimum-conductance coordinate cuts to build a decision tree (cut scoring sketched after this list).
  • Requires neither centroids nor specific kernel functions.
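
On the clique graph, both the cut weight and the volumes of a coordinate cut depend only on how many points of each cluster land on each side, so a cut can be scored from per-cluster counts alone. Below is a minimal sketch of that scoring, again assuming the standard conductance normalization (`clique_cut_conductance` is an illustrative helper, not the authors' code).

```python
import numpy as np

def clique_cut_conductance(x_j, labels, tau):
    """Conductance of the cut {x_j <= tau} on the per-cluster clique graph."""
    left = x_j <= tau
    sizes = np.bincount(labels)                         # cluster sizes n_c
    n_left = np.bincount(labels[left], minlength=len(sizes))
    n_right = sizes - n_left
    cut = (n_left * n_right).sum()                      # severed intra-cluster edges
    vol_left = (n_left * (sizes - 1)).sum()             # clique degree is n_c - 1
    vol_right = (n_right * (sizes - 1)).sum()
    vol = min(vol_left, vol_right)
    return cut / vol if vol > 0 else np.inf
```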

SpEx-kNN (Reference-Free Mode)

  • Constructs a kNN graph (\(\kappa = 20\) neighbors) directly on the dataset.
  • Produces explainable clustering without any reference clustering.

Iterative Tree Construction (Algorithm 1)

  • A priority queue maintains all leaf nodes.
  • At each step, the leaf node whose best cut yields the greatest reduction in conductance is selected for splitting (see the sketch after this list).
  • Objective: minimize the multi-way normalized cut objective \(\bar{\Psi}_G(T) = \sum_{v \in \mathcal{L}(T)}\psi_G(X_v)\).
  • The number of leaf nodes is freely chosen and need not match the number of reference clusters.
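
A sketch of this loop is below; `best_cut` is a hypothetical helper that returns, for a leaf's point set, the objective reduction of its best coordinate cut together with the cut itself, and all names are illustrative rather than the paper's implementation.

```python
import heapq

def build_tree(X, best_cut, num_leaves):
    """Greedy threshold-tree growth: always split the leaf with the best gain."""
    root = {"idx": list(range(len(X)))}
    gain, j, tau, left, right = best_cut(root["idx"])
    heap = [(-gain, 0, root, (j, tau, left, right))]  # max-heap via negation
    counter, leaves = 1, 1                            # counter breaks score ties
    while heap and leaves < num_leaves:
        _, _, node, (j, tau, left, right) = heapq.heappop(heap)
        node["cut"] = (j, tau)
        node["children"] = ({"idx": left}, {"idx": right})
        leaves += 1
        for child in node["children"]:
            if len(child["idx"]) > 1:                 # singleton leaves can't split
                gain, j, tau, l, r = best_cut(child["idx"])
                heapq.heappush(heap, (-gain, counter, child, (j, tau, l, r)))
                counter += 1
    return root
```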

Unified Theoretical Framework: Non-Uniform Sparse Cut

Via Trevisan's non-uniform sparse cut framework, prior methods are unified as minimizing the weight of a cut in a graph \(G\) normalized by the weight of the same cut in a demand graph \(H\):

  • IMM: \(G\) = star graph, \(H\) = a single edge joining the diameter pair of centroids.
  • EMN: \(G\) = star graph, \(H\) = complete graph over centroids.
  • CART: \(H\) = independent-set graph, \(G\) = \(H\)-degree-weighted complete graph.
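
For reference, a minimal sketch of the objective this mapping presupposes, in standard non-uniform sparsest-cut form (notation illustrative; the paper's exact formulation may differ):

```latex
% Non-uniform sparsity of a cut S: weight crossing S in G,
% normalized by the weight crossing S in the demand graph H.
\[
  \Psi_{G,H}(S) \;=\; \frac{w_G(S, \bar S)}{w_H(S, \bar S)},
  \qquad
  \Psi_{G,H} \;=\; \min_{\emptyset \neq S \subsetneq X} \Psi_{G,H}(S)
\]
```

Under this form, each prior method differs only in which pairs its demand graph \(H\) rewards separating.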

Graph-theoretic explanation of CART's failure: The independent set graph incentivizes separating points from different clusters but does not penalize separating points within the same cluster — causing greedy selection of suboptimal cuts.

Computational Efficiency

  • Per-level time: \(O(dn(\log n + S))\)
  • SpEx-Clique: \(S = O(1)\) (cut scores updated by maintaining two cluster histograms; see the sketch after this list)
  • SpEx-kNN: \(S = O(\kappa)\), where \(\kappa\) is the number of neighbors
  • Total time: \(O(kdn\log n)\) (SpEx-Clique)
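
A sketch of where \(S = O(1)\) comes from: sort one coordinate once, then sweep the thresholds while updating the left-side cluster histogram, cut weight, and volume in constant time per point (illustrative, not the authors' code). Running this over all \(d\) coordinates gives the \(O(dn\log n)\) per-level cost above.

```python
import numpy as np

def best_threshold(x_j, labels):
    """Best cut {x_j <= tau} on the clique graph; O(1) work per candidate tau."""
    sizes = np.bincount(labels)
    n_left = np.zeros_like(sizes)              # left histogram (right = sizes - n_left)
    vol_total = (sizes * (sizes - 1)).sum()
    cut = vol_left = 0
    best_score, best_tau = np.inf, None
    order = np.argsort(x_j)
    xs = x_j[order]
    for t, i in enumerate(order[:-1]):         # keep at least one point on the right
        c = labels[i]
        cut += sizes[c] - 2 * n_left[c] - 1    # O(1) update of severed clique edges
        vol_left += sizes[c] - 1               # point i adds clique degree n_c - 1
        n_left[c] += 1
        if xs[t] == xs[t + 1]:                 # no threshold fits between equal values
            continue
        vol = min(vol_left, vol_total - vol_left)
        score = cut / vol if vol > 0 else np.inf
        if score < best_score:
            best_score, best_tau = score, (xs[t] + xs[t + 1]) / 2
    return best_score, best_tau
```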

Key Experimental Results

Main Results (8 Real-World Datasets, ARI/AMI Evaluation)

Dataset     n     d     k    SpEx-Clique   SpEx-kNN   EMN   CART (k-means)   CART (spectral)
CIFAR-10    50K   512   10   Best/Best     Mid        Mid   Poor             Poor
MNIST       60K   512   10   Best          Mid        Mid   Poor             Mid
20News      11K   768   20   Best          Mid        Mid   Poor             Poor
Beans       13K   16    7    Mid           Best       Mid   Mid              Mid
Ecoli       336   7     8    Mid           Best       Mid   Mid              Mid
Iris        150   4     3    Mid           Best       Mid   Mid              Good

Method Stability Analysis

Property                        SpEx-Clique           SpEx-kNN          EMN   CART                 Kernel IMM
Requires centroids              No                    No                Yes   No                   No
Requires reference clustering   Yes                   No                Yes   Yes                  Yes
Scales to large data            Yes                   Yes               Yes   Yes                  No
Performance                     Consistently strong   Good on low-dim   Mid   Good on small data   Uncertain

Key Findings

  • SpEx-Clique is the most consistently high-performing algorithm, outperforming every baseline in most settings.
  • SpEx-kNN performs best on low-dimensional datasets (≤16 dimensions).
  • CART performs adequately on small datasets but degrades on large ones — confirming that toy-example failure modes manifest in real data.
  • Kernel IMM cannot run on large datasets due to poor scalability.
  • The number of leaf nodes in SpEx can be set flexibly; increasing the number of leaves improves clustering quality at the cost of interpretability.

Highlights & Insights

  1. Truly general: the first explainable clustering method compatible with arbitrary reference clustering types.
  2. Unified theoretical perspective: the non-uniform sparse cut framework provides a unified understanding of IMM, EMN, and CART.
  3. Root cause of CART's failure: the independent set graph does not penalize intra-cluster separation, leading to greedy selection of poor cuts.
  4. Reference-free mode: SpEx-kNN bypasses reference clustering entirely to produce explainable clusters directly from data.
  5. Efficient implementation: SpEx-Clique requires only \(O(1)\) time to update cut scores.

Limitations & Future Work

  • Only axis-aligned (single-coordinate) cuts are considered — oblique cuts, though potentially more powerful, are unexplored.
  • SpEx-kNN underperforms SpEx-Clique on high-dimensional data.
  • Theoretical bounds on the explainability cost have not yet been analyzed.
  • The number of neighbors \(\kappa\) in the kNN graph must be specified by the user.
  • Extension to online or streaming settings remains an open direction.

Related Work

  • The explainable clustering formulation and IMM algorithm of Moshkovitz et al. (2020) have inspired a substantial body of theoretical work.
  • EMN (2022) improves upon IMM via ratio-based optimization.
  • Kernel IMM (2024) extends the framework to kernel k-means but with limited scalability.
  • This paper is the first to introduce spectral graph partitioning theory into explainable clustering.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ (elegant combination of spectral methods and a unified framework)
  • Technical Depth: ⭐⭐⭐⭐⭐ (theoretical contributions, algorithm design, and unified analysis)
  • Experimental Thoroughness: ⭐⭐⭐⭐ (8 datasets, multiple baselines)
  • Writing Quality: ⭐⭐⭐⭐⭐ (clear logic, well-balanced between intuition and formal proofs)