SpEx: A Spectral Approach to Explainable Clustering
- Conference: NeurIPS 2025
- arXiv: 2511.00885
- Code: GitHub
- Area: Explainable Clustering, Spectral Methods, Graph Partitioning
- Keywords: explainable clustering, spectral graph partitioning, coordinate cuts, decision trees, Cheeger inequality
TL;DR
This paper proposes SpEx, a general spectral graph partitioning-based framework for explainable clustering that can "round" any reference clustering (without requiring centroids) into an explainable clustering via coordinate-cut decision trees, or perform reference-free clustering directly on a kNN graph.
Background & Motivation
- Explainable clustering (Moshkovitz et al., 2020): a clustering described by a binary decision tree in which each internal node applies a single-coordinate threshold cut.
- Limitations of prior work:
- IMM/EMN: require the reference clustering to have centroids (k-means); cannot handle kernel k-means or spectral clustering.
- Kernel IMM: supports only specific kernel functions, with high complexity and poor scalability.
- CART: not designed for clustering; maximizes Gini impurity reduction, potentially selecting cuts incompatible with the clustering structure.
- No truly general explainable clustering method exists to date — one that is unconstrained by reference clustering type and robust on real-world data.
Method
Overall Architecture
SpEx leverages a generalization of the Cheeger inequality to relate single-coordinate threshold cuts to the conductance of cuts in a graph built over the data.
Key Designs
Core Theorem (Theorem 2.2)
For a point set \(X \subset \mathbb{R}^d\) and a weighted graph \(G(X, E, w)\), there exists a coordinate cut \(S_{j,\tau}\) such that:

\[\Psi_G(S_{j,\tau}(X)) \leq \sqrt{\frac{\mathbb{E}_{\mathrm{adj}}\|x-y\|_2^2}{\mathbb{E}_{\mathrm{all}}\|x-y\|_2^2}}\]

- Numerator: expected squared distance between pairs adjacent in the graph.
- Denominator: expected squared distance between all pairs.
- When the graph captures meaningful geometric structure (neighbors are closer than random pairs), the conductance upper bound is tighter.
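As a sanity check on the quantities involved, the sketch below brute-forces the minimum-conductance coordinate cut on a toy two-cluster graph. Conductance is computed as crossing weight over the smaller side's volume; the function names and the uniform toy weights are illustrative, not from the paper.

```python
import numpy as np

def conductance(W, S):
    """Psi_G(S): weight of edges crossing S over the smaller side's volume."""
    S = np.asarray(S, dtype=bool)
    cut = W[S][:, ~S].sum()
    denom = min(W[S].sum(), W[~S].sum())
    return cut / denom if denom > 0 else float("inf")

def best_coordinate_cut(X, W):
    """Brute-force scan of every axis-aligned cut S_{j,tau} = {x : x_j <= tau}."""
    best = (float("inf"), None, None)  # (conductance, coordinate, threshold)
    for j in range(X.shape[1]):
        for tau in np.unique(X[:, j])[:-1]:
            psi = conductance(W, X[:, j] <= tau)
            if psi < best[0]:
                best = (psi, j, tau)
    return best

# Two well-separated clusters whose graph edges stay within clusters,
# so some coordinate threshold cuts the graph with zero crossing weight.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
W = np.zeros((20, 20))
W[:10, :10] = W[10:, 10:] = 1.0
np.fill_diagonal(W, 0.0)

psi, j, tau = best_coordinate_cut(X, W)
print(psi, j)  # a zero-conductance cut exists on the first coordinate
```

Here neighbors are far closer than random pairs, so the theorem's ratio is small and a good coordinate cut is guaranteed to exist.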
SpEx-Clique (Reference Clustering Mode)
- Given any reference clustering, constructs a clique graph: edges connect points within the same cluster.
- Iteratively selects coordinate cuts with minimum conductance to build a decision tree.
- Requires neither centroids nor specific kernel functions.
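A minimal sketch of the clique-graph construction (the helper name is illustrative; the point is that any label vector suffices and no centroid is ever computed):

```python
import numpy as np

def clique_graph(labels):
    """Adjacency of the clique graph: w_ij = 1 iff points i and j share
    a reference cluster label. Works for any reference clustering."""
    labels = np.asarray(labels)
    W = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

# Labels could come from k-means, kernel k-means, spectral clustering, ...
W = clique_graph([0, 0, 0, 1, 1])
print(int(W.sum()))  # 8 directed intra-cluster edges: 3*2 + 2*1
```

The minimum-conductance coordinate cut on this graph then becomes the split at each tree node.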
SpEx-kNN (Reference-Free Mode)
- Constructs a kNN graph (\(k=20\)) directly on the dataset.
- Produces explainable clustering without any reference clustering.
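A brute-force sketch of the symmetrized kNN-graph construction (the paper uses \(k=20\); the helper below is illustrative and unweighted):

```python
import numpy as np

def knn_graph(X, k=20):
    """Connect each point to its k nearest neighbors (Euclidean),
    then symmetrize. O(n^2 d) brute force for clarity."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)         # exclude self-matches
    idx = np.argsort(D, axis=1)[:, :k]  # k nearest neighbors per row
    n = len(X)
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)           # undirected graph

X = np.array([[0.0], [0.1], [5.0]])
W = knn_graph(X, k=1)
print(W.sum())  # two undirected edges after symmetrization
```

The same tree-growing procedure then runs on this graph in place of the clique graph, with no reference clustering in sight.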
Iterative Tree Construction (Algorithm 1)
- A priority queue maintains all leaf nodes.
- At each step, the leaf node with the greatest conductance reduction after cutting is selected for splitting.
- Objective: minimize the multi-way normalized cut objective \(\bar{\Psi}_G(T) = \sum_{v \in \mathcal{L}(T)}\Psi_G(X_v)\), summing over the leaves \(\mathcal{L}(T)\) of the tree \(T\).
- The number of leaf nodes is freely chosen and need not match the number of reference clusters.
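The loop above can be sketched as follows. This is a simplified reading of Algorithm 1: it prioritizes leaves by the conductance of their best cut rather than by the exact reduction in \(\bar{\Psi}_G\), and all names are illustrative.

```python
import heapq
import numpy as np

def conductance(W, S):
    S = np.asarray(S, dtype=bool)
    cut = W[S][:, ~S].sum()
    denom = min(W[S].sum(), W[~S].sum())
    return cut / denom if denom > 0 else float("inf")

def best_cut(X, W, idx):
    """Best coordinate cut restricted to the points of one leaf."""
    Xv, Wv = X[idx], W[np.ix_(idx, idx)]
    best = (float("inf"), None, None)
    for j in range(X.shape[1]):
        for tau in np.unique(Xv[:, j])[:-1]:
            psi = conductance(Wv, Xv[:, j] <= tau)
            if psi < best[0]:
                best = (psi, j, tau)
    return best

def spex_tree(X, W, n_leaves):
    """Grow a threshold tree greedily: a priority queue holds every leaf
    keyed by the score of its best coordinate cut; the best leaf is split
    until n_leaves leaves exist (n_leaves need not equal #clusters)."""
    leaves = [np.arange(len(X))]
    psi, j, tau = best_cut(X, W, leaves[0])
    heap = [(psi, 0, j, tau)]
    splits = []
    while len(leaves) < n_leaves and heap:
        psi, v, j, tau = heapq.heappop(heap)
        if j is None:          # leaf has no valid cut (e.g. duplicate points)
            continue
        idx = leaves[v]
        go = X[idx, j] <= tau
        leaves[v], new = idx[go], idx[~go]
        leaves.append(new)
        splits.append((v, j, tau))
        for u in (v, len(leaves) - 1):   # rescore both children
            if len(leaves[u]) > 1:
                p, jj, tt = best_cut(X, W, leaves[u])
                heapq.heappush(heap, (p, u, jj, tt))
    return leaves, splits

# Two 1-D clusters with a clique graph (the SpEx-Clique setting).
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
lab = np.array([0, 0, 0, 1, 1, 1])
W = (lab[:, None] == lab[None, :]).astype(float)
np.fill_diagonal(W, 0.0)
leaves, splits = spex_tree(X, W, n_leaves=2)
print([sorted(l.tolist()) for l in leaves])  # -> [[0, 1, 2], [3, 4, 5]]
```

Each split is a single `(coordinate, threshold)` pair, so the resulting tree reads directly as an explanation of the clustering.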
Unified Theoretical Framework: Non-Uniform Sparse Cut
Via Trevisan's non-uniform sparse cut framework, prior methods are unified as special cases:
- IMM → \(G\) = star graph, \(H\) = single-edge graph on the diameter centroid pair.
- EMN → \(G\) = star graph, \(H\) = complete graph over centroids.
- CART → \(H\) = independent-set graph, \(G\) = \(H\)-degree-weighted complete graph.
Graph-theoretic explanation of CART's failure: The independent set graph incentivizes separating points from different clusters but does not penalize separating points within the same cluster — causing greedy selection of suboptimal cuts.
Computational Efficiency
- Per-level time: \(O(dn(\log n + S))\)
- SpEx-Clique: \(S = O(1)\) (cut scores updated by maintaining two cluster histograms)
- SpEx-kNN: \(S = O(\kappa)\), where \(\kappa\) is the number of neighbors in the kNN graph
- Total time: \(O(kdn\log n)\) (SpEx-Clique)
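To illustrate the histogram trick behind the \(O(1)\) update, here is a simplified sketch for a single coordinate of the clique graph (the paper maintains two histograms per split; all names here are illustrative). After one \(O(n \log n)\) sort, each new threshold moves exactly one point across the cut, so the crossing weight and volumes change by amounts readable from per-cluster counts.

```python
from collections import Counter
import numpy as np

def sweep_coordinate(xs, labels):
    """One coordinate sweep on the clique graph: after an O(n log n) sort,
    each threshold step moves a single point leftward, so the cut weight
    and the left volume are updated in O(1) from per-cluster counts."""
    order = np.argsort(xs)
    size = Counter(labels)   # cluster sizes
    left = Counter()         # points already left of the threshold, per cluster
    cut = vol_left = 0.0
    vol_total = float(sum(s * (s - 1) for s in size.values()))
    best = (float("inf"), None)
    for i in order[:-1]:
        c = labels[i]
        # point i gains (size-left-1) crossing edges and loses left[c] of them
        cut += (size[c] - left[c] - 1) - left[c]
        left[c] += 1
        vol_left += size[c] - 1  # clique degree of point i
        denom = min(vol_left, vol_total - vol_left)
        psi = cut / denom if denom > 0 else float("inf")
        if psi < best[0]:
            best = (psi, float(xs[i]))
    return best

xs = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
labels = [0, 0, 0, 1, 1, 1]
best_psi, best_tau = sweep_coordinate(xs, labels)
print(best_psi, best_tau)  # -> 0.0 0.2  (the cut between the two clusters)
```

Repeating the sweep over all \(d\) coordinates gives the \(O(dn(\log n + S))\) per-level cost with \(S = O(1)\).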
Key Experimental Results
Main Results (8 Real-World Datasets, ARI/AMI Evaluation)
| Dataset | n | d | k | SpEx-Clique | SpEx-kNN | EMN | CART (k-means) | CART (spectral) |
|---|---|---|---|---|---|---|---|---|
| CIFAR-10 | 50K | 512 | 10 | Best/Best | Mid | Mid | Poor | Poor |
| MNIST | 60K | 512 | 10 | Best | Mid | Mid | Poor | Mid |
| 20News | 11K | 768 | 20 | Best | Mid | Mid | Poor | Poor |
| Beans | 13K | 16 | 7 | Mid | Best | Mid | Mid | Mid |
| Ecoli | 336 | 7 | 8 | Mid | Best | Mid | Mid | Mid |
| Iris | 150 | 4 | 3 | Mid | Best | Mid | Mid | Good |
Method Stability Analysis
| Property | SpEx-Clique | SpEx-kNN | EMN | CART | Kernel IMM |
|---|---|---|---|---|---|
| Requires centroids | ✗ | ✗ | ✓ | ✗ | ✗ |
| Requires reference clustering | ✓ | ✗ | ✓ | ✓ | ✓ |
| Scalability to large data | ✓ | ✓ | ✓ | ✓ | ✗ |
| Consistently strong performance | ✓ | Good on low-dim | Mid | Good on small data | Uncertain |
Key Findings
- SpEx-Clique is the most consistently strong method, outperforming the baselines in most settings.
- SpEx-kNN performs best on low-dimensional datasets (≤16 dimensions).
- CART performs adequately on small datasets but degrades on large ones — confirming that toy-example failure modes manifest in real data.
- Kernel IMM cannot run on large datasets due to poor scalability.
- The number of leaf nodes in SpEx can be set freely; increasing it improves clustering quality at the cost of interpretability.
Highlights & Insights
- Truly general: the first explainable clustering method compatible with arbitrary reference clustering types.
- Unified theoretical perspective: the non-uniform sparse cut framework provides a unified understanding of IMM, EMN, and CART.
- Root cause of CART's failure: the independent set graph does not penalize intra-cluster separation, leading to greedy selection of poor cuts.
- Reference-free mode: SpEx-kNN bypasses reference clustering entirely to produce explainable clusters directly from data.
- Efficient implementation: SpEx-Clique requires only \(O(1)\) time to update cut scores.
Limitations & Future Work
- Only axis-aligned (single-coordinate) cuts are considered — oblique cuts, though potentially more powerful, are unexplored.
- SpEx-kNN underperforms SpEx-Clique on high-dimensional data.
- Theoretical bounds on the explainability cost have not yet been analyzed.
- The value of \(k\) in the kNN graph must be specified by the user.
- Extension to online or streaming settings remains an open direction.
Related Work & Insights
- The explainable clustering formulation and IMM algorithm of Moshkovitz et al. (2020) have inspired a substantial body of theoretical work.
- EMN (2022) improves upon IMM via ratio-based optimization.
- Kernel IMM (2024) extends the framework to kernel k-means but with limited scalability.
- This paper is the first to introduce spectral graph partitioning theory into explainable clustering.
Rating
- Novelty: ⭐⭐⭐⭐⭐ (elegant combination of spectral methods and a unified framework)
- Technical Depth: ⭐⭐⭐⭐⭐ (theoretical contributions, algorithm design, and unified analysis)
- Experimental Thoroughness: ⭐⭐⭐⭐ (8 datasets, multiple baselines)
- Writing Quality: ⭐⭐⭐⭐⭐ (clear logic, well-balanced between intuition and formal proofs)