# Dynamic Algorithm for Explainable k-medians Clustering under the \(\ell_p\) Norm
Conference: NeurIPS 2025 · arXiv: 2512.01150 · Code: None · Area: Interpretability · Keywords: Explainable clustering, k-medians, decision trees, dynamic algorithms, approximation ratio
## TL;DR
This paper presents the first explainable k-medians clustering algorithm for general \(\ell_p\) norms, achieving an approximation ratio of \(\tilde{O}(p(\log k)^{1+1/p-1/p^2})\), which improves the best known bound for \(p=2\). It also gives the first dynamic variant: maintaining an explainable clustering under center insertions and deletions with \(O(d \log^3 k)\) amortized update time and \(O(\log k)\) amortized reassignments.
## Background & Motivation
Background: Explainable clustering was introduced by Dasgupta et al. (2020), where clustering partitions are represented by threshold decision trees—each internal node applies a threshold comparison on a single feature, making cluster assignments transparent and traceable. The central question is the price of explainability: how well can the cost of a decision-tree clustering approximate the optimal unconstrained cost?
Limitations of Prior Work: The competitive ratio under the \(\ell_1\) norm has been tightly characterized as \(1 + H_{k-1}\); under \(\ell_2\), the best known bound is \(\tilde{O}(\log^{3/2} k)\), leaving room for improvement; no algorithm existed for general \(\ell_p\) (\(p > 2\)); and all prior algorithms are static.
Key Challenge: General \(\ell_p\) norms require more carefully designed random cut distributions—uniform random cuts yield a competitive ratio of \(\Theta(k^{1-1/p})\). In the dynamic setting, the hierarchical structure of decision trees makes local updates more challenging.
Goal: (a) Design explainable k-medians algorithms for all finite \(p \geq 1\) and improve the bound for \(p=2\); (b) provide the first dynamic version.
Key Insight: Non-uniform random cuts with CDF \(x^p/R_t^p\), combined with Poisson process timestamps to enable dynamic updates.
Core Idea: Through \(\ell_p\)-adapted non-uniform random cuts and Poisson-timestamp-driven dynamic maintenance, the paper achieves near-optimal explainable clustering for all \(\ell_p\) norms along with efficient dynamic updates for the first time.
## Method
### Overall Architecture
The algorithm takes \(k\) reference centers \(C\) as input and outputs a \(k\)-leaf threshold decision tree \(\mathcal{T}\). The core procedure Partition_Leaf recursively splits each node containing multiple centers into child nodes. The algorithm is data-agnostic—it depends only on the reference center set.
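As a minimal illustration of the output object (a sketch, not the paper's code; the class and function names are hypothetical), a threshold decision tree assigns each point to a cluster by a chain of single-feature comparisons from the root to a leaf:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    """One node of a threshold decision tree (hypothetical minimal layout)."""
    feature: int = -1               # coordinate compared at an internal node
    threshold: float = 0.0          # cut value at an internal node
    left: Optional["Node"] = None   # subtree for x[feature] <= threshold
    right: Optional["Node"] = None  # subtree for x[feature] >  threshold
    cluster: int = -1               # cluster label at a leaf (one center per leaf)

def assign(root: Node, x: List[float]) -> int:
    """Follow threshold comparisons until a leaf is reached."""
    node = root
    while node.left is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.cluster

# Tiny 2-leaf tree: split on coordinate 0 at threshold 0.5.
root = Node(feature=0, threshold=0.5,
            left=Node(cluster=0), right=Node(cluster=1))
```

Because every internal decision is a single-coordinate threshold test, the path to a leaf is a human-readable explanation of the assignment.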
### Key Designs
- Non-Uniform Random Cuts (Partition_Leaf):
  - Function: Recursively partitions the space containing center set \(C_u\) until each leaf contains at most half the centers.
  - Mechanism: An anchor point \(m^u\) (the coordinate-wise median) is selected; the algorithm iteratively samples a random coordinate \(i\), a sign \(\sigma\), and a parameter \(Z_t \sim \text{Uniform}[0, R_t^p]\), sets \(\theta_t = Z_t^{1/p}\), and defines the cut threshold \(\vartheta_t = m^u_i + \sigma \cdot \theta_t\).
  - Key Property: The CDF of the offset \(\theta_t\) is \(x^p / R_t^p\): cut density is low near the anchor and high far from it, so the probability of separating two points scales with the \(p\)-th power of their \(\ell_p\) distance.
  - Why Not Uniform Cuts: Uniform cuts treat all distances equally, degrading the competitive ratio to \(\Theta(k^{1-1/p})\).
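The threshold distribution above is exactly inverse-CDF sampling: drawing \(Z \sim \text{Uniform}[0, R^p]\) and returning \(Z^{1/p}\) yields CDF \(x^p/R^p\) on \([0, R]\). A quick sketch (hypothetical helper, not the paper's code), with an empirical check of the CDF at \(x = R/2\):

```python
import random

def sample_cut_offset(R: float, p: float, rng: random.Random) -> float:
    """Sample theta in [0, R] with CDF F(x) = x^p / R^p.

    Inverse-CDF sampling: Z ~ Uniform[0, R^p], theta = Z^(1/p), since
    P[theta <= x] = P[Z <= x^p] = x^p / R^p.
    """
    Z = rng.uniform(0.0, R ** p)
    return Z ** (1.0 / p)

rng = random.Random(0)
R, p = 1.0, 2.0
samples = [sample_cut_offset(R, p, rng) for _ in range(100_000)]
# For p = 2 the empirical CDF at x = R/2 should be close to (1/2)^2 = 0.25:
# mass concentrates away from the anchor, unlike the uniform distribution.
frac = sum(s <= 0.5 for s in samples) / len(samples)
```

For \(p = 1\) this reduces to a uniform cut; as \(p\) grows, cuts concentrate increasingly near the boundary radius \(R\).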
- Three-Category Cut Analysis Framework:
  - Function: Precisely bounds the expected cost incurred when a cut incorrectly separates a data point \(x\) from its center \(c\).
  - Core Classification:
    - Safe cuts: the retreat distance is small (\(A_k M_t \leq R_t / (6^p \log^2 k)\)); total contribution \(O(1) \cdot \|x-c\|_p\).
    - Light cuts: the radius is small (\(R_t \leq 6 \log^\alpha k \cdot \max\{\|x-m^t\|, \|c-m^t\|\}\)); contribution \(O(p \log^{1+\alpha} k) \cdot \|x-c\|_p\).
    - Heavy and unsafe cuts: bounded via Hölder's inequality; contribution \(O((\log k)^{2-1/p-\alpha(p-1)}) \cdot \|x-c\|_p\).
  - Setting \(\alpha = 1/p - 1/p^2\) balances the light and heavy terms; solving the resulting recurrence yields \(A_k \leq O(p (\log k)^{1+1/p-1/p^2} \log\log k)\).
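The choice of \(\alpha\) follows from equating the logarithmic exponents of the light-cut and heavy-cut contributions:

\[
1 + \alpha \;=\; 2 - \frac{1}{p} - \alpha(p-1)
\;\Longrightarrow\;
\alpha p \;=\; 1 - \frac{1}{p}
\;\Longrightarrow\;
\alpha \;=\; \frac{1}{p} - \frac{1}{p^2},
\]

so both categories contribute at most \((\log k)^{1+1/p-1/p^2}\) (up to the \(p\) and \(\log\log k\) factors), which is exactly the exponent in Theorem 3.1.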
- Dynamic Algorithm (Poisson Timestamps):
  - Function: Efficiently updates the decision tree when centers are inserted into or deleted from the reference set.
  - Mechanism: Each random cut is assigned an arrival time \(\rho_t\) drawn from a Poisson process; timestamps remain invariant under center changes, giving every cut a stable identity.
  - Insertion: Locate the earliest-timestamped cut that separates the new center from the anchor, and splice the new center in at the corresponding position in the tree.
  - Deletion: Locate the leaf and remove the corresponding cut.
  - Rebuild: Triggered when the number of updates exceeds one quarter of the initial center count.
  - Complexity: \(O(d \log^3 k)\) amortized update time; \(O(\log k)\) amortized reassignments.
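The insertion step can be sketched as follows (a toy illustration under assumed data structures, not the paper's implementation): cuts carry fixed arrival times, and a new center is placed according to the earliest cut that separates it from the anchor.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Cut:
    rho: float    # Poisson-process arrival time: a stable, unique identifier
    coord: int    # coordinate the threshold compares
    theta: float  # threshold value

def first_separating_cut(cuts: List[Cut], anchor: List[float],
                         c_new: List[float]) -> Optional[Cut]:
    """Scan cuts in arrival order; return the earliest cut that puts the
    new center c_new on the opposite side of the threshold from the anchor.

    Timestamps do not change when centers are inserted or deleted, so the
    position found is consistent with the tree the static algorithm builds.
    """
    for cut in sorted(cuts, key=lambda c: c.rho):
        if (anchor[cut.coord] <= cut.theta) != (c_new[cut.coord] <= cut.theta):
            return cut
    return None  # no existing cut separates them; a fresh cut is sampled

anchor = [0.0, 0.0]
cuts = [Cut(rho=0.7, coord=0, theta=0.5),
        Cut(rho=0.3, coord=1, theta=-0.2),
        Cut(rho=1.1, coord=0, theta=-0.4)]
hit = first_separating_cut(cuts, anchor, c_new=[0.9, 0.1])
```

Here the cut with \(\rho = 0.3\) does not separate \([0.9, 0.1]\) from the anchor, but the cut with \(\rho = 0.7\) does, so the new center is spliced in at that cut's position in the tree.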
### Theoretical Guarantees
Theorem 3.1 (Static): \(\mathbf{E}[\text{cost}_p(X, \mathcal{T})] \leq O(p \cdot (\log k)^{1+1/p-1/p^2} \log\log k) \cdot \text{cost}_p(X, C)\)
Theorem 5.1 (Dynamic): Same approximation ratio, with \(O(d \log^3 k)\) amortized update time and \(O(\log k)\) amortized reassignments.
## Key Experimental Results
### Theoretical Approximation Ratio Comparison
| Norm | Prev. Best Upper Bound | Ours | Lower Bound | Gap |
|---|---|---|---|---|
| \(\ell_1\) | \(1 + H_{k-1}\) (tight) | \(O(\log k \log\log k)\) | \(\Omega(\log k)\) | \(O(\log\log k)\) |
| \(\ell_2\) | \(\tilde{O}(\log^{3/2} k)\) | \(\tilde{O}(\log^{5/4} k)\) | \(\Omega(\log k)\) | \(\tilde{O}(\log^{1/4} k)\) |
| General \(\ell_p\) | None | \(\tilde{O}(p (\log k)^{1+1/p-1/p^2})\) | \(\Omega(\log k)\) | Polylogarithmic |
### Dynamic Algorithm Complexity
| Operation | Update Time (Amortized) | Reassignments (Amortized) | Approximation Guarantee |
|---|---|---|---|
| Center insertion | \(O(d \log^3 k)\) | \(O(\log k)\) | Same as static |
| Center deletion | \(O(d \log^3 k)\) | \(O(\log k)\) | Same as static |
### Key Findings
- For \(\ell_2\), the exponent improves from \(3/2\) to \(5/4\), achieved through the more refined three-category cut analysis.
- The logarithmic exponent \(1 + 1/p - 1/p^2\) always lies in \([1, 1.25]\), attaining its maximum at \(p=2\).
- A lower bound theorem establishes that no \(p\)-oblivious algorithm can be simultaneously optimal for all \(p\); an \(\Omega(d^{1/4})\) lower bound holds for such algorithms.
- The dynamic algorithm can be composed with the dynamic k-medians framework of Bhattacharya et al. (2025) to achieve fully dynamic explainable clustering.
## Highlights & Insights
- Elegance of Non-Uniform Cuts: The CDF \(x^p/R_t^p\) is a natural match to the geometry of \(\ell_p\), making the separation probability proportional to the \(p\)-th power of the distance, which is the key enabler of the polylogarithmic competitive ratio.
- Poisson Timestamps as an Algorithmic Tool: The exponential clock technique, originally used purely for analysis, is elevated here to a design primitive; timestamps provide a globally unique identifier for each cut.
- A General Bridge from Static to Dynamic: The paradigm of first assigning timestamps and then maintaining them may generalize to the dynamization of other recursive randomized algorithms.
## Limitations & Future Work
- For \(p=1\), the competitive ratio is \(O(\log k \log\log k)\), falling short of the tight bound by an \(O(\log\log k)\) factor.
- For \(p=2\), a gap of \(\tilde{O}(\log^{1/4} k)\) remains.
- The dynamic algorithm requires a full subtree rebuild at reconstruction thresholds, potentially causing occasional high-latency updates.
- The work is purely theoretical; empirical evaluation of practical clustering performance is entirely absent.
## Related Work & Insights
- vs. Dasgupta et al. (2020): The foundational work achieves only an \(O(k)\) upper bound; the present work attains logarithmic bounds for all \(\ell_p\).
- vs. Gupta et al. (2023): That work established the tight \(\ell_1\) bound of \(1 + H_{k-1}\); this paper falls short of it by only an \(O(\log\log k)\) factor while covering all \(\ell_p\).
- vs. Makarychev & Shan (2021): The \(\tilde{O}(\log^{3/2} k)\) bound for \(\ell_2\) is improved to \(\tilde{O}(\log^{5/4} k)\) and generalized.
- vs. Bhattacharya et al. (2025): Dynamic unconstrained k-medians; the dynamic explainable clustering developed here can be directly layered on top.
## Rating
- Novelty: ⭐⭐⭐⭐⭐ — First explainable clustering algorithm for general \(\ell_p\) norms plus the first dynamic version; major theoretical contribution.
- Experimental Thoroughness: ⭐⭐ — Purely theoretical; no empirical validation.
- Writing Quality: ⭐⭐⭐⭐ — Rigorous theoretical derivations; the three-category cut analysis framework is clearly presented.
- Value: ⭐⭐⭐⭐ — Significant contribution to the theoretical foundations of explainable AI; dynamization is practically relevant for streaming data scenarios.