Faster Algorithms for Structured John Ellipsoid Computation
Conference: NeurIPS 2025 · arXiv: 2211.14407 · Code: None
Area: Convex Optimization / Algorithm Design
Keywords: John ellipsoid, convex optimization, input sparsity, treewidth, non-negative matrix factorization, Lewis weights, leverage scores
TL;DR
For computing the John ellipsoid of a symmetric convex polytope \(P = \{x \in \mathbb{R}^d : -\mathbf{1}_n \leq Ax \leq \mathbf{1}_n\}\), this paper proposes two fast algorithms: a near-input-sparsity algorithm based on sketching, with per-iteration cost \(\widetilde{O}(\text{nnz}(A) + d^\omega)\), and a treewidth-based algorithm with per-iteration cost \(O(n\tau^2)\), both substantially improving on the prior state-of-the-art per-iteration cost of \(O(nd^2)\).
Background & Motivation
John Ellipsoid: Fritz John's theorem guarantees the existence and uniqueness of a maximum-volume inscribed ellipsoid (the John ellipsoid) for any convex body. This is a fundamental problem in convex optimization.
Applications: High-dimensional sampling, linear programming, online learning, differential privacy, uncertainty quantification, and D-optimal experimental design.
Prior State-of-the-Art: Cohen, Cousins, Lee, and Yang (COLT 2019) proposed a fixed-point iteration based on \(\ell_\infty\) Lewis weights, with per-iteration cost \(O(nd^2)\) and \(O(\varepsilon^{-1}\log(n/d))\) iterations.
Computational Bottleneck: Each iteration requires computing the quadratic forms \(a_i^\top B^{-1} a_i\), where \(B = A^\top W A\) for a diagonal weight matrix \(W\). Forming \(B\) and evaluating all \(n\) quadratic forms via a Cholesky factorization costs \(O(nd^2)\).
Goal: Exploit matrix structure (sparsity or small treewidth) to break the \(O(nd^2)\) barrier.
Method
Problem Formulation
Given \(A \in \mathbb{R}^{n \times d}\) of rank \(d\), finding the maximum-volume inscribed ellipsoid is equivalent to solving the convex program (following CCLY19):

\[\max_{w \in \mathbb{R}^n_{\geq 0}} \;\; \log\det\Big(\sum_{i=1}^n w_i\, a_i a_i^\top\Big) \quad \text{subject to} \quad \sum_{i=1}^n w_i = d.\]
Optimality conditions: \(\sum_i w_i = d\), and \(a_j^\top Q^{-1} a_j = 1\) whenever \(w_j \neq 0\), where \(Q = \sum_i w_i a_i a_i^\top\).
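The two conditions are consistent with each other: multiplying the second condition by \(w_j\) and summing over \(j\) recovers the first via a trace identity,

\[\sum_{j=1}^{n} w_j\, a_j^\top Q^{-1} a_j = \operatorname{tr}\Big(Q^{-1} \sum_{j=1}^{n} w_j\, a_j a_j^\top\Big) = \operatorname{tr}(I_d) = d.\]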
Fixed-Point Iteration (Base Framework)
Each iteration updates the weights as

\[w_{k+1,i} = w_{k,i} \cdot a_i^\top \Big(\sum_{j=1}^n w_{k,j}\, a_j a_j^\top\Big)^{-1} a_i, \qquad i \in [n].\]
This is equivalent to computing the leverage scores of \(B_k = \sqrt{\text{diag}(w_k)} A\): \(w_{k+1,i} = \|(B_k^\top B_k)^{-1/2} \sqrt{w_{k,i}} a_i\|_2^2\).
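To make the base framework concrete, here is a minimal dense NumPy sketch of this iteration (an illustration under the notation above, not the paper's implementation): it computes exact leverage scores with a thin QR factorization at the baseline \(O(nd^2)\) per-iteration cost and returns the averaged iterates used by the analysis.

```python
import numpy as np

def john_weights_baseline(A, T):
    """CCLY19-style fixed-point iteration (dense illustration).

    A : (n, d) array of full column rank defining P = {x : -1 <= Ax <= 1}.
    Returns averaged weights w_hat; the approximate John ellipsoid is
    E = {x : x^T (A^T diag(w_hat) A) x <= 1}.
    """
    n, d = A.shape
    w = np.full(n, d / n)                 # initialization: w_1 = (d/n) * 1_n
    iterates = []
    for _ in range(T):
        B = np.sqrt(w)[:, None] * A       # B_k = sqrt(diag(w_k)) A
        Q, _ = np.linalg.qr(B)            # thin QR: leverage score i of B_k is
        w = np.einsum('ij,ij->i', Q, Q)   # the squared norm of the i-th row of Q
        iterates.append(w)
    return np.mean(np.array(iterates), axis=0)  # telescope lemma averages iterates
```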
Algorithm 1: Near-Input-Sparsity Algorithm (Theorem 1.1)
Core Idea: Two-level acceleration via leverage score sampling and random matrix sketching.
- Leverage Score Sampling: Construct a sampling matrix \(D_k\) from approximate leverage scores \(\tilde{\sigma}\) such that \((1-\varepsilon_0) B_k^\top B_k \preceq B_k^\top D_k B_k \preceq (1+\varepsilon_0) B_k^\top B_k\). Only \(s = \Theta(\varepsilon_0^{-2} d \log(d/\delta))\) sampled rows are needed.
- Random Sketching: Apply a Gaussian matrix \(S_k \in \mathbb{R}^{s \times d}\) for dimensionality reduction: \(\hat{w}_{k+1,i} = \frac{1}{s}\|\widetilde{Q}_k \sqrt{w_{k,i}}\, a_i\|_2^2\), where \(\widetilde{Q}_k = S_k (B_k^\top D_k B_k)^{-1/2}\).
- Approximate Leverage Score Computation: Computed in \(\widetilde{O}(\varepsilon_\sigma^{-2}(\text{nnz}(A) + d^\omega))\) time using the method of Drineas et al.
Per-Iteration Cost: \(\widetilde{O}(\varepsilon^{-1}\text{nnz}(A) + \varepsilon^{-2} d^\omega)\)
Total Complexity: \(\widetilde{O}((\varepsilon^{-1}\text{nnz}(A) + \varepsilon^{-2} d^\omega) \cdot \varepsilon^{-1}\log(n/d))\)
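Below is a minimal NumPy sketch of one accelerated iteration in the spirit of Algorithm 1. It is an illustration under simplifying assumptions: the sampling step uses exact leverage scores (where the paper uses fast approximate ones), and \((B_k^\top D_k B_k)^{-1/2}\) is replaced by the Cholesky inverse \(L^{-1}\), which yields the same quadratic form and hence the same sketch distribution under a Gaussian \(S_k\).

```python
import numpy as np

rng = np.random.default_rng(0)

def sketched_step(A, w, eps0=0.5, s_sketch=64):
    """One sketched fixed-point step (illustrative sketch of Algorithm 1)."""
    n, d = A.shape
    B = np.sqrt(w)[:, None] * A                    # B_k = sqrt(diag(w_k)) A
    # Step 1: leverage-score row sampling (exact scores here for brevity; the
    # paper computes approximate scores in ~O(nnz(A) + d^omega) time).
    Q, _ = np.linalg.qr(B)
    sigma = np.einsum('ij,ij->i', Q, Q)
    s = int(np.ceil(d * np.log(d) / eps0**2))      # s = Theta(eps0^-2 d log d) rows
    p = np.minimum(1.0, s * sigma / d)             # row inclusion probabilities
    keep = rng.random(n) < p
    Bs = B[keep] / np.sqrt(p[keep])[:, None]       # rescaled sample: Bs^T Bs = B^T D B
    M = Bs.T @ Bs                                  # (1 +- eps0)-approx of B_k^T B_k
    # Step 2: Gaussian sketch. Since b^T M^{-1} b = ||L^{-1} b||^2 for M = L L^T,
    # sketching L^{-1} b_i matches the paper's S_k M^{-1/2} b_i in distribution.
    L = np.linalg.cholesky(M)
    S = rng.standard_normal((s_sketch, d))
    R = np.linalg.solve(L.T, S.T).T                # R = S @ L^{-1} via triangular solve
    P = B @ R.T                                    # row i equals (S L^{-1} b_{k,i})^T
    return np.einsum('ij,ij->i', P, P) / s_sketch  # estimates of w_{k+1}
```

For a sparse \(A\), building \(B\) costs \(O(\text{nnz}(A))\) and the product \(B R^\top\) costs \(O(\text{nnz}(A) \cdot s_{\text{sketch}})\), which is where the near-input-sparsity running time comes from.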
Algorithm 2: Small-Treewidth Algorithm (Theorem 1.2)
Core Idea: Exploit the treewidth \(\tau\) of the incidence graph of \(A\) to accelerate Cholesky decomposition.
- Treewidth: Construct the incidence graph \(G_A\) of \(A\), whose vertices are the columns of \(A\), with an edge between two columns whenever some row of \(A\) is nonzero in both. The treewidth \(\tau\) measures how "tree-like" \(G_A\) is.
- Fast Cholesky Decomposition: When the treewidth is \(\tau\), the factorization \(A^\top W A = LL^\top\) can be computed in \(O(n\tau^2)\) time, with each row of \(L\) having at most \(\tau\) nonzero entries.
- Sparse \(L\) Solves: Exploiting the sparsity of \(L_k\), each quadratic form \(b_{k,i}^\top (L_k L_k^\top)^{-1} b_{k,i}\), where \(b_{k,i} = \sqrt{w_{k,i}}\, a_i\) is the \(i\)-th row of \(B_k\), costs only \(O(\tau^2)\) operations.
Per-Iteration Cost: \(O(n\tau^2)\)
Practical Relevance: Treewidth is typically between \(O(d^{1/2})\) and \(O(d^{3/4})\) on the Netlib LP benchmark; the largest MATPOWER power-systems instance has \(n = 20467\) and \(d = 12659\), yet treewidth only \(\tau = 35\).
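The SciPy sketch below illustrates the quadratic-form computation that Algorithm 2 accelerates. It is an illustration rather than the paper's algorithm: it uses SuperLU with a generic fill-reducing ordering (COLAMD) as a stand-in for a sparse Cholesky factorization along a tree decomposition, and its per-row solve costs \(O(\text{nnz}(L))\) rather than the paper's \(O(\tau^2)\).

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def weighted_quad_forms(A, w):
    """q_i = w_i * a_i^T (A^T diag(w) A)^{-1} a_i for every row of a sparse A.

    When the incidence graph G_A has small treewidth, a good elimination
    ordering keeps the triangular factors sparse, so one factorization plus
    n cheap solves replaces the dense O(n d^2) computation.
    """
    n, d = A.shape
    M = (A.T @ sp.diags(w) @ A).tocsc()   # d x d; sparse under small treewidth
    lu = splu(M, permc_spec='COLAMD')     # fill-reducing ordering (generic stand-in
                                          # for a tree-decomposition ordering)
    q = np.empty(n)
    for i in range(n):
        a_i = A[i].toarray().ravel()      # dense copy of row i
        q[i] = w[i] * (a_i @ lu.solve(a_i))   # a_i^T M^{-1} a_i via sparse solves
    return q
```

In the paper's setting the elimination order comes from the tree decomposition itself, each row of the factor has at most \(\tau\) nonzeros, and each solve touches only those entries, which is what yields the \(O(\tau^2)\)-per-row cost.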
Error Analysis: Telescope Lemma
The final approximation quality is decomposed into an initialization term and an accumulated per-step error. Schematically, for the averaged weights \(\hat{w} = \frac{1}{T}\sum_{k=1}^{T} w_k\), convexity of \(w \mapsto \log(a_i^\top Q(w)^{-1} a_i)\) gives

\[\log\big(a_i^\top Q(\hat{w})^{-1} a_i\big) \;\leq\; \frac{1}{T}\sum_{k=1}^{T} \log\frac{w_{k+1,i}}{w_{k,i}} + \bar{\delta} \;=\; \frac{1}{T}\log\frac{w_{T+1,i}}{w_{1,i}} + \bar{\delta} \;\leq\; \frac{1}{T}\log\frac{n}{d} + \bar{\delta},\]

where \(\bar{\delta}\) collects the per-step sketching and sampling errors, and the last step uses the initialization \(w_{1,i} = d/n\) together with \(w_{T+1,i} \leq 1\) (leverage scores never exceed 1).
Choosing \(T = O(\varepsilon^{-1}\log(n/d))\) ensures both terms are \(O(\varepsilon)\).
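Concretely, exponentiating the telescoped bound gives \(a_i^\top Q(\hat{w})^{-1} a_i \leq (n/d)^{1/T} \cdot e^{\bar{\delta}}\); with \(T = \varepsilon^{-1}\log(n/d)\) the first factor equals \(e^{\varepsilon} \leq 1 + 2\varepsilon\) for \(\varepsilon \leq 1\), and the per-step accuracy keeps \(e^{\bar{\delta}} = 1 + O(\varepsilon)\).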
Output Ellipsoid Guarantee
For a \((1+\varepsilon)\)-approximate John ellipsoid \(Q\):
Containment guarantee: \(\frac{1}{\sqrt{1+\varepsilon}} Q \subseteq P \subseteq \sqrt{d} \cdot Q\), i.e., \(P\) is sandwiched within a factor \(\sqrt{(1+\varepsilon)d}\), matching the exact symmetric John bound \(\sqrt{d}\) up to a factor \(\sqrt{1+\varepsilon}\).
Volume guarantee: \(\text{vol}(\frac{1}{\sqrt{1+\varepsilon}} Q) \geq \exp(-d\varepsilon/2) \cdot \text{vol}(Q^*)\), where \(Q^*\) is the exact John ellipsoid; per dimension this is a factor \(e^{-\varepsilon/2}\).
Key Results
Complexity Comparison
| Algorithm | Iterations | Per-Iteration Cost |
|---|---|---|
| CCLY19 | \(\varepsilon^{-1}\log(n/d)\) | \(nd^2\) |
| Alg. 1 (sketching) | \(\varepsilon^{-1}\log(n/d)\) | \(\varepsilon^{-1}\text{nnz}(A) + \varepsilon^{-2}d^\omega\) |
| Alg. 2 (treewidth) | \(\varepsilon^{-1}\log(n/d)\) | \(n\tau^2\) |
Speedup Analysis
- When \(A\) is sparse (\(\text{nnz}(A) \ll nd\)): Algorithm 1 offers substantial improvement.
- When \(A\) is dense but \(n > d^\omega\): Algorithm 1 still outperforms CCLY19.
- When \(\tau \ll d\) (e.g., \(\tau = O(d^{1/2})\)): Algorithm 2 reduces cost from \(nd^2\) to \(nd\), a \(d\)-fold speedup.
- MATPOWER case: with \(\tau = 35\), Algorithm 2 reduces the per-iteration cost from \(nd^2 \approx 3.3 \times 10^{12}\) to \(n\tau^2 \approx 2.5 \times 10^{7}\), roughly a \(10^5\times\) speedup (arithmetic checked in the snippet below).
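These figures follow directly from the stated per-iteration cost model:

```python
n, d, tau = 20467, 12659, 35                          # largest MATPOWER instance
print(f"dense cost   n*d^2    = {n * d**2:.1e}")      # ~3.3e12
print(f"treewidth    n*tau^2  = {n * tau**2:.1e}")    # ~2.5e07
print(f"speedup      d^2/tau^2 = {d**2 / tau**2:.1e}") # ~1.3e05
```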
Highlights & Insights
- Two Complementary Acceleration Paths: Sparsity is exploited via sketching and sampling; structural properties are exploited via treewidth, covering distinct practical scenarios.
- Near-Input-Sparsity Time: Since merely reading the input already takes \(O(\text{nnz}(A))\) time, Algorithm 1's per-iteration cost is nearly optimal.
- Telescope Lemma Innovation: Unlike CCLY19, which analyzes only sketching error, this paper simultaneously handles error accumulation from both sketching and sampling.
- Solid Theoretical Contributions: High-probability guarantees with precise approximation factor analysis.
Limitations & Future Work
- The work is purely theoretical with no numerical experiments to validate practical efficiency.
- The \(d^\omega\) term in approximate leverage score computation may remain a bottleneck for large \(d\).
- The treewidth algorithm requires a tree decomposition as input (computing an optimal one is NP-hard; only approximations, e.g. of width \(O(\tau \log^3 n)\), are efficiently computable).
- Only centrally symmetric convex polytopes are considered; asymmetric cases are not covered.
- Algorithm 1 relies on random matrices and carries a failure probability \(\delta\).
- Parallelization potential is not discussed.
Related Work & Insights
- vs. CCLY19: Both use the same fixed-point iteration framework; this paper reduces per-iteration cost from \(O(nd^2)\) to \(\widetilde{O}(\text{nnz}(A) + d^\omega)\) and \(O(n\tau^2)\), respectively.
- vs. Interior-Point Methods: Traditional \(O(nd^3)\) cost is prohibitive; both proposed algorithms are faster in their respective applicable regimes.
- vs. First-Order Methods: First-order methods have cheap iterations but need many of them; the proposed methods need only \(O(\varepsilon^{-1}\log(n/d))\) iterations while keeping each iteration cheap.
Rating
- Novelty: ⭐⭐⭐⭐ Both acceleration paths represent genuinely new technical contributions.
- Experimental Thoroughness: ⭐⭐⭐ Purely theoretical; no numerical experiments.
- Writing Quality: ⭐⭐⭐⭐ Proof structure is clear; technical overviews aid comprehension.
- Value: ⭐⭐⭐⭐ Substantial improvement to a foundational convex optimization algorithm with broad application potential.