Skip to content

Consistent Low-Rank Approximation

Conference: ICLR2026
arXiv: 2603.02148
Code: To be confirmed
Area: Others
Keywords: low-rank approximation, streaming algorithm, consistency, recourse, online algorithm

TL;DR

The paper proposes and systematically studies the "Consistent Low-Rank Approximation" problem—maintaining a near-optimal rank-\(k\) approximation of a matrix whose rows arrive in a stream while minimizing the total change in the solution (recourse). It proves that \(O(k/\varepsilon \cdot \log(nd))\) recourse is feasible under additive error, \(k^{3/2}/\varepsilon^2 \cdot \text{polylog}\) recourse is feasible under \((1+\varepsilon)\) multiplicative error, and provides a lower bound of \(\Omega(k/\varepsilon \cdot \log(n/k))\).

Background & Motivation

Background: Low-rank approximation is a core tool in machine learning (recommendation systems, dimensionality reduction, feature engineering). In streaming scenarios where data arrives row by row, it is necessary to maintain low-rank approximations online. Frequent Directions and online ridge leverage score sampling are the prevailing methods.

Limitations of Prior Work: (a) Online algorithms focus solely on approximation quality, neglecting solution stability—frequent changes to factor matrices lead to high retraining costs for downstream models; (b) Frequent Directions can incur a recourse of \(O(n)\) when the \(k\)-th and \((k+1)\)-th singular vectors alternate (disastrous); (c) Consistency (recourse) has been studied in clustering and caching problems but has not yet been systematized for low-rank approximation.

Key Challenge: The optimal subspace for online low-rank approximation can change completely at every step (\(\Omega(nk)\) recourse), yet downstream applications require stable feature spaces. The goal is to achieve an optimal trade-off between approximation quality and solution stability.

Goal: To formalize the consistent low-rank approximation problem and provide upper and lower bounds.

Key Insight: Measure recourse using the Frobenius distance between subspace projection matrices, combine online ridge leverage score sampling to reduce effective stream length, and minimize step-wise factor changes through fine-grained singular value analysis.

Core Idea: Compress the stream length via ridge leverage sampling and process updates by categorizing singular values (replacing small singular values directly while maintaining stable large ones) to achieve sub-quadratic recourse.

Method

Overall Architecture

Matrix \(\mathbf{A} \in \mathbb{R}^{n \times d}\) arrives row by row. At each time \(t\), the algorithm outputs a rank-\(k\) factor \(\mathbf{V}^{(t)} \in \mathbb{R}^{k \times d}\). The objective is to ensure both approximation quality \(\|\mathbf{A}^{(t)} - \mathbf{A}^{(t)} (\mathbf{V}^{(t)})^\top \mathbf{V}^{(t)}\|_F^2 \leq (1+\varepsilon) \cdot \text{OPT}_t\) and minimize the total recourse over the stream \(\sum_t \|\mathbf{P}_{\mathbf{V}^{(t)}} - \mathbf{P}_{\mathbf{V}^{(t-1)}}\|_F^2\). To this end, the paper presents an additive error algorithm, a multiplicative error algorithm, and a matching lower bound.

Key Designs

1. Measuring Recourse: Frobenius Distance of Projection Matrices

The first step in addressing consistency is selecting the correct metric for "change." If one directly compares the difference between adjacent factor matrices \(\mathbf{V}^{(t)}\) and \(\mathbf{V}^{(t-1)}\), a meaningless basis rotation (changing the orthogonal basis for the same subspace) would be miscalculated as a massive change. This work instead uses the Frobenius distance of subspace projection matrices \(\|\mathbf{P}_{\mathbf{V}^{(t)}} - \mathbf{P}_{\mathbf{V}^{(t-1)}}\|_F^2\) to measure recourse—it only considers whether the spanned subspace actually changes and is invariant to basis rotations. This makes recourse a natural and robust metric, allowing subsequent analysis to focus on the subspace level.

2. Additive Error Algorithm: Triggering Recomputation via Norm Growth

Under additive error (Theorem 1.1), the algorithm is minimalist: it monitors the Frobenius norm of the current matrix. Recomputation of the SVD occurs only when \(\|\mathbf{A}^{(t)}\|_F^2\) increases by a factor of \((1+\varepsilon)\) compared to the last recomputation; otherwise, the old factor is retained. This design works because the total energy of new rows between two recomputations is capped at \(\leq \varepsilon \cdot \|\mathbf{A}^{(t)}\|_F^2\), naturally controlling the additive error. Since the norm doubles at most \(O(1/\varepsilon \cdot \log(ndM))\) times (where \(M\) is the element upper bound), and the recourse per recomputation is at most \(k\), the total recourse becomes \(O(k/\varepsilon \cdot \log(ndM))\)—which nearly matches the lower bound.

3. Multiplicative Error Algorithm: Compressing Stream Length and Case-wise Tail Singular Value Updates

This is the core contribution: tightening the error guarantee to \((1+\varepsilon)\) multiplicative (Theorem 1.3). While direct recomputation at every step costs \(k^2/\varepsilon^2 \cdot \text{polylog}\), this work reduces it to sub-quadratic through two steps. First, online ridge leverage score sampling compresses the effective stream length from \(n\) to \(k/\varepsilon \cdot \text{polylog}\), ensuring expensive updates only happen for a few "important" rows. Second, fine-grained updates are performed on the compressed stream, specifically treating the weakest \(\sqrt{k}\) of the top-\(k\) singular values with case work: when the energy of this tail is small (\(\sum_{i=k-\sqrt{k}}^{k} \sigma_i^2\) is very small), these directions are insignificant, and the tail singular vectors can be directly replaced by new rows, accumulating enough changes over \(\sqrt{k}\) steps before a unified recomputation (averaging \(O(k)\) recourse per \(\sqrt{k}\) steps); when tail energy is large, the optimal subspace is inherently stable and won't be violently disturbed by new rows, so the top SVD is recomputed with a single recourse not exceeding \(\sqrt{k}\). Together, these yield a total recourse of \(k^{3/2}/\varepsilon^2 \cdot \text{polylog}\) for integer-bounded matrices. The \(\sqrt{k}\) threshold optimizes the trade-off between "replacing \(r\) factors costing \(k \cdot r\)" and "recomputing every \(k^2/r\) steps." Fine-grained analysis is also applied to anti-Hadamard integer matrices where the optimal low-rank cost can be exponentially small. For data streams with a polynomial online condition number, the bound can be further reduced to \(k/\varepsilon^2 \cdot \text{polylog}\).

4. Lower Bound: Subspace Alternation Construction for \(\Omega(k/\varepsilon \cdot \log(n/k))\)

To demonstrate that the recourse is near its limit, the paper constructs a hard instance (Theorem 1.4): the stream is divided into \(\Theta(1/\varepsilon \cdot \log(n/k))\) phases, each forcing the optimal subspace to switch between two sets of orthogonal bases. Even if the entire stream is known in advance, any algorithm maintaining a multiplicative \((1+\varepsilon)\) approximation must switch subspaces, resulting in a total recourse of at least \(\Omega(k/\varepsilon \cdot \log(n/k))\). This is only a \(\sqrt{k}\) factor away from the additive upper bound, delineating the room for improvement in multiplicative bounds.

Key Experimental Results

Main Results (Empirical Evaluation)

Algorithm Approx Quality Measured Recourse Description
Frequent Directions Good \(O(n)\) (Catastrophic) Singular values alternate frequently
Naive SVD Recompute Optimal \(O(nk)\) Recompute every step
Ridge Sampling + Recompute \((1+\varepsilon)\) \(O(k^2/\varepsilon^2)\) Baseline
Ours \((1+\varepsilon)\) \(O(k^{3/2}/\varepsilon^2)\) Sub-quadratic, optimal

Key Findings

  • Frequent Directions suffers catastrophic recourse: Its \(O(n)\) level makes it entirely unsuitable for applications requiring stability.
  • Theoretical Worst-case vs. Reality: The recourse on actual data is significantly lower than the theoretical upper bound.
  • The \(\sqrt{k}\) threshold is critical: Switching case work between "weak tail" and "strong tail" is the core algorithmic design.

Highlights & Insights

  • Pioneering Problem Definition: First to extend consistency (recourse) from clustering to low-rank approximation—filling a significant gap in online algorithm theory.
  • Elegant \(\sqrt{k}\) Balance Point: The trade-off between replacing \(r\) factors (step-wise recourse \(k \cdot r\)) and periodic recomputation (\(k^2/r\)) is neatly optimized at \(r = \sqrt{k}\).
  • Rigorous Handling of Anti-Hadamard Matrices: Addressing boundary cases where optimal costs for integer matrices are exponentially small demonstrates theoretical depth.
  • Clear Practical Motivation: Solving the engineering pain point of frequent model retraining in feature engineering.

Limitations & Future Work

  • Purely Theoretical Contribution: The experiments are relatively simple and lack validation on large-scale real-world datasets (e.g., Netflix Prize).
  • Gap between Bounds: With an upper bound of \(O(k^{3/2}/\varepsilon^2)\) vs. a lower bound of \(\Omega(k/\varepsilon)\), the tightness of the \(\sqrt{k}\) factor remains an open question.
  • Row-only Arrival: More complex streaming models like column arrival, sliding windows, and insertion-deletion are not covered.
  • Future Directions: (a) Closing the gap between upper and lower bounds; (b) Expanding to dynamic streams (insertion + deletion); (c) Improving practical performance by incorporating distributional assumptions.
  • vs. Consistent Clustering (Lattanzi & Vassilvitskii): Consistent clustering uses geometric properties to select robust centers; low-rank approximation requires different subspace techniques.
  • vs. Frequent Directions: FD is an excellent streaming algorithm but has disastrous recourse. The proposed algorithm is specifically designed for consistency.
  • vs. Online PCA: Online PCA focuses on regret rather than recourse. These two objectives are orthogonal.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ Pioneering problem definition + non-trivial algorithmic design and analysis.
  • Experimental Thoroughness: ⭐⭐⭐ Primarily theoretical, with basic but sufficient experimental validation.
  • Writing Quality: ⭐⭐⭐⭐⭐ Clear technical overview, intuitive explanations, and deep anti-Hadamard analysis.
  • Value: ⭐⭐⭐⭐ Opens a new direction for consistent online algorithms with practical inspiration for feature engineering.