A New Approach to Controlling Linear Dynamical Systems¶

Conference: ICLR2026
OpenReview: https://openreview.net/forum?id=BQIzu1T6F0
Code: To be confirmed
Area: Learning Theory / Online Control
Keywords: Online nonstochastic control, spectral filtering, Hankel matrix, regret bound, convex relaxation

TL;DR¶

This paper proposes Online Spectral Control (OSC): the control problem of linear dynamical systems under adversarial perturbations is transformed via convex relaxation using a set of system-independent "spectral filters" (eigenvectors of a specific Hankel matrix). While maintaining an optimal regret of $\tilde O(\gamma^{-4}\sqrt T)$, it reduces the per-step runtime dependency on the stability margin $\gamma$ from polynomial $O(\gamma^{-1})$ to logarithmic $O(\mathrm{polylog}(1/\gamma))$.

Background & Motivation¶

Background: Controlling linear dynamical systems (LDS) $x_{t+1}=Ax_t+Bu_t+w_t$ under adversarial perturbations is a core problem at the intersection of control and online learning. Classic LQR (Linear Quadratic Regulator) assumes quadratic costs and i.i.d. Gaussian noise, which rarely holds in modern applications—wind disturbances, faults, and adversaries can introduce structured, correlated, or even adversarial perturbations and costs. Consequently, "online nonstochastic control" has emerged as a more general framework: dynamics $(A,B)$ are known and time-invariant, states are observable, and costs are convex Lipschitz but provided adversarially. The goal is to minimize regret relative to the "best fixed policy in hindsight."

Limitations of Prior Work: GPC (Agarwal et al., 2019a), the foundational work in this line, parameterizes the controller using a "disturbance-action policy" (DAC): representing the current control $u_t$ as a linear combination of past perturbations $w_{t-1},\dots,w_{t-m}$. The issue lies in the memory length $m$—to ensure the influence of old perturbations decays to a negligible level, $m$ must be set to $O(\gamma^{-1}\log T)$, where $\gamma$ is the stability margin (spectral radius $\rho=1-\gamma$). Consequently, the number of parameters and per-step execution time depend polynomially on $1/\gamma$.

Key Challenge: Many practical scenarios favor marginal stability, where $\gamma$ is very small and $\rho$ is close to 1. Such systems retain long-term memory and allow for smoother, more energy-efficient control (common in robotics, thermal systems, and satellite dynamics). However, as $\gamma$ decreases, methods like GPC require longer memory windows and become computationally expensive—creating a direct conflict between "desired physical properties" and "affordable computational costs."

Key Insight: The authors observe that learning the linear mapping from "perturbation to control" is essentially a regression on a high-dimensional but highly redundant feature set (the raw perturbation sequence). Spectral filtering tools were originally developed for "sequence prediction" to bypass the non-convexity of learning LDS by using the top eigenvectors of a fixed Hankel matrix as a universal basis to compress long memory into a few components. Can this be adapted for "control"?

Core Idea: Use "spectral filters" (the top $h$ eigenvectors of a Hankel matrix determined by $\gamma$) as a universal feature map independent of the specific LDS. This relaxes the control problem into "convex regression on features generated by convolving perturbations with spectral filters." Consequently, the parameter count is reduced from $\sim 50d$ (memory $m$ times dimension) to $hd$ (where $h$ is only polylogarithmic), while achieving a superior regret bound.

Method¶

Overall Architecture¶

OSC (Algorithm 1) is a specific instance of Projected Online Gradient Descent (OGD) acting on a set of carefully designed spectral features. The workflow is:

Offline One-time calculation of the top $h$ eigenpairs $\{(\sigma_j,\phi_j)\}_{j=1}^h$ of an $m\times m$ Hankel matrix $H$. The elements of $H$ depend only on the assumed stability margin $\gamma$: $$H_{ij}=\frac{(1-\gamma)^{i+j-1}}{i+j-1}.$$ These filters are "universal"—independent of specific $(A,B)$, initial states, perturbations, or cost functions.
Online maintenance of a set of parameter matrices $M^t_{1:h}\in\mathbb R^{h\times n\times d}$. At each step $t$: extract the recent $m$ perturbations into $\tilde W_{t-1:t-m}=[w_{t-1},\dots,w_{t-m}]\in\mathbb R^{d\times m}$, and compute the control: $$u_t=\sum_{i=1}^{h}\sigma_i^{1/4}\,M^t_i\,\tilde W_{t-1:t-m}\,\phi_i .$$
Observe the new state $x_{t+1}$ and solve for the actual perturbation at this step $w_t=x_{t+1}-Ax_t-Bu_t$ (this step explicitly utilizes the known dynamics $A,B$).
Perform one step of gradient descent on a memory-less proxy loss $\ell_t$, followed by projection back onto the convex constraint set $K$: $M^{t+1}_{1:h}=\Pi_K[M^t_{1:h}-\eta\nabla_M\ell_t(M^t_{1:h})]$.

Intuitively: while GPC learns a "linear mapping of past perturbations," OSC learns a "linear mapping of past perturbations convolved with spectral filters." The latter uses fewer, more expressive features to extract truly useful structures from long-term memory.

Key Designs¶

1. Universal Spectral Filters: Replacing "system-dependent long memory" with "system-independent fixed bases" The pain point is that GPC's memory window $m$ must grow with $1/\gamma$, burdening parameters and computation. OSC's solution: noting that in the state evolution of a marginally stable LDS $x_t=\sum_i (A+BK)^{i-1}w_{t-i}$, the powers of the closed-loop matrix inject a Hankel structure into the response. The constructed $H_{ij}=(1-\gamma)^{i+j-1}/(i+j-1)$ exactly encodes the common memory decay pattern for "all systems with spectral radius $\le 1-\gamma$." Its top $h$ eigenvectors $\phi_i$ serve as universal filters independent of the specific $(A,B)$. The fourth-root weights $\sigma_i^{1/4}$ come from approximation error analysis, ensuring a small number of filters can cover the primary energy of the memory. The authors highlight a key technical difference from sequence prediction: in prediction, past responses are available, whereas online control must utilize system stability and integrate over a different set of eigenvalues, resulting in a Hankel structure (Eq 2.1) distinct from the prediction version.

2. Spectral Controller Class and Convex Relaxation: Transforming non-convex "finding optimal K" into convex "learning M" Optimizing directly over state-feedback policies $u_t=Kx_t$ is non-convex. The authors follow the "improper learning" approach: instead of learning $K$, they compete with a broader, convex-optimizable policy class. Specifically, they define the Spectral Controller Class: $$\Pi^{SC}_{h,m,\gamma}=\Big\{\pi(w_{t-1:t-m})=\sum_{i=1}^{h}\sigma_i^{1/4} M_i \tilde W_{t-1:t-m}\phi_i\Big\},$$ characterized by parameters $M_{1:h}$. Control is linear in $M$, making the cost convex in $M$—thus relaxing the original problem into convex online learning. Accordingly, the authors define a bounded parameter set (Definition 4.2) $K$ to ensure projected gradient descent runs on a bounded convex set without state/control explosion.

3. Memory-less Loss + Approximation Lemma: Simplifying "real cost with memory" into "analyzable single-step loss" Real costs $c_t(x_t,u_t)$ depend on the entire history (as states accumulate past controls), making standard OGD regret analysis inapplicable. The authors introduce a memory-less loss (Definition 4.4) $\ell_t(M_{1:h})=c_t(x_t(M_{1:h}),u_t(M_{1:h}))$, which acts as if the current parameters were used from the beginning. They bridge the gap using two lemmas: (i) an Approximation Lemma (Lemma 4.3) proving the spectral controller class can approximate target policy classes (diagonalizable stable policies) to any precision $\varepsilon T$; and (ii) Memory-less to Real Cost Approximation (Lemma 4.5), proving the single-step loss is sufficiently close to the real cost. Together, these allow the regret of OGD on $\ell_t$ to be translated back to the real cost.

4. Fast Online Convolution: Reducing per-step $\tilde W\phi_i$ calculation from $O(m)$ to $O(\mathrm{polylog})$ Even with fewer parameters, calculating $h$ matrix multiplications $\tilde W_{t-1:t-m}\phi_i$ would normally take $O(m)$ per step. By zero-padding $\phi_i$ to $T$ dimensions and employing efficient online convolution techniques (Agarwal et al., 2024a), the amortized running time per step is reduced to $O(\log^4(T/\gamma))$. This is the final piece that translates "logarithmic parameters" into "logarithmic computational cost."

Loss & Training¶

The optimizer is projected OGD with hyperparameters given by the main theorem: memory $m=\lceil\frac1\gamma\log(8C_1\sqrt T/\gamma^3)\rceil$, number of parameters $h=\lceil 4\log T\log(900C_1dT/\gamma^3)\rceil$ (polylogarithmic), and step size $\eta=C_2\sqrt{\gamma^3/(Tmh)}$. In experiments, Adam (lr $10^{-4}$) is used with per-step projection onto a unit ball.

Key Experimental Results¶

Experiments compare OSC against GPC, focusing on "fewer parameters with comparable or better performance." The setup involves an LDS with state dimension $d=10$ and single control input $n=1$, where $A$ has diagonal elements in $[0.5, 0.95]$. Both methods use a memory window of the last 50 steps: GPC has $50\times d$ parameters, while OSC uses only the top 15 Hankel eigenvectors ($15\times d$ parameters).

Main Results (Table 1)¶

Method	Regret	Time per step	Perturbations	Cost
LQR	$O(1)$	$O(1)$	i.i.d.	Fixed Quadratic
Online LQ (Cohen 2018)	$O(\gamma^{-2.5}\sqrt T)$	$O(1)$	i.i.d.	Online Quadratic
GPC (Agarwal 2019a)	$\tilde O(\gamma^{-5.5}\sqrt T)$	$O(\gamma^{-1}\log T)$	Adversarial	Online Convex Lipschitz
OSC (Ours)	$\tilde O(\gamma^{-4}\sqrt T)$	$O(\log^4(\gamma^{-1}T))$	Adversarial	Online Convex Lipschitz

OSC is the only method in the table that achieves the best runtime in the most general setting (adversarial perturbations + online convex Lipschitz cost), improving the $\gamma$ exponent from $-5.5$ to $-4$ and the time complexity to polylogarithmic.

Ablation Study (Figure 2–4)¶

Configuration	Observation	Explanation
Linear LDS, no hidden layer	OSC long-term loss is comparable to GPC	Only 15d vs 50d parameters
Linear LDS, 100-hidden MLP head	Both improve; OSC remains equal/better	Spectral features serve as strong inputs for complex models
Nonlinear LDS (ReLU)	OSC remains effective	Spectral features work even for certain nonlinear systems
State dimension $d\in\{2,5,10,20\}$	Performance degrades as $d$ increases	Controlling large state space with single input is inherently harder
Parameter count $h$	Diminishing returns after $h \approx 20$	Spectral features are highly compact ($h=5$ is competitive)

Key Findings¶

The core benefit of spectral filtering is "compactness": $h=5$ filters can match GPC's 50-step memory, confirming that the truly useful structure in memory is low-dimensional.
Spectral features are model-agnostic: they remain effective when used with MLP heads or under nonlinear dynamics, serving as a universal feature map.
Controlling large state spaces with single inputs is an intrinsic difficulty unrelated to specific algorithms.

Highlights & Insights¶

Adapting "Spectral Filtering from Prediction" to "Online Control": Identifies the technical difference in Hankel structures due to system stability requirements.
Universal Fixed Bases vs. System-Dependent Long Memory: This is the root cause for compressing the $1/\gamma$ dependency from polynomial to logarithmic. This insight could be applied to other long-memory online learning problems.
Convex Relaxation Triad: The combination of the spectral controller class, bounded constraint sets, and memory-less loss provides a paradigm for fitting non-convex, memory-heavy control problems into the standard OGD framework.
$\sigma_i^{1/4}$ Weighting: A subtle but critical detail that ensures approximation error convergence within a few filters.

Limitations & Future Work¶

Strong assumptions: Requires known time-invariant dynamics $(A,B)$, full state observability, and known $\gamma$. Unknown dynamics or partial observability are not covered.
The comparator class is restricted to "diagonalizable stable policies," which is narrower than general strong stability.
Regret dependency on $\gamma$ is still $\gamma^{-4}$; for extreme marginal stability ($\gamma \to 0$), the constant terms may be large. Experimental scale is small ($d \le 20$, single input).
Experiments report loss rather than regret, providing only an indirect measure of approximation to the optimal comparator.

vs GPC (Agarwal et al., 2019a): OSC uses $O(\mathrm{polylog})$ parameters instead of $O(\gamma^{-1})$, achieving strictly better theoretical bounds and runtime.
vs Online LQ (Cohen et al., 2018): OSC handles more general adversarial perturbations and convex Lipschitz costs.
vs Spectral Filtering for Prediction (Hazan et al., 2017): This work marks the first systematic application of these tools to online control, addressing control-specific Hankel structure differences.

Rating¶

Novelty: ⭐⭐⭐⭐⭐ First systematic introduction of spectral filtering to online nonstochastic control.
Experimental Thoroughness: ⭐⭐⭐ Validates compactness and ablations, but scale is small.
Writing Quality: ⭐⭐⭐⭐ Clear motivation and well-organized theoretical roadmap.
Value: ⭐⭐⭐⭐ Real improvement for marginally stable systems; universal features allow integration with stronger models.

Method	Regret	Time per step	Perturbations	Cost
LQR	\(O(1)\)	\(O(1)\)	i.i.d.	Fixed Quadratic
Online LQ (Cohen 2018)	\(O(\gamma^{-2.5}\sqrt T)\)	\(O(1)\)	i.i.d.	Online Quadratic
GPC (Agarwal 2019a)	\(\tilde O(\gamma^{-5.5}\sqrt T)\)	\(O(\gamma^{-1}\log T)\)	Adversarial	Online Convex Lipschitz
OSC (Ours)	\(\tilde O(\gamma^{-4}\sqrt T)\)	\(O(\log^4(\gamma^{-1}T))\)	Adversarial	Online Convex Lipschitz