Lyapunov Learning at the Onset of Chaos¶

Conference: ICML 2025
arXiv: 2506.12810
Code: None
Area: Time Series
Keywords: Lyapunov exponent, edge of chaos, non-stationary time series, regime shift, online learning

TL;DR¶

This work proposes the Lyapunov Learning algorithm. By viewing the neural network as a dynamical system and incorporating a Lyapunov exponent regularization term into the loss function, the network is pushed toward the edge of chaos. This enables rapid self-adaptation when regime shifts occur in non-stationary time series, roughly halving the post-shift MSE relative to a vanilla model (loss ratio \(r \approx 1.96\)) in Lorenz system experiments.

Background & Motivation¶

Background: Deep learning faces severe challenges when processing non-stationary time series. In online learning scenarios, the introduction of new data can disrupt learned historical knowledge—a phenomenon known as catastrophic forgetting. When the data source undergoes sudden statistical changes (regime shifts), the model must rapidly adapt to the new regime while retaining core knowledge relevant to the overall problem.

Limitations of Prior Work: Traditional regularization methods (e.g., L1, L2, Dropout) improve generalization but do not explicitly prepare the model for regime shifts. Existing continual learning methods primarily focus on integrating new data under static distributions rather than addressing sudden, drastic shifts in data statistics. The machine learning field lacks tools that enable neural networks to effectively explore new information to adapt to regime shifts.

Key Challenge: How can a neural network maintain stable predictions while possessing the ability to rapidly adapt to sudden changes? Excessive stability hinders adaptation, whereas excessive instability prevents reliable predictions. A balance must be struck between the two.

Goal: Inspired by Stuart Kauffman's "Adjacent Possible" theory, Lyapunov Learning is proposed to prepare models for regime shifts by leveraging the properties of non-linear chaotic dynamical systems. The core idea is to operate the network at the "edge of chaos," where the maximum Lyapunov exponent hovers around zero.

Key Insight: The neural network itself is viewed as a dynamical system, where the weight parameters determine the trajectory mapping inputs to outputs. By calculating the Lyapunov exponent spectrum of the sequences generated by the network, the sensitivity of the network to small perturbations can be quantified and subsequently controlled via regularization.

Core Idea: The edge of chaos is a critical boundary between "order" and "chaos"—at this state, the system possesses sufficient exploratory capacity to discover new patterns without spiraling out of control. By utilizing Lyapunov exponent regularization to push the network to this critical state, it can respond rapidly when a regime shift occurs.

Method¶

Overall Architecture¶

The overall workflow of Lyapunov Learning can be divided into three steps:

Treat the neural network \(\mathbf{F}(\mathbf{x}_t, \mathbf{w})\) as a discrete dynamical system, where \(\mathbf{x}_t\) represents the input data and \(\mathbf{w}\) denotes the network weights.
Starting from the ground-truth data, recursively apply the network to generate sequences, and compute the Jacobian matrices and Lyapunov exponents along these sequences.
Incorporate the Lyapunov exponents as a regularization term in the loss function, optimizing both prediction accuracy and dynamical characteristics simultaneously via gradient descent.
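
The first two steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-hidden-layer tanh map, its random weights (`W1`, `b1`, `W2`), the starting point, and the horizon `n_steps` are all hypothetical, and the Jacobian is written analytically. NumPy only evaluates the exponents; making them trainable would require an autodiff framework.

```python
import numpy as np

def lyapunov_spectrum(step, jacobian, x0, n_steps):
    """Estimate the Lyapunov exponents of x_{t+1} = step(x_t) by repeated QR."""
    x = np.asarray(x0, dtype=float)
    q = np.eye(x.size)              # orthonormal frame of perturbation directions
    log_stretch = np.zeros(x.size)
    for _ in range(n_steps):
        # Push the frame through the local linearization, then re-orthonormalize.
        q, r = np.linalg.qr(jacobian(x) @ q)
        # |diag(R)| holds the stretching factor of each direction for this step.
        log_stretch += np.log(np.abs(np.diag(r)))
        x = step(x)
    return np.sort(log_stretch / n_steps)[::-1]   # largest exponent first


# Hypothetical one-hidden-layer tanh map F(x) = W2 tanh(W1 x + b1) on 3D states.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(10, 3)), rng.normal(size=10)
W2 = rng.normal(size=(3, 10))

def step(x):
    return W2 @ np.tanh(W1 @ x + b1)

def jacobian(x):
    # dF/dx = W2 diag(1 - tanh^2(W1 x + b1)) W1
    h = np.tanh(W1 @ x + b1)
    return W2 @ ((1.0 - h**2)[:, None] * W1)

spectrum = lyapunov_spectrum(step, jacobian, x0=np.ones(3), n_steps=2000)
print("Lyapunov spectrum estimate:", spectrum)
```

Re-orthonormalizing at every step keeps the accumulated product well conditioned; the first entry of `spectrum` is the maximum exponent that the regularizer would act on.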

Key Designs¶

Lyapunov Exponent Computation Module:
For sequences generated by the neural network, the Jacobian matrix of the network with respect to the input, \(\mathbf{J}(\mathbf{x}_t)\), is computed at each timestep. The Lyapunov exponents are defined through the long-time growth rate of the Jacobian product, \(\Lambda = \lim_{T \to \infty} \frac{1}{T} \ln \left| \prod_{t=0}^{T} \mathbf{J}(\mathbf{x}_t) \right|\), and are estimated in practice over a finite horizon \(T\). Because multiplying many Jacobians directly is numerically unstable, QR decomposition is employed to estimate the exponents from the matrix product. The entire computation is differentiable with respect to the network weights \(\mathbf{w}\), enabling direct optimization via backpropagation.
Design Motivation: The Lyapunov exponent is a standard tool for assessing the chaotic nature of a dynamical system—positive values represent exponential divergence of trajectories (chaos), negative values represent convergence (stability), and zero values represent periodic behavior. By regulating these exponents, the dynamical behavior of the network can be precisely controlled.
Edge of Chaos Regularization:
The joint loss function is designed as: \(\mathcal{L}(\mathbf{x}_t, \hat{\mathbf{x}}_t) = \mathcal{L}_{\text{MSE}}(\mathbf{x}_t, \hat{\mathbf{x}}_t) + \alpha |\lambda|\) where \(\lambda\) is the maximum Lyapunov exponent of the network viewed as a dynamical system, and \(\alpha\) controls the regularization strength. Using \(|\lambda|\) instead of \(\lambda\) penalizes deviations from zero in either direction, so the maximum Lyapunov exponent is driven toward zero (the edge of chaos) rather than toward strongly negative (over-stable) or positive (chaotic) values.
Design Motivation: At the edge of chaos, the system possesses maximum adaptability—it has sufficient instability to explore new directions in the solution space while maintaining enough stability to prevent divergence. This corresponds to Kauffman's concept of the "Adjacent Possible," where a system expands its space of possibilities through minor modifications of known elements.
Chaotic Attractor Generation Verification:
Prior to applying the method to practical tasks, the authors validated that Lyapunov Learning can indeed control the chaotic properties of the network. A network with only one hidden layer (10 neurons) was designed, using only the Lyapunov exponent as the loss function, to autonomously generate chaotic attractors starting from a single 3D point.
Design Motivation: This step is the foundation of the methodology. If the computation of the Lyapunov exponent cannot be proven accurate and capable of effectively governing network behavior, subsequent regularization applications lack a theoretical basis. The experiments successfully generated multiple chaotic attractors with varying maximum Lyapunov exponents (0.104, 0.191, 0.235), all of which satisfied the two necessary conditions of chaotic attractors.
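
To see why the \(\alpha |\lambda|\) term pulls the exponent toward zero from both sides, consider a toy case that is not taken from the paper: the scalar linear map \(x_{t+1} = w x_t\) with \(w > 0\), whose Lyapunov exponent is exactly \(\lambda = \ln w\). The sketch below runs plain gradient descent on \(\alpha |\ln w|\) alone; the function name, learning rate, and step count are illustrative assumptions.

```python
import math

def lyapunov_penalty_descent(w, alpha=1.0, lr=0.05, n_steps=200):
    """Gradient descent on alpha * |lambda(w)| for the scalar map x -> w * x, w > 0."""
    for _ in range(n_steps):
        lam = math.log(w)                 # exact exponent of x_{t+1} = w * x_t
        sign = (lam > 0) - (lam < 0)      # subgradient of |lam|; 0 when lam == 0
        grad = alpha * sign / w           # d|ln w| / dw
        w -= lr * grad
    return w

for w0 in (0.5, 2.0):
    w_final = lyapunov_penalty_descent(w0)
    print(f"start w = {w0}: final lambda = {math.log(w_final):+.3f}")
```

Starting from a contracting map (\(w = 0.5\), \(\lambda < 0\)) or an expanding one (\(w = 2\), \(\lambda > 0\)), the penalty moves \(w\) toward 1, where \(\lambda = 0\). With the absolute value dropped, the gradient would have the same sign everywhere and \(w\) would simply keep shrinking.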

Loss & Training¶

Loss Function: \(\mathcal{L} = \mathcal{L}_{\text{data}} + \alpha \cdot \mathcal{L}_{\text{Lyapunov}}\), where \(\mathcal{L}_{\text{Lyapunov}} = |\lambda|\) (the absolute value of the maximum Lyapunov exponent)
Training Strategy: Online learning mode where the network continuously predicts and updates without a fixed training endpoint. The first half of the training sequence uses one set of Lorenz parameters, while the second half abruptly switches to a different set of parameters to simulate a regime shift.
Hyperparameter Selection: \(\alpha = 1.0\) is the optimal weight, corresponding to a state where the system can most rapidly assimilate new dynamical possibilities without over-exploring or freezing.
Evaluation Metric: Loss ratio \(r = \frac{\mathcal{L}_{\text{vanilla}}^{MSE}}{\mathcal{L}_{\text{Lyap}}^{MSE}}\); for the baseline regularizers, the denominator is the MSE of the correspondingly regularized model. Because chaotic dynamics produce high run-to-run noise, using a ratio cancels fluctuations that affect both models simultaneously.
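
A minimal sketch of the evaluation metric is given below. How the per-run losses are aggregated is not stated in this summary, so dividing the mean post-shift MSEs is an assumption, and the numbers are made-up placeholders rather than results from the paper.

```python
from statistics import mean

def loss_ratio(mse_vanilla, mse_regularized):
    """r = mean vanilla MSE / mean regularized MSE; r > 1 favours the regularized model."""
    return mean(mse_vanilla) / mean(mse_regularized)

# Placeholder post-shift MSEs from paired runs (hypothetical values).
vanilla_runs = [2.0, 4.0]
regularized_runs = [1.0, 2.0]
print(loss_ratio(vanilla_runs, regularized_runs))  # 2.0
```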

Key Experimental Results¶

Main Results¶

Experimental Scenario: Regime shift in the Lorenz system. The parameters for the first half are \(\sigma=20, \beta=8/3, \rho=28\) (slow convergence towards a limit cycle), and key parameters in the second half are switched to \(\sigma=10, \beta=4/3, \rho=28\) (classical chaotic Lorenz attractor).
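
A regime-shift stream of this kind could be generated as sketched below, using the two parameter sets quoted above. The fourth-order Runge-Kutta integrator, the step size `dt`, the initial state, and the function names are assumptions made for illustration; the summary does not state how the authors integrated the system.

```python
def lorenz_rhs(state, sigma, beta, rho):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt, params):
    """One classical Runge-Kutta step with fixed Lorenz parameters."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz_rhs(state, *params)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2), *params)
    k3 = lorenz_rhs(shifted(state, k2, dt / 2), *params)
    k4 = lorenz_rhs(shifted(state, k3, dt), *params)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def regime_shift_series(n_steps, dt=0.01, x0=(1.0, 1.0, 1.0)):
    """Lorenz trajectory whose (sigma, beta, rho) switch abruptly at the midpoint."""
    before = (20.0, 8.0 / 3.0, 28.0)   # first half
    after = (10.0, 4.0 / 3.0, 28.0)    # second half
    series = [tuple(x0)]
    for t in range(1, n_steps):
        params = before if t < n_steps // 2 else after
        series.append(rk4_step(series[-1], dt, params))
    return series

series = regime_shift_series(2000)
print(series[999], series[1000])   # states on either side of the shift
```

In an online setting, the model would be trained to predict `series[t + 1]` from `series[t]` while the stream crosses the midpoint.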

Regularization Method	Best Loss Ratio \(r\)	Optimal Parameter
Dropout	0.44	\(P_{\text{dropout}} = 0.2\)
L2	0.73	\(\alpha = 1 \times 10^{-3}\)
L1	1.21	\(\alpha = 1 \times 10^{-4}\)
Lyapunov	1.96	\(\alpha = 1.0\)

Note: \(r > 1\) indicates that the regularized model outperforms the vanilla model after the regime shift, while \(r < 1\) indicates worse performance. Dropout and L2 even degraded the post-regime-shift performance.
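
Since \(r\) is the vanilla MSE divided by the regularized MSE, the relative change in MSE is \(1/r - 1\). The short sketch below converts the table values; the helper name is illustrative.

```python
def relative_mse_change(r):
    """Relative MSE change of the regularized model vs. vanilla, given r = MSE_vanilla / MSE_reg."""
    return 1.0 / r - 1.0

table = {"Dropout": 0.44, "L2": 0.73, "L1": 1.21, "Lyapunov": 1.96}
for name, r in table.items():
    print(f"{name:9s} r = {r:.2f} -> MSE change {relative_mse_change(r):+.1%}")
```

For \(r = 1.96\) this gives a reduction of about 49%, while \(r = 0.44\) corresponds to an MSE more than twice that of the vanilla model.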

Ablation Study¶

Configuration	Loss Ratio	Description
Varying \(\alpha\) values	See Figure 5	Optimal performance achieved when \(\alpha \approx 1.0\); excessively large or small values degrade performance
Chaotic Attractor Generation	\(\lambda = 0.104, 0.191, 0.235\)	Verifies the accuracy and controllability of the Lyapunov exponent estimation
Natural Dissipation	Sum of Lyapunov exponents is negative	Naturally satisfied in vanilla training, requiring no additional constraints

Key Findings¶

Lyapunov regularization reduces the post-regime-shift MSE by nearly half (\(r \approx 1.96\), i.e., a reduction of \(1 - 1/r \approx 49\%\)).
Traditional regularization techniques (Dropout, L2) impair performance under regime shifts, indicating that general-purpose regularization fails to equip models with adaptability to non-stationarity.
The optimal \(\alpha = 1.0\) corresponds to the best exploration-exploitation trade-off, aligning with the prediction of the Adjacent Possible theory.
The network architecture consists of a feedforward network with 4 layers and 50 neurons per layer, with all results averaged over 10 independent training runs.

Highlights & Insights¶

Novel Theoretical Perspective: By combining chaotic dynamical systems theory (Lyapunov exponents) with neural network training, this work provides a fundamentally new regularization paradigm. Instead of simple weight penalties, it directly controls the behavioral characteristics of the network as a dynamical system.
An Elegant Analogy to the Adjacent Possible: Kauffman's biological evolutionary theory is ingeniously mapped to machine learning—the edge of chaos represents the state where innovation is most likely to occur, allowing the system to remain adaptable without freezing or descending into chaos.
Profound Connection to Existing Sequence Models: The authors highlight that mechanisms such as spectral constraints in SSMs (e.g., Mamba), gradient norm control in linear attention, and orthogonal initialization in RNNs essentially constrain the Lyapunov exponents implicitly toward zero. Lyapunov Learning unifies these fragmented intuitions into a cohesive theoretical framework.
Solid Validation Methodology: The accuracy of the exponent estimation is first established via chaotic attractor generation before applying the method to real-world tasks, ensuring a complete and rigorous logical pipeline.

Limitations & Future Work¶

High Computational Overhead: The Jacobian computation scales as \(O(d^2)\) and QR decomposition as \(O(d^3)\), limiting scalability to deep or wide networks. The authors suggest utilizing random projections or subspace tracking to mitigate costs.
Limited Experimental Scale: All evaluations are restricted to low-dimensional (3D), noise-free chaotic Lorenz systems, omitting high-dimensional, stochastic, or partially-observed real-world scenarios.
Simplistic Network Architecture: The evaluations only use a 4-layer feedforward network with 50 neurons, leaving the effects on more complex architectures like Transformers or RNNs unexplored.
Single Type of Regime Shift: Sudden parametric changes in the Lorenz system represent a specific type of non-stationarity, leaving other patterns like gradual drift or multiple shifts untested.
Lack of Comparison with Continual Learning Methods: Baselines specifically designed for continual learning, such as EWC or Progressive Nets, are not included.
Lack of Theoretical Guarantees: Despite the promising empirical results, a rigorous mathematical argument for why the edge of chaos yields superior adaptability remains to be established.
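
On the computational-overhead point: the regularizer only needs the maximum exponent, so one conceivable way to avoid the full QR is to propagate a single tangent vector and renormalize it at each step. This is the standard power-iteration-style estimate and the one-dimensional case of subspace tracking. It is offered as an illustration only, not as the mitigation the authors propose, and the names `max_lyapunov` and `jac_vec` are hypothetical.

```python
import numpy as np

def max_lyapunov(step, jac_vec, x0, n_steps, seed=0):
    """Estimate the largest Lyapunov exponent by tracking one tangent vector."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    v = rng.standard_normal(x.size)
    v /= np.linalg.norm(v)
    total = 0.0
    for _ in range(n_steps):
        v = jac_vec(x, v)            # Jacobian-vector product J(x) v
        norm = np.linalg.norm(v)
        total += np.log(norm)        # log growth of the perturbation this step
        v /= norm                    # renormalize to avoid overflow or underflow
        x = step(x)
    return total / n_steps

# Sanity check on a linear map with known exponents ln(0.9) and ln(0.5).
A = np.diag([0.9, 0.5])
estimate = max_lyapunov(lambda x: A @ x, lambda x, v: A @ v, np.zeros(2), 500)
print(estimate, np.log(0.9))
```

Each step needs one Jacobian-vector product rather than a full Jacobian and a QR factorization, at the price of recovering only the leading exponent.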

Related Work & Insights¶

Catastrophic Forgetting & Continual Learning (McCloskey & Cohen 1989; Wang et al. 2024): This work introduces an entirely new perspective: mitigating knowledge forgetting via dynamical systems theory rather than memory replay or parameter protection.
Non-stationary Time Series (Liu et al. 2022): Non-stationary Transformers tackle non-stationarity through de-stationarization, which is complementary to the dynamical systems approach of Lyapunov Learning.
Edge of Chaos Computing (Langton 1990; Zhang et al. 2021): The concept of the edge of chaos has been explored as a guideline for neural network training, upon which this paper establishes an actionable algorithm.
SSMs and Sequence Models (Gu et al. 2024): There is a deep connection between the spectral constraints in models like Mamba and Lyapunov exponent control, which may be unified under a single framework in the future.
Insights for Future Research: Promising research directions include extending Lyapunov Learning to high-dimensional systems (employing random projection to lower complexity), combining it with existing sequence models (e.g., explicitly incorporating Lyapunov regularization into SSM training), and establishing theoretical links between the edge of chaos and adaptability.

Rating¶

Novelty: ⭐⭐⭐⭐⭐ [Directly controlling the dynamical behavior of the network through a differentiable Lyapunov exponent regularization term is a highly novel perspective, and its connection to the Adjacent Possible theory is inspiring.]
Experimental Thoroughness: ⭐⭐⭐ [Validation is limited to a single low-dimensional chaotic system, lacking high-dimensional or real-world data experiments and comparisons with continual learning methods.]
Writing Quality: ⭐⭐⭐⭐ [The reasoning is clear, conceptual explanations are solid, and the discussion on connections to existing methods is profound, though the experimental evaluation is somewhat thin.]
Value: ⭐⭐⭐⭐ [The theoretical framework has the potential for unification (synthesizing stability techniques in SSMs, RNNs, etc., as Lyapunov control), but its practical application value requires further empirical verification.]