# Deep FlexQP: Accelerated Nonlinear Programming via Deep Unfolding
Conference: ICLR 2026 · arXiv: 2512.01565 · Code: To be confirmed · Area: Optimization · Keywords: Quadratic Programming, Deep Unfolding, ADMM, Sequential Quadratic Programming, LSTM Policy, PAC-Bayes
## TL;DR
This paper proposes FlexQP — an "always feasible" convex quadratic programming (QP) solver based on \(\ell_1\) elastic relaxation — and combines it with deep unfolding to learn an LSTM feedback policy that accelerates convergence, yielding Deep FlexQP. When embedded as a submodule within an SQP framework, it solves nonlinear trajectory optimization problems 4–16× faster than OSQP, reduces safety violations in predictive safety filters by over 70%, and improves task completion rates by 43%.
## Background & Motivation
Background: Quadratic programming (QP) is a foundational subproblem in optimal control, combinatorial optimization, and machine learning. Sequential quadratic programming (SQP) handles nonlinear non-convex constrained optimization by iteratively solving QP subproblems.
Limitations of Prior Work: Constraint linearization in SQP frequently produces infeasible QP subproblems. Conventional solvers such as OSQP either terminate with an error or require dedicated infeasibility recovery routines (e.g., SNOPT's elastic mode), which do not scale well. Moreover, tuning ADMM hyperparameters (\(\rho, \sigma, \alpha\)) is notoriously difficult.
Core Idea: The paper recasts constrained QP as unconstrained optimization via \(\ell_1\) exact relaxation — recovering the original solution when feasible (Theorem 3.1) and automatically identifying the sparsest constraint violation when infeasible. Deep unfolding combined with an LSTM then replaces manual hyperparameter tuning with a dimension-agnostic learned feedback policy.
## Method
### Overall Architecture
Primal QP: \(\min_x \frac{1}{2}x^\top P x + q^\top x\), s.t. \(Gx \leq h, Ax = b\) → introduce slack variables → \(\ell_1\) elastic relaxation → ADMM splitting → two-block iteration (linear system solve + soft thresholding). Deep unfolding treats the \(K\)-step ADMM iteration as a \(K\)-layer network and learns per-layer parameters.
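Written out (a sketch assembled from the pipeline above; the paper's exact variable grouping and scaling may differ), the relaxed problem is

\[
\min_{x,\, s \geq 0} \ \tfrac{1}{2} x^\top P x + q^\top x + \mu_I \|Gx + s - h\|_1 + \mu_E \|Ax - b\|_1,
\]

and after introducing a consensus copy \(z = (z_I, z_E)\) of the penalized residuals, each unfolded layer alternates (i) an \((x, s)\)-update obtained by solving one linear system, (ii) a \(z\)-update given by elementwise soft thresholding with thresholds \(\mu_I/\rho\) and \(\mu_E/\rho\), and (iii) a dual ascent step on the multipliers.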
### Key Design 1: Exact Relaxation in FlexQP
- Hard constraints are replaced by the \(\ell_1\) penalty \(\mu_I \|Gx+s-h\|_1 + \mu_E \|Ax-b\|_1\)
- Theorem 3.1: When \(\mu_I \geq \|y_I^*\|_\infty\) and \(\mu_E \geq \|y_E^*\|_\infty\), the relaxed solution coincides exactly with the original solution
- Theorem 3.2: Convergence is guaranteed under a weak coercivity assumption
- In the infeasible case, \(z_I^*, z_E^*\) automatically provide sparse certificates of constraint violation (see the sketch after this list)
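A minimal NumPy sketch of the \(z\)-update and the resulting infeasibility certificate (variable names are illustrative, not the paper's notation):

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise prox of tau * ||.||_1: shrink v toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def z_update(r_I, r_E, mu_I, mu_E, rho):
    """One elastic z-update. r_I, r_E are the (dual-shifted) inequality
    and equality residuals Gx + s - h and Ax - b at the current iterate."""
    z_I = soft_threshold(r_I, mu_I / rho)
    z_E = soft_threshold(r_E, mu_E / rho)
    return z_I, z_E

# If the QP is feasible and mu is large enough (Theorem 3.1), z_I*, z_E*
# vanish; otherwise their nonzero entries flag exactly which constraints
# cannot be satisfied, yielding a sparse infeasibility certificate.
```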
### Key Design 2: LSTM Feedback Policy
- Independent policies are learned for inequality constraints (\(\pi_I\)), equality constraints (\(\pi_E\)), and the relaxation parameter (\(\pi_\alpha\))
- Policy inputs consist of ADMM variables together with primal/dual residuals, applied in batch along the constraint dimension; this makes the policy dimension-agnostic and lets it generalize to large-scale problems (see the sketch after this list)
- The LSTM captures long-range dependencies in the optimization history to adaptively adjust \(\rho, \mu, \alpha\)
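A minimal PyTorch sketch of one such policy (the layer sizes and the multiplicative update rule are illustrative assumptions, not the paper's exact design):

```python
import torch
import torch.nn as nn

class ConstraintPolicy(nn.Module):
    """Dimension-agnostic LSTM policy: each constraint is an element of
    the batch, so the same weights apply to any number of constraints."""
    def __init__(self, n_features=4, hidden=32):
        super().__init__()
        self.cell = nn.LSTMCell(n_features, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, feats, state=None):
        # feats: (n_constraints, n_features) -- per-constraint ADMM
        # variables plus primal/dual residuals at the current iteration.
        h, c = self.cell(feats, state)
        # Exponentiating keeps the multiplicative penalty update positive.
        scale = torch.exp(self.head(h)).squeeze(-1)   # (n_constraints,)
        return scale, (h, c)

# Per unfolded layer: mu_I <- mu_I * scale is a vector-level update,
# in contrast to a single scalar penalty shared by all constraints.
```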
### Key Design 3: Normalized Training Loss and PAC-Bayes Bound
- The training loss incorporates Lagrange multipliers: \(\min_\theta \sum_k \|\xi^k(\theta) - \xi^*\|_2 / \|\xi^*\|_2\) with \(\xi = (x, y_I, y_E)\); matching the dual variables implicitly enforces the exactness condition \(\mu \geq \|y^*\|_\infty\) from Theorem 3.1
- A log-scale PAC-Bayes loss (Eq. 14) is proposed: in the regime of very small residuals it carries orders of magnitude more information than the standard loss (Eq. 13), yielding tighter generalization guarantees (both losses are sketched below)
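A sketch of both losses in PyTorch; the normalized loss follows the formula above, while the log-scale form is a hypothetical reconstruction (the paper's Eq. 14 may differ):

```python
import torch

def normalized_loss(xi_iters, xi_star, eps=1e-12):
    """Sum over unfolded layers of the normalized distance to the
    optimal primal-dual point xi* = (x*, y_I*, y_E*) (cf. Eq. 13)."""
    return sum(torch.linalg.norm(xi - xi_star) /
               (torch.linalg.norm(xi_star) + eps) for xi in xi_iters)

def log_scale_loss(xi_iters, xi_star, eps=1e-12):
    """Hypothetical log-scale counterpart: the log spreads out very
    small residuals, so nearly-converged iterates still carry signal."""
    return sum(torch.log(torch.linalg.norm(xi - xi_star) /
                         (torch.linalg.norm(xi_star) + eps) + eps)
               for xi in xi_iters)
```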
## Key Experimental Results
### Small- and Medium-Scale QP (500 training / 1000 test problems)
| Solver | Convergence Speed (iterations) | Final Residual |
|---|---|---|
| OSQP (hand-tuned) | Baseline | Baseline |
| Deep OSQP | Better than OSQP | Better than OSQP |
| Deep OSQP-Improved | Further improvement | Further improvement |
| Deep FlexQP | Fastest among all methods | Lowest among all methods |
### Large-Scale QP (10k variables / 10–20k constraints)
| Problem Class | Advantage of Deep FlexQP |
|---|---|
| Portfolio Optimization (10k var, 10k con) | Fewest iterations; a model trained on small problems generalizes after brief fine-tuning |
| SVM (10k var, 20k con) | Fewest CG iterations |
### SQP Nonlinear Optimization
| Metric | Deep FlexQP + SQP vs. OSQP + SQP |
|---|---|
| Trajectory optimization speed | 4–16× faster (average over 100 problems) |
| Safety filter safety violations | Reduced by >70% |
| Safety filter task completion rate | Improved by 43% |
## Key Findings
- The FlexQP architecture itself (elastic relaxation + LSTM) is the primary driver of superiority — under the same loss function, Deep OSQP variants benefit far less from fine-tuning than Deep FlexQP
- Training on small-scale problems followed by fine-tuning for 5 epochs on 100 large-scale problems suffices to generalize to 10k+ dimensional instances
- The log-scale PAC-Bayes bound renders generalization guarantees practically meaningful, whereas the standard bound is uninformative in the small-residual regime
## Highlights & Insights
- Theoretical elegance: The combination of \(\ell_1\) exact relaxation, ADMM, and deep unfolding is cohesive, with each component backed by rigorous mathematical guarantees
- High practical value: The approach resolves the core engineering bottleneck of infeasible subproblems in SQP without requiring additional recovery routines
- The dimension-agnostic LSTM policy design enables a single trained model to generalize to problems of arbitrary scale
## Limitations & Future Work
- Training overhead on large-scale problems remains substantial (approximately 3 hours per epoch); full training would require 300+ days
- Validation is limited to dense QP instances; sparse QP problems (e.g., power network optimization) may require different strategies
- The interpretability of the LSTM policy is limited, making it difficult to understand the learned tuning rules
## Related Work & Insights
- Compared with Deep OSQP by Saravanos et al. (2025), the key advantages of FlexQP lie in native infeasibility handling and vector-level (rather than scalar-level) penalty parameter policies
- This work may inspire the application of deep unfolding to other optimization algorithms such as interior-point methods and Frank-Wolfe
## Rating
- Novelty: ⭐⭐⭐⭐ The combination of elastic relaxation and deep unfolding is novel, though each individual component is not entirely new
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Covers small- to large-scale QP and nonlinear SQP across finance, ML, and control domains
- Writing Quality: ⭐⭐⭐⭐ Clear structure with rigorous theoretical derivations
- Value: ⭐⭐⭐⭐⭐ Addresses a core engineering pain point in SQP with broad application prospects