Principled Data Augmentation for Learning to Solve Quadratic Programming Problems¶
Conference: NeurIPS 2025 arXiv: 2506.01728 Authors: Chendi Qian, Christopher Morris (RWTH Aachen University) Code: GitHub Area: Graph Learning Keywords: Data Augmentation, Quadratic Programming, Graph Neural Networks, Contrastive Learning, Learning to Optimize, KKT Conditions
TL;DR¶
This paper proposes a principled data augmentation framework based on affine transformations of the KKT system, generating optimality-preserving augmented instances for MPNN-based learning-to-optimize (L2O) on linear programming (LP) and quadratic programming (QP) tasks. Combined with contrastive pretraining, the framework substantially improves performance in data-scarce and out-of-distribution (OOD) generalization settings.
Background & Motivation¶
State of the Field¶
Linear programming and quadratic programming are fundamental optimization problems in machine learning, operations research, and scientific computing. In recent years, message-passing graph neural network (MPNN)-based L2O methods have attracted wide attention. These approaches encode LP/QP instances as constraint-variable bipartite graphs and learn, in a data-driven manner, to solve or assist solvers (e.g., by approximating strong branching scores).
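To make the encoding concrete, the sketch below builds a constraint-variable bipartite graph from the matrices of a small LP. The feature choices (right-hand sides on constraint nodes, objective coefficients on variable nodes, nonzero matrix entries on edges) follow the common L2O convention and are illustrative, not necessarily the paper's exact featurization:

```python
import numpy as np

def lp_to_bipartite(A, b, c):
    """Encode an LP  min c^T x  s.t.  A x <= b  as a constraint-variable
    bipartite graph (illustrative feature choices, not the paper's exact ones)."""
    m, n = A.shape
    rows, cols = np.nonzero(A)
    edge_index = np.stack([rows, cols + m])  # constraints: 0..m-1, variables: m..m+n-1
    edge_attr = A[rows, cols]                # nonzero coefficients as edge features
    cons_feat = b.reshape(-1, 1)             # right-hand sides on constraint nodes
    var_feat = c.reshape(-1, 1)              # objective coefficients on variable nodes
    return edge_index, edge_attr, cons_feat, var_feat

A = np.array([[1.0, 0.0], [2.0, 3.0]])
b = np.array([1.0, 4.0])
c = np.array([1.0, -1.0])
ei, ea, cf, vf = lp_to_bipartite(A, b, c)   # 3 nonzeros -> 3 edges
```

An MPNN then alternates message passing between the two node sets along these edges.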
Limitations of Prior Work¶
- Data scarcity: L2O methods require large amounts of labeled data (optimal solutions), and labeling is costly, particularly for complex QP instances.
- General graph augmentations are inapplicable: Operations in methods such as GraphCL — node dropping, edge perturbation, feature masking — destroy the constraint structure and optimality conditions of LP/QP instances, degrading performance.
- Absence of structure-preserving augmentation strategies: Existing L2O work has largely not explored how to design augmentations that preserve both structure and optimality for LP/QP.
- Gap in contrastive pretraining: Although graph contrastive learning has succeeded on general graph tasks, effective unsupervised pretraining strategies for LP/QP L2O remain absent.
Root Cause¶
There is a need for a theoretically grounded data augmentation framework that can generate structurally diverse, optimality-preserving LP/QP instances without re-invoking a solver, while supporting both supervised training and contrastive pretraining paradigms.
Method¶
Core Idea: Affine Transformations of the KKT System¶
Given a linearly constrained quadratic program (LCQP) instance \(I = (\mathbf{Q}, \mathbf{A}, \mathbf{b}, \mathbf{c})\),

\[
\min_{\mathbf{x}} \; \tfrac{1}{2}\mathbf{x}^\top \mathbf{Q} \mathbf{x} + \mathbf{c}^\top \mathbf{x} \quad \text{s.t.} \quad \mathbf{A}\mathbf{x} \le \mathbf{b},
\]

the optimal primal-dual solution \((\mathbf{x}^{*}, \boldsymbol{\lambda}^{*})\) satisfies the KKT conditions (stationarity, primal and dual feasibility, complementary slackness):

\[
\mathbf{Q}\mathbf{x} + \mathbf{c} + \mathbf{A}^\top \boldsymbol{\lambda} = \mathbf{0}, \qquad
\mathbf{A}\mathbf{x} \le \mathbf{b}, \qquad
\boldsymbol{\lambda} \ge \mathbf{0}, \qquad
\boldsymbol{\lambda} \odot (\mathbf{A}\mathbf{x} - \mathbf{b}) = \mathbf{0}.
\]
The key observation is that applying affine transformations \(\mathbf{M}\) (multiplicative) and \(\mathbf{B}, \boldsymbol{\beta}\) (additive) to the KKT system yields a new valid KKT system, from which a new QP instance can be constructed and its optimal solution recovered directly via matrix operations — without calling a solver.
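A minimal NumPy sketch of the idea, using diagonal variable scaling (one of the solution-agnostic transformations) on a toy LCQP whose constraints are chosen to be inactive, so the optimum is known in closed form; the construction is illustrative, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy LCQP: min 0.5 x^T Q x + c^T x  s.t.  A x <= b, with Q positive definite.
n, m = 4, 3
L = rng.normal(size=(n, n))
Q = L @ L.T + n * np.eye(n)          # positive definite by construction
c = rng.normal(size=n)
A = rng.normal(size=(m, n))

x_star = np.linalg.solve(Q, -c)      # unconstrained minimizer: Q x + c = 0
b = A @ x_star + 1.0                 # slack constraints -> x_star optimal, duals = 0

# Solution-agnostic variable scaling: substitute x = D y with diagonal D > 0.
d = rng.uniform(0.5, 2.0, size=n)
D = np.diag(d)
Q_new, c_new, A_new, b_new = D @ Q @ D, D @ c, A @ D, b

# The new optimum is recovered in O(n), without a solver: y* = D^{-1} x*.
y_star = x_star / d
assert np.allclose(Q_new @ y_star + c_new, 0)   # stationarity of new KKT system
assert np.all(A_new @ y_star <= b_new)          # primal feasibility preserved
```

The same pattern (transform the data, recover the solution by cheap matrix operations) underlies all six transformations below.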
Conditions for Valid Transformations¶
For the transformed system to constitute a valid KKT form, the following must hold:

1. \(\mathbf{T}_{11} = \mathbf{M}_1 \mathbf{Q} \mathbf{M}_1^\top\) is positive definite (convexity of the new objective).
2. \(\mathbf{T}_{22} = \mathbf{0}\) (consistency of constraint structure).
3. \(\mathbf{T}_{12} = \mathbf{T}_{21}^\top\) (symmetry).
4. \(\mathbf{M}_{22}\) satisfies the conditions in Proposition 2.1 (recoverability of the primal solution).
Six Concrete Transformations¶
| Transformation | Operation | Efficiently Recoverable | Solution-Agnostic |
|---|---|---|---|
| Remove free variables | Drop variables whose optimal values are zero | ✓ | ✗ |
| Remove inactive constraints | Drop strictly inactive inequality constraints | ✓ | ✗ |
| Scale variable coefficients | Diagonal \(\mathbf{M}_1\) scales variable coefficients in objective and constraints | ✓ | ✓ |
| Scale constraints | Diagonal \(\mathbf{M}_2\) scales constraint coefficients and right-hand sides | ✓ | ✓ |
| Add variables | Introduce new variables via \(\mathbf{M}_1 = [\mathbf{I}; \mathbf{q}^\top]\) | ✓ | ✓ |
| Add constraints | Append new inequality constraints to the problem | ✓ | ✓ |
"Efficiently recoverable" means the new solution can be computed from the original in \(O(n)\) time; "solution-agnostic" means the transformation parameters do not depend on the original optimal solution and can thus be used in unsupervised settings.
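As one concrete (and hedged) instance of the "add constraints" transformation: appending a nonnegative combination of existing rows plus a strictly positive offset yields a row that is inactive at every feasible point, so the primal solution is unchanged and the new dual entry is zero. The paper's construction may differ in detail; this sketch only illustrates why recovery is trivial:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
A = rng.normal(size=(m, n))
x0 = rng.normal(size=n)
b = A @ x0 + rng.uniform(0.1, 1.0, size=m)   # make x0 strictly feasible
lam = np.abs(rng.normal(size=m))             # stand-in duals of the original QP

# Append w^T A as a new row with right-hand side w^T b + eps, w >= 0, eps > 0:
# every feasible x satisfies the new row strictly, so the optimum is untouched.
w, eps = rng.uniform(0.1, 1.0, size=m), 0.5
A_aug = np.vstack([A, w @ A])
b_aug = np.append(b, w @ b + eps)

# O(1) dual recovery: the new (inactive) constraint gets multiplier 0.
lam_aug = np.append(lam, 0.0)

assert (w @ A) @ x0 < w @ b + eps            # new row inactive at feasible x0
```

Since no step consults the optimal solution, this variant is usable in the unsupervised setting.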
Training Paradigms¶
Supervised augmentation: Transformations are applied dynamically during training to generate augmented instances, and an MPNN is trained with a supervised loss to predict optimal objective values.
Contrastive pretraining: Self-supervised pretraining is performed using the NT-Xent loss. For each instance, two augmented views are generated via solution-agnostic transformations as positive pairs, with other instances serving as negatives. Pretraining trains only the MPNN backbone, which is subsequently fine-tuned with an MLP head under supervision.
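The NT-Xent objective itself is standard; a minimal NumPy sketch with illustrative embedding dimensions and temperature (τ = 0.5 here is an assumption, not the paper's setting):

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss over two augmented views; row i of z1 and z2 are positives."""
    z = np.vstack([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    n, n2 = z1.shape[0], z.shape[0]
    pos = np.concatenate([np.arange(n, n2), np.arange(0, n)])  # positive indices
    return -log_prob[np.arange(n2), pos].mean()

rng = np.random.default_rng(0)
anchor = rng.normal(size=(8, 16))
view1 = anchor + 0.01 * rng.normal(size=(8, 16))   # two nearby "augmented views"
view2 = anchor + 0.01 * rng.normal(size=(8, 16))
aligned = nt_xent(view1, view2)
mismatched = nt_xent(view1, np.roll(view2, 1, axis=0))  # wrong positives
assert aligned < mismatched
```

In the paper's setting, `z1` and `z2` would be MPNN embeddings of two solution-agnostic augmentations of the same LP/QP instance.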
Key Experimental Results¶
Experiment 1: Data Augmentation under Supervised Learning¶
Augmentation methods are evaluated on synthetic LP/QP datasets (100 variables × 100 constraints, 10,000 instances). The metric is mean relative objective error (%).
| Augmentation | LP 10% | LP 20% | LP 100% | QP 10% | QP 20% | QP 100% |
|---|---|---|---|---|---|---|
| None | 8.539 | 6.149 | 2.784 | 5.304 | 3.567 | 1.240 |
| Drop node | 8.618 | 7.131 | 4.654 | 6.355 | 4.867 | 2.594 |
| Mask node | 8.979 | 7.591 | 3.216 | 5.865 | 4.005 | 1.347 |
| Flip edge | 7.794 | 6.503 | 4.358 | 5.485 | 4.361 | 2.561 |
| Drop vars (Ours) | 4.996 | 3.484 | 1.484 | 4.232 | 2.374 | 0.835 |
| Drop cons (Ours) | 5.563 | 3.691 | 1.656 | 3.407 | 2.104 | 0.821 |
| Scale vars (Ours) | 4.682 | 3.208 | 1.241 | 3.790 | 2.468 | 0.649 |
| Combo (Ours) | 3.465 | 2.300 | 1.051 | 2.434 | 1.532 | 0.542 |
General graph augmentations (Drop/Mask/Flip) degrade performance in nearly every setting; only Flip edge at LP 10% edges out the no-augmentation baseline. The combined proposed method (Combo) achieves a 62.6% relative improvement over no augmentation under the LP 20% setting.
Experiment 2: Contrastive Pretraining and Fine-tuning¶
The full unlabeled training set is used for contrastive pretraining, followed by fine-tuning on varying fractions of labeled data.
| Pretraining | LP 10% | LP 20% | LP 100% | QP 10% | QP 20% | QP 100% |
|---|---|---|---|---|---|---|
| None | 8.539 | 6.149 | 2.784 | 5.304 | 3.567 | 1.240 |
| GraphCL | 9.694 | 6.033 | 2.613 | 6.024 | 3.865 | 1.253 |
| GCC | 8.248 | 5.761 | 2.686 | 11.344 | 6.356 | 1.472 |
| IGSD | 18.097 | 7.631 | 3.082 | 12.799 | 6.306 | 1.302 |
| MVGRL | 20.392 | 9.072 | 2.852 | 8.440 | 4.932 | 1.343 |
| Ours | 3.472 | 2.794 | 1.588 | 3.791 | 2.427 | 0.926 |
The proposed pretraining reduces error from 8.539 to 3.472 (a 59.3% relative improvement) under LP 10%, and from 5.304 to 3.791 (28.5%) under QP 10%. Most existing graph contrastive learning methods (IGSD, MVGRL, etc.) degrade performance.
Experiment 3: OOD Generalization¶
Pretrained models are fine-tuned on unseen optimization problem families to assess transfer learning capability.
LP OOD Generalization (problem families: Set Cover / MIS / CA / CFL; representative results below):
| Problem | Pretrained | 10% | 20% | 100% |
|---|---|---|---|---|
| Set Cover | No | 5.297 | 2.531 | 0.752 |
| Set Cover | Yes | 1.965 | 1.349 | 0.662 |
| CA | No | 2.381 | 1.803 | 0.906 |
| CA | Yes | 2.303 | 1.713 | 0.753 |
QP OOD Generalization (problem families: SVM / Portfolio / LASSO; representative results below):
| Problem | Pretrained | 10% | 20% | 100% |
|---|---|---|---|---|
| SVM | No | 0.191 | 0.109 | 0.027 |
| SVM | Yes | 0.141 | 0.077 | 0.024 |
| LASSO | No | 5.405 | 4.083 | 2.159 |
| LASSO | Yes | 5.169 | 3.726 | 1.178 |
The pretrained model outperforms training from scratch on nearly all OOD tasks, demonstrating strong transferability.
Highlights & Insights¶
- Theoretical elegance: The affine transformation framework over the KKT system rests on a unified mathematical foundation; all transformations carry rigorous theoretical guarantees (optimality-preservation, convexity-preservation), in contrast to heuristic graph augmentations.
- Practical efficiency: All six transformations admit \(O(n)\)-time solution recovery without re-invoking a QP solver; solution-agnostic transformations are directly applicable to unsupervised pretraining.
- Substantial improvements: The combined augmentation reduces error by over 60% in data-scarce settings; contrastive pretraining consistently outperforms GraphCL and analogous methods on both LP and QP.
- OOD generalization: The pretrained model successfully transfers to diverse problem families including Set Cover, SVM, and Portfolio optimization, demonstrating that the model learns general structural representations of optimization problems.
- Compelling negative evidence: Experiments clearly show that general graph augmentation methods (node dropping, edge perturbation, etc.) are harmful on LP/QP tasks, underscoring the necessity of domain-specific augmentation design.
Limitations & Future Work¶
- Restricted to LCQP: The framework requires \(\mathbf{Q}\) to be positive definite and does not extend to non-convex QP, semidefinite programming (SDP), or the integer components of mixed-integer linear programming (MILP).
- Limited transformation diversity: Only four solution-agnostic transformations are available in practice (scaling variables/constraints, adding variables/constraints); removal operations require access to the optimal solution.
- Single regression task: Evaluation is limited to objective value prediction; the framework has not been tested on more practical L2O applications such as strong branching scoring or cutting plane selection.
- Small problem scale: Experiments are fixed at 100 variables × 100 constraints, leaving scalability to large-scale instances (thousands of variables) insufficiently validated.
- Simple pretraining strategy: Only the NT-Xent contrastive loss is used; more advanced self-supervised learning methods (e.g., BYOL, VICReg) are not explored.
- Ad hoc combination strategy: The Combo method applies transformations randomly without an adaptive selection mechanism to choose the optimal transformation subset based on instance characteristics.
Related Work & Insights¶
- GraphCL (You et al., 2020): General graph augmentations (node dropping / edge perturbation / feature masking) degrade performance on LP/QP by destroying the constraint structure of optimization instances.
- GCC (Qiu et al., 2020): Random-walk-based subgraph sampling leads to severe degradation on QP (11.344 vs. 5.304).
- Li et al. (2024c): The only contrastive approach targeting LP; it adopts a CLIP-style formulation, whereas the proposed framework is more general (covering both LP and QP) with theoretically grounded transformations.
- Duan et al. (2022): Designs satisfiability-preserving augmentations for SAT problems — a conceptually related idea, though the present work addresses continuous optimization.
- Gasse et al. (2019): Seminal work on MPNN for strong branching in MILP, but does not consider data augmentation.
- Chen et al. (2024a): Extends the bipartite graph representation of LP to QP by adding edges between variables; this graph representation is adopted as the MPNN input in the present work.
Rating¶
- Novelty: ⭐⭐⭐⭐ — First systematic KKT-theoretic data augmentation framework for LP/QP L2O.
- Experimental Thoroughness: ⭐⭐⭐⭐ — Covers supervised, pretraining, and OOD settings with comprehensive baselines, though problem scales are limited.
- Writing Quality: ⭐⭐⭐⭐⭐ — Theoretical derivations are clear, transformation taxonomy is systematic, and experimental design is well-motivated.
- Value: ⭐⭐⭐⭐ — Fills a gap in LP/QP data augmentation with a concise and practical approach; application scope warrants further expansion.