# Minimum-Cost Network Flow with Dual Predictions
**Conference:** AAAI 2026 · **arXiv:** 2601.20203 · **Code:** Not available · **Area:** Autonomous Driving / Combinatorial Optimization / Algorithms with Predictions · **Keywords:** Minimum-cost flow, dual predictions, ε-relaxation, algorithms with predictions, chip routing, traffic networks
## TL;DR
This paper presents the first learning-augmented algorithm for minimum-cost network flow (MCF) based on dual predictions. By warm-starting the classical ε-relaxation algorithm with machine-learned dual solutions, the method ties its complexity bound to the \(\ell_\infty\)-norm of the prediction error (achieving both consistency and robustness) and delivers average speedups over the cold-start baseline of 12.74× on traffic networks and 1.64× on chip escape routing.
## Background & Motivation
Background: MCF is a fundamental problem in combinatorial optimization, with broad applications in traffic network routing, EDA chip design, and computer vision (multi-object tracking). Classical solvers include Network Simplex, Successive Shortest Paths (SSP), and ε-relaxation.
Limitations of Prior Work: Solving large-scale MCF instances (millions of nodes and edges) is computationally expensive, and worst-case analysis is overly pessimistic: SSP requires exponentially many iterations in theory, yet converges rapidly in practice.
Rise of Algorithms with Predictions: The "algorithms with predictions" framework has recently produced results for matching (Dinitz et al. 2021), maximum flow (Davies et al. 2023), and minimum cut, yet MCF has remained unexplored.
Why ε-relaxation: (a) It is a dual algorithm, so predictions require only \(n\) variables (vs. \(m\) for the primal); (b) it is naturally parallelizable (node iterations are decoupled); (c) its complexity analysis is tractable, unlike Network Simplex whose complexity is unknown.
Core Idea: Learn a predicted dual optimal solution \(\hat{p}\) for MCF and warm-start ε-relaxation, reducing complexity from \(O(n^3\log(nC))\) to \(O(n^3\log\|\hat{p}-p^*\|_\infty)\).
## Method
### Overall Architecture
Given network \(G=(V,E)\) (with \(n\) nodes, \(m\) edges, edge costs \(a_{ij}\), capacities \([b_{ij}, c_{ij}]\), and node supplies \(s_i\)) and a dual prediction \(\hat{p}\), the pipeline proceeds as follows: preprocessing (shift so that \(\min_i \hat{p}_i=0\), then clip to \([0,(n-1)C]\)) → warm-start ε-relaxation to solve for the optimal flow. A cost-scaling technique is incorporated for multi-scale iterative convergence.
The core idea is that ε-relaxation maintains a pair \((x,p)\) satisfying ε-complementary slackness and iteratively reduces each node's surplus \(g_i\) to zero, yielding a feasible flow. The dual prediction provides a better initial \(p\), reducing the number of iterations.
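To make the invariant concrete, here is a minimal sketch (in Python, with an assumed arc-list representation and the sign convention \(r_{ij} = a_{ij} - p_i + p_j\) for reduced costs, which may differ from the paper's) of the two quantities the algorithm tracks: node surpluses and ε-complementary slackness.

```python
import numpy as np

def surplus(arcs, x, s):
    """g_i = s_i + inflow - outflow; eps-relaxation drives every g_i to 0.
    `arcs` is a list of (i, j, a_ij, b_ij, c_ij); x holds one flow per arc."""
    g = np.asarray(s, dtype=float).copy()
    for k, (i, j, _a, _b, _c) in enumerate(arcs):
        g[i] -= x[k]  # flow leaving i
        g[j] += x[k]  # flow entering j
    return g

def satisfies_eps_cs(arcs, x, p, eps):
    """eps-complementary slackness: a reduced cost above +eps forces the
    flow to its lower bound, one below -eps forces it to its upper bound."""
    for k, (i, j, a, b, c) in enumerate(arcs):
        r = a - p[i] + p[j]          # reduced cost under prices p
        if r > eps and x[k] > b:     # should sit at lower bound b_ij
            return False
        if r < -eps and x[k] < c:    # should sit at upper bound c_ij
            return False
    return True
```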
### Key Design 1: Warm-Start ε-Relaxation (Algorithm 1)
- Normalize the prediction via dual shift invariance (Fact 1).
- Clip to the valid range (Lemma 1: there exists an optimal dual solution in \([0,(n-1)C]\)). A minimal sketch of both preprocessing steps appears after this list.
- Core Lemma (Lemma 2): \(\beta(\hat{p}) \leq 2\|\hat{p}-p^*\|_\infty\), i.e., the warm-start quantity \(\beta\) that drives the iteration bound grows at most linearly with the \(\ell_\infty\)-norm of the prediction error.
- Theorem 1 (vanilla version): complexity \(O(\min\{n^3\|\hat{p}-p^*\|_\infty, n^4C\})\); for 0/1-flow instances, \(O(\min\{mn\|\hat{p}-p^*\|_\infty, mn^2C\})\).
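A minimal sketch of the two preprocessing steps (the shift from Fact 1 and the clip from Lemma 1), assuming the prediction arrives as a length-\(n\) array:

```python
import numpy as np

def warm_start_prices(p_hat, n, C):
    """Normalize a dual prediction before warm-starting eps-relaxation."""
    p = np.asarray(p_hat, dtype=float)
    p = p - p.min()                       # Fact 1: duals are shift-invariant
    return np.clip(p, 0.0, (n - 1) * C)   # Lemma 1: an optimum lies in [0, (n-1)C]
```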
### Key Design 2: Cost-Scaling Version (Algorithm 2)
- Edge costs are rescaled by \((n+1)/c^T\), and the iterations proceed from coarse to fine over \(T=\log\|\hat{p}-p^*\|_\infty\) rounds.
- Each round of ε-relaxation costs \(O(n^3)\), since the previous round's solution provides a good initialization (see the outer-loop sketch after this list).
- Theorem 2: \(O(\min\{n^3\log\|\hat{p}-p^*\|_\infty, n^3\log(nC)\})\).
- Robustness guarantee: the bound degrades gracefully to the classical \(O(n^3\log(nC))\) when predictions are arbitrarily bad.
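The outer loop can be summarized as follows. This is a sketch under the assumption that the tolerance ε is halved each round (the paper's exact scaling schedule may differ), with the per-round solver treated as a black box:

```python
def cost_scaling(p0, run_eps_relaxation, eps0, T):
    """Cost-scaling outer loop (sketch of Algorithm 2). `run_eps_relaxation`
    is an assumed black box that restores eps-CS from a warm start and
    returns an updated (flow, prices) pair; T ~ log ||p_hat - p*||_inf."""
    x, p, eps = None, p0, eps0
    for _ in range(T):
        x, p = run_eps_relaxation(x, p, eps)  # O(n^3) per round, warm-started
        eps /= 2.0                            # tighten the tolerance: coarse to fine
    return x, p
```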
### Key Design 3: Two Prediction Learning Paradigms
- Fixed predictions (traffic networks): The network topology is fixed while edge weights fluctuate randomly. An expected dual solution is learned by solving \(\min_{\hat{p}} \frac{1}{k}\sum_i \log\|\hat{p}-p^{(i)}+\Delta_i\|_\infty\), a convex optimization problem solved via Frank-Wolfe. PAC sample complexity is \(\tilde{O}(n/\varepsilon^2)\) (Theorem 3).
- Feature-based predictions (escape routing): A UNet-style CNN predicts dual solutions from 3-channel feature maps (obstacles, distance to pins, distance to boundary), trained with an MSE loss; a sketch of the feature construction follows. Sample complexity is \(O(n \cdot d_{NN} \cdot \log(nC)/\varepsilon^2)\) (Theorem 4).
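As an illustration of the feature-based pipeline, here is a sketch of the 3-channel input construction. The mask names and the use of a Euclidean distance transform are assumptions; the paper only specifies the three channels:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def escape_routing_features(obstacle_mask, pin_mask):
    """Stack the three CNN input channels from H x W boolean masks:
    obstacles, distance to the nearest pin, distance to the boundary."""
    h, w = obstacle_mask.shape
    # EDT measures distance to the nearest zero pixel, so invert the
    # pin mask to obtain each cell's distance to the nearest pin.
    dist_to_pin = distance_transform_edt(~pin_mask)
    # Distance to the nearest of the four board edges.
    ys, xs = np.mgrid[0:h, 0:w]
    dist_to_boundary = np.minimum.reduce([ys, h - 1 - ys, xs, w - 1 - xs])
    return np.stack([obstacle_mask.astype(float),
                     dist_to_pin,
                     dist_to_boundary.astype(float)])
```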
### Loss & Training
- Fixed predictions: minimize surrogate loss \(\sum_i \log\|\hat{p}-p^{(i)}+\Delta_i\|_\infty\) (convex, solved via Frank-Wolfe).
- CNN predictions: MSE loss (\(\ell_2\) norm), since training directly against the \(\ell_\infty\) norm is difficult. The UNet architecture contains 2 downsampling and 2 upsampling modules, each with two 3×3 convolutional layers plus GroupNorm and residual connections; Adam optimizer, lr=1e-3 (reduced by 10× at epoch 15), batch size 16. A minimal training sketch follows this list.
- Pseudo-dimension bounds for the end-to-end algorithm (Lemmas 3/4) are proven, guaranteeing generalization.
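A minimal PyTorch training sketch matching the stated hyperparameters; the `model` and data `loader` are assumed (any UNet-style network mapping 3-channel boards to dual-price maps fits):

```python
import torch
from torch import nn

def train(model, loader, epochs=40, device="cuda"):
    """MSE training with Adam, lr=1e-3 cut 10x at epoch 15, as described."""
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[15], gamma=0.1)
    mse = nn.MSELoss()  # l2 surrogate for the l_inf error in the bounds
    for _ in range(epochs):
        for feats, duals in loader:  # batches of 16 in the paper's setup
            feats, duals = feats.to(device), duals.to(device)
            opt.zero_grad()
            loss = mse(model(feats), duals)
            loss.backward()
            opt.step()
        sched.step()
```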
## Key Experimental Results
### Main Results 1: Traffic Networks (DIMACS Road Network Benchmarks)
| Instance | Nodes | ε-R (w/ pred.) | ε-R (w/o pred.) | Network Simplex | SSP |
|---|---|---|---|---|---|
| paths_04_NV | 261K | 1.50s | 24.25s | 19.29s | 11.68s |
| flow_04_NV | 261K | 18.21s | 112.93s | 20.22s | 51.21s |
| paths_05_WI | 519K | 3.88s | 58.84s | 61.41s | 73.49s |
| flow_06_FL | 1049K | 45.95s | 630.73s | 271.13s | 1769.58s |
| paths_07_TX | 2074K | 25.87s | 323.85s | 694.12s | 2113.63s |
| flow_07_TX | 2074K | 238.41s | 2247.67s | 1127.91s | very slow |
| Avg. Speedup | — | 1.00× | 12.74× | 12.08× | 38.19× |

*Speedups in the last row are of ε-R (w/ pred.) relative to each column's method; its own column is the 1.00× baseline.*
### Main Results 2: Chip Escape Routing
| Instance | Board Size | Pins | ε-R (w/ pred.) | ε-R (w/o pred.) | NS | SSP |
|---|---|---|---|---|---|---|
| 1 | 296×277 | 710 | 2.10s | 3.30s | 11.75s | 11.55s |
| 5 | 635×572 | 1366 | 45.24s | 70.18s | 453.92s | 280.03s |
| 10 | 1205×1374 | 1985 | 630.49s | 955.19s | very slow | 2592.41s |
| Avg. Speedup | — | — | 1.00× | 1.64× | 11.19× | 7.53× |
### Key Findings
- Most significant speedup on traffic networks: dual predictions accelerate ε-relaxation by 6.2–21.4× over the cold-start version; on 0/1-flow instances (paths_*), the warm-started solver even surpasses Network Simplex.
- Strong cross-scale generalization of the CNN: a single trained model generalizes well across board sizes from 300×300 to 1000×1000.
- Theory–experiment consistency: In synthetic experiments, runtime growth as a function of prediction error matches theoretical curves, and degrades to classical complexity under large errors.
- High parallelization potential: Road networks are planar graphs (4-colorable), enabling an additional theoretical speedup of \(n/4\).
- Cost-scaling step-by-step analysis: Per-step running times decrease progressively (e.g., paths_04 from 2.96s to <0.4s), indicating the theoretical analysis is conservative.
- Fixed-prediction learning is simple and efficient: Only 10 samples suffice to learn effective dual predictions (Gaussian-perturbed edge weights, solved via convex optimization).
- Fast CNN convergence: Training on only 80 instances for 40 epochs achieves convergence, validating the structural regularity of the dual solution space.
## Highlights & Insights
- First learning-augmented algorithm for MCF with full theoretical guarantees (consistency + robustness + PAC sample complexity).
- Complexity tied to the \(\ell_\infty\)-norm (rather than the \(\ell_0\)-norm), so exact prediction of every component is unnecessary, which is more practical.
- Two learning paradigms (fixed predictions / CNN predictions) cover both fixed and varying topology settings.
- Validated on large-scale real road networks (2 million nodes) with 12.74× speedup.
- On 0/1-flow instances, prediction-augmented ε-relaxation outperforms Network Simplex—the most widely used MCF solver in practice.
- High theory–experiment agreement: runtime curves as a function of prediction error are consistent with theoretical bounds in synthetic experiments.
## Limitations & Future Work
- Cost-scaling parameter \(T\) requires tuning: though it is theoretically estimable, the estimation incurs additional overhead; experiments set \(T=\hat{T}+2\) manually.
- CNN uses MSE rather than \(\ell_\infty\) loss: The gap between the surrogate loss and the theoretical bound is not quantified.
- Parallelization not implemented: Although ε-relaxation is naturally parallelizable (road networks are 4-colorable), no parallel comparison is conducted.
- Integer assumption only: Capacities and supplies are assumed to be integers; generalization to non-integer settings remains unexplored.
- Limited speedup on escape routing: Only 1.64×, possibly due to insufficient CNN prediction accuracy or structural differences in the problem.
- No comparison with latest SOTA: The nearly linear-time algorithm of Brand et al. (2023) is not discussed in terms of practical feasibility.
## Related Work & Insights
- Classical MCF algorithms: Network Simplex (practical SOTA, unknown complexity), SSP (simple but theoretically slow), capacity/cost scaling (Edmonds-Karp 1972, Goldberg-Tarjan 1987), Brand et al. (2023) near-linear theoretical upper bound.
- Algorithms with predictions: Dinitz et al. (2021) matching + dual predictions, Davies et al. (2023) max-flow + primal predictions, Chen et al. (2022) 0/1-flow \(O(m^{3/2}n)\) bound (\(\ell_0\)-norm, impractical), Sakaue & Oki (2022) \(O(mn^2\|\hat{p}-p^*\|_\infty)\) (weaker bound).
- Data-driven algorithm design: Gupta & Roughgarden (2017) framework, Balcan et al. (2021–2024) generalization guarantees, Cheng et al. (2024) branch-and-bound.
## Rating
⭐⭐⭐⭐ — Theoretically rigorous, with elegant consistency + robustness complexity bounds. Speedups are highly significant on traffic networks (12.74×) and moderate on escape routing (1.64×). The work extends the algorithms-with-predictions framework to MCF—a core optimization problem—combining theoretical depth with practical value. Note that while this paper is categorized under autonomous driving, it is more squarely in operations research / transportation optimization.