Skip to content

Constraint Matters: Multi-Modal Representation for Reducing Mixed-Integer Linear programming

Conference: ICLR2026
arXiv: 2508.18742
Code: https://github.com/Liwow/Constraint_Matters
Area: Optimization/Theory
Keywords: MILP, constraint reduction, tight constraint, multi-modal representation, GNN

TL;DR

A MILP model simplification framework based on constraint reduction is proposed. It defines fixed constraint strength \(\rho\) and utilizes information gain \(\Delta H=-\log\rho\) to identify Critical Tight Constraints (CTC). A multi-modal GNN representation, merging instance-level bipartite graphs with abstract-level type graphs, is designed to predict CTCs. Across 4 large-scale benchmarks, solution quality (\(\text{gap}_\text{abs}\)) improved by an average of 51.06%, and convergence speed (PDI) accelerated by an average of 17.47%.

Background & Motivation

Background: Large-scale MILP problems are prevalent in industrial scenarios such as production scheduling, supply chain, and energy management. Exact solvers like Gurobi/SCIP are computationally expensive for large-scale instances. ML-based model reduction (predicting and fixing a subset of variables) is a mainstream acceleration method.

Limitations of Prior Work:

  • Variable reduction (Predict-and-Search, ConPaS, etc.) can lead to infeasible solutions if predictions are incorrect.
  • Constraint reduction (converting inequality constraints into equalities) has rarely been systematically studied.
  • Directly fixing all tight constraints yields unstable results: on CA_easy tasks, the best fixation takes only 1.85s vs. 465.74s for the worst (baseline 378.23s).

Key Challenge: Different tight constraints contribute vastly differently to solving acceleration—random fixation may actually slow down the process, necessitating a principled selection.

Key Insight: Starting from duality theory, constraints and variables are inherently coupled; fixing tight constraints is equivalent to reducing the feasible region. The key problem becomes: which constraints are most worth fixing? How can they be predicted efficiently?

Method

Overall Architecture

The framework addresses the bottleneck of slow large-scale MILP solving by fixing a subset of tight constraints (inequalities that hold as equalities at the optimal solution) to reduce the feasible region. This process is divided into three sequential steps: first, an information-theoretic rule is used to define "which types of constraints are worth fixing," labeling them as Critical Tight Constraints (CTC) for training. Second, a multi-modal graph network encodes MILP instances to capture both coefficient matrices and constraint semantic types, predicting and ranking the CTC score for each constraint. Finally, the top-\(k_c\) constraints are converted to equalities with a trust-region approach for fault tolerance and solved using Gurobi/SCIP. These steps answer "which to fix," "how to represent and predict," and "how to fix safely."

graph TD
    IN["MILP Instance<br/>Coefficient Matrix + Constraint Types"] --> ENC["Bipartite Graph Construction<br/>Instance Graph (Coefficients as Weights)<br/>+ Abstract Type Graph (PLM Text Embeddings)"]
    TCP["Critical Tight Constraint Identification TCP<br/>Strength ρ → Information Gain ΔH=-log ρ<br/>10 Prototype Constraints Selected as CTC"] -->|Provide Training Labels| GNN
    ENC --> GNN["Multi-Modal GNN Representation<br/>Intra-layer Propagation + Inter-layer<br/>Cross-Attention / Gated α Fusion"]
    GNN --> PRED["Predict and Rank<br/>CTC Score per Constraint"]
    PRED --> TR["Trust-Region Fixation<br/>top-k_c converted to equalities + Δc fault tolerance"]
    TR --> SOLVE["Gurobi / SCIP Solving<br/>Feasible Region Reduction / Convergence Acceleration"]

Key Designs

1. CTC Identification: Quantifying Constraint Value via Information Gain

Since fixing all tight constraints results in large performance fluctuations (1.85s to 465.74s), a principled ranking is required. This work uses an information-theory-inspired TCP (Tight Constraints Priority) module. It defines the fixed constraint strength \(\rho(C_i)=|S_{\hat{C}_i}|/|S_{C_i}|\), representing the reduction ratio of the local feasible region after fixing category \(C_i\), and converts it into "fixation gain" via information gain \(\Delta H_{C_i}=-\log\rho\). A smaller \(\rho\) indicates a more compressed feasible region and higher \(\Delta H\), contributing more to acceleration. Based on this, 10 prototypes are selected from 17 basic MIPLIB constraints to estimate strength: Set Packing constraints have \(\rho=n/(n+1)\to 1\) (low gain), while Knapsack and Bin Packing have \(\rho=O(1/A\sqrt{n})\) (high gain). Consequently, tight constraints in the latter categories are prioritized as CTCs.

2. Multi-modal GNN Representation: Semantic Types as a Second Modality

Relying solely on the coefficient matrix makes it difficult for networks to distinguish constraint types (e.g., Knapsack vs. Set Packing) and weakens cross-instance generalization. CTCs are strongly correlated with constraint categories. Thus, the method overlays two graphs: a lower-level standard instance bipartite graph (variables and constraints as nodes, coefficients as edge weights) and an upper-level abstract type bipartite graph where nodes represent variable/constraint categories. Initial features for categories are encoded from text descriptions using a Pre-trained Language Model (PLM). Intra-layer message passing (MP) occurs independently, followed by inter-layer MP fusion: abstract category nodes use Cross-Attention to query corresponding instance node features. Simultaneously, the mean instance feature \(\bar{h}_{V_j}\) and abstract feature \(\hat{h}_{V_j}\) are gated via \(\alpha=\text{Gate}(\bar{h}_{V_j}\,\|\,\hat{h}_{V_j})\in[0,1]\) to dynamically fuse information back into instance nodes.

3. Trust-Region Fixation: Buffering Prediction Errors

To prevent infeasibility caused by incorrect predictions, a trust-region parameter \(\Delta_c\) is introduced, allowing at most \(\Delta_c\) fixed constraints to violate the equality form. This creates a buffer between "full fixation" and "full relaxation." Since the optimal solution \(x^*\) already satisfies these tight constraints, fixing them (or penalizing violations) does not exclude \(x^*\). Even with prediction noise, \(\Delta_c\) ensures the reduced feasible region remains a non-empty subset of the original, likely containing \(x^*\).

Loss & Training

Focal Loss is used instead of Cross-Entropy to counter selective learning bias, where the network tends to predict simple, non-critical constraints with high confidence. Focal Loss down-weights easy samples using \((1-\hat{y})^\gamma\), forcing focus onto difficult and critical CTCs. The total loss combines variable and constraint predictions: \(\mathcal{L}=\lambda\mathcal{L}^{\text{sol}}_{\text{Focal}}+(1-\lambda)\mathcal{L}^{\text{con}}_{\text{Focal}}\).

Key Experimental Results

Main Results (800s limit, 100 test instances)

Method CA \(\text{gap}_\text{abs}\) CA PDI↓ MIS \(\text{gap}_\text{abs}\) MIS PDI↓ MVC \(\text{gap}_\text{abs}\) MVC PDI↓ WA \(\text{gap}_\text{abs}\) WA PDI↓
Gurobi 477.51 76.29 2.70 33.09 8.38 68.02 0.20 4.76
PS+Gurobi 210.19 73.78 0.04 0.69 0.28 7.39 0.07 4.97
ConPaS+Gurobi 197.36 73.75 0.02 0.71 0.10 7.64 0.10 4.97
Ours+Gurobi 104.72 73.02 0.00 0.47 0.00 6.26 0.06 4.40
Gain(%) 77.06% 4.22% 100% 98.58% 100% 90.80% 70.00% 7.56%
SCIP 4005.99 190.18 4.16 103.91 12.42 140.89 2.60 37.06
PS+SCIP 3575.18 179.48 0.66 6.49 2.98 21.03 1.27 34.85
ConPaS+SCIP 3545.92 146.88 0.43 5.52 2.56 24.05 1.20 29.91
Ours+SCIP 3401.63 109.41 0.34 4.65 1.48 18.34 0.83 21.52
Gain(%) 15.08% 42.47% 91.83% 95.52% 88.08% 86.98% 68.08% 41.93%

In Gurobi, the \(\text{gap}_\text{abs}\) for MIS and MVC dropped to 0 (perfectly finding the optimal solution); in SCIP, PDI improved by up to 42.47%.

Generalization & Real-world Scenarios

Method CA (Large) \(\text{gap}_\text{abs}\) CA PDI↓ MVC (Large) \(\text{gap}_\text{abs}\) MVC PDI↓ MMCN (Real) \(\text{gap}_\text{abs}\) MMCN PDI↓
SCIP 7009.58 198.78 22.07 228.21 13819.90 427.63
PS 6541.87 169.86 4.07 39.63 8084.36 329.60
ConPaS 6287.22 156.95 2.20 35.39 6064.85 330.05
Ours 5955.86 131.44 0.55 24.54 3547.08 246.89

On the real-world logistics network MMCN, the \(\text{gap}_\text{abs}\) decreased from 13819.90 to 3547.08 (↓74.3%).

Sensitivity to Constraint Fixation

Scenario CA_easy Time (s) WA Time (s)
Original Solve 378.23 >3600
Best Fixation 1.85 50.73
Worst Fixation 465.74 >3600

The 250x gap between best and worst fixation underscores that "which constraints are fixed" is critical.

Highlights & Insights

  • New Dimension of Constraint Reduction: One of the first systematic studies on constraint reduction in ML4Optimization, complementing variable reduction.
  • Informing Principled Design: \(\rho\) and \(\Delta H\) provide a theoretically grounded measure of constraint importance rather than relying on heuristics.
  • Novel Multi-modal Definition: Fusing the abstract model (semantic types) with instance modalities—a perspective applicable to other structured optimization problems.
  • Significant Speedup with 5% Fixation: Low prediction overhead with high returns.

Limitations & Future Work

  • The local decoupling assumption (constraint types independently affecting the feasible region) may not hold under high coupling.
  • CTC labeling relies on oracle optimal solutions, which are expensive to obtain.
  • Validated only on binary integer variables; not yet extended to general integer encoding.
  • Cannot be directly applied to standard benchmarks like MIPLIB that lack explicit mathematical descriptions.
  • vs. Variable Reduction (PS/ConPaS): This work opens a new direction in constraint reduction from a duality perspective, complementing existing methods.
  • vs. Bertsimas & Stellato (2022): The latter predicts all tight constraints, which is difficult; this work selects only a critical subset.
  • vs. Gasse et al. (2019) GNN: The addition of abstract type graphs and PLM embeddings improves cross-instance generalization.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ Innovative combination of constraint reduction, information theory, and multi-modal representation.
  • Experimental Thoroughness: ⭐⭐⭐⭐ 4 datasets + real-world scenarios + generalization + ablation.
  • Writing Quality: ⭐⭐⭐⭐ Motivation-driven with clear theoretical analysis.
  • Value: ⭐⭐⭐⭐⭐ Opens a new path for constraint reduction in ML4Optimization.