Efficient Kernelized Learning in Polyhedral Games Beyond Full-Information: From Colonel Blotto to Congestion Games¶
Conference: NeurIPS 2025 · arXiv: 2509.20919 · Code: None · Area: Other
Keywords: polyhedral games, kernelization, coarse correlated equilibrium, Colonel Blotto, congestion games
TL;DR¶
This paper proposes a kernelization-based framework for designing computationally efficient no-regret learning algorithms for polyhedral games (Colonel Blotto, graphic matroid congestion games, and network congestion games) under partial-information feedback, significantly improving the runtime complexity for learning coarse correlated equilibria (CCE).
Background & Motivation¶
Polyhedral games are normal-form games with combinatorial structure and exponentially large action spaces. Each player's action is a \(d\)-dimensional binary vector (with at most \(m\) ones), and the cost function is linear over actions. Canonical examples include:
- Colonel Blotto games: distributing \(n\) soldiers across \(k\) battlefields, with \(N \approx n^k\) actions
- Graphic matroid congestion games: selecting a spanning tree as a strategy, with \(N \approx |E|^{|V|}\)
- Network congestion games: selecting an \(s\to t\) path, with \(N \approx |E|^K\)
Since \(N\) grows exponentially in \(m\), classical no-regret algorithms (MWU, FTRL, etc.) incur per-round complexity of \(\text{poly}(N)\), rendering them computationally infeasible in these games.
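To make the blow-up concrete, the number of pure strategies in Colonel Blotto follows from a stars-and-bars count. A minimal illustration (the function name is ours, not the paper's):

```python
from math import comb

def blotto_actions(n, k):
    """Number of pure strategies in an n-soldier, k-battlefield Colonel
    Blotto game: ways to split n identical soldiers across k battlefields,
    i.e. the stars-and-bars count C(n + k - 1, k - 1)."""
    return comb(n + k - 1, k - 1)
```

Even a tiny instance such as `blotto_actions(20, 5)` yields 10626 pure strategies, and the count grows exponentially in the number of battlefields, so any algorithm that touches all \(N\) weights per round is infeasible.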
The full-information setting (observing the cost of every action each round) has been addressed via kernelization [Farina et al., 2022], but practical settings more commonly involve partial-information feedback:
- Bandit feedback: only the cost of the chosen action is observed
- Semi-bandit feedback: the cost of each component of the chosen action is observed
Existing methods under partial information suffer from prohibitive runtime complexity:
- [ZL22]'s bandit algorithm for Colonel Blotto: \(\tilde{O}(n^{18}k^{11}/\varepsilon^2)\)
- [LLWZ20]'s CCE learning: runtime dependent on \(d^{10}\)
Core Problem: Can kernelization be extended to partial-information settings to yield no-regret learning dynamics with optimal runtime complexity?
Method¶
Overall Architecture¶
The proposed kernelization framework addresses three main challenges:
1. Efficient computation of loss estimators for policy updates
2. Efficient sampling from the MWU distribution
3. High-probability no-regret guarantees to ensure convergence to CCE
Theorem 2.1 (informal): If all \(|P|\) players employ no-regret learning, the time-averaged joint action \(\sigma^* = \frac{1}{T}\sum_t v_1^{(t)} \otimes \cdots \otimes v_{|P|}^{(t)}\) constitutes an approximate CCE.
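For intuition, the time-averaged product distribution of Theorem 2.1 can be sketched in the two-player case (a hypothetical helper, assuming the per-round marginal vectors are available explicitly; the paper's algorithms never materialize them):

```python
import numpy as np

def averaged_joint_policy(marginals):
    """Two-player instance of Theorem 2.1's averaged joint strategy:
    given per-round marginal pairs (v1, v2), return the T-average of the
    product distributions v1 (outer) v2, which is an approximate CCE when
    both players run no-regret learning.

    marginals: list of (v1, v2) pairs, each a probability vector.
    """
    T = len(marginals)
    d1, d2 = len(marginals[0][0]), len(marginals[0][1])
    sigma = np.zeros((d1, d2))
    for v1, v2 in marginals:
        sigma += np.outer(v1, v2)  # joint distribution of this round
    return sigma / T
```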
Key Designs¶
1. Bandit Setting: Kernelized GeometricHedge
The key innovation lies in the observation that, under bandit feedback, constructing an unbiased combinatorial bandit estimator requires the second-order moment of the MWU distribution (rather than the first-order moment sufficient in the full-information setting).
Theorem 3.1: The required second-order moments can be efficiently computed via \(\Theta(d^2)\) kernel evaluations.
Regret bound: \(\tilde{O}(d^{2/3}m^{4/3}T^{2/3})\) (high probability), improving over baselines in dependence on game parameters.
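The quantity Theorem 3.1 kernelizes is the matrix \(\mathbb{E}[vv^\top]\): entry \((i,j)\) is the probability that coordinates \(i\) and \(j\) are both active under the MWU distribution. A brute-force stand-in for the kernel oracle (explicit enumeration over all actions, which is exactly the cost the paper's kernel avoids):

```python
import numpy as np

def second_moment(actions, weights):
    """E[v v^T] under the MWU distribution p(v) proportional to weights[v].

    `actions` is an (N, d) 0/1 matrix. Entry (i, j) of the result is
    Pr[v_i = 1 and v_j = 1] -- the value that one kernel evaluation
    returns for the pair (i, j) in the kernelized algorithm, so the
    whole matrix needs Theta(d^2) such evaluations.
    """
    p = np.asarray(weights, dtype=float)
    p /= p.sum()                       # normalize MWU weights
    A = np.asarray(actions, dtype=float)
    return A.T @ (p[:, None] * A)      # sum_v p(v) * v v^T
```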
2. Semi-Bandit Setting: Implicit Exploration Loss Estimator
The implicit exploration loss estimator of Neu [2015] is adopted and shown to be kernel-compatible with the first-order moments of MWU.
Regret bound: \(\tilde{O}(m\sqrt{Td})\) (high probability).
3. Efficient Sampling
A general kernelized sampling procedure is proposed, requiring only an additional \(\Theta(d)\) kernel evaluations. The core idea is to exploit the combinatorial structure of the games (e.g., generating functions, Matrix-Tree Theorem) to reduce operations over the exponentially large action space to polynomial-time kernel computations.
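The sequential-conditioning idea behind the sampler can be sketched with brute-force conditional probabilities standing in for kernel evaluations; in the actual algorithm each conditional query is answered in polynomial time via the game-specific kernel:

```python
import random

def kernel_sample(actions, weights, rng=random):
    """Exact sampling from p(v) proportional to weights[v] by fixing one
    coordinate at a time. Each step conditions on the coordinates fixed so
    far and asks the (here: brute-force) oracle for the conditional
    probability of setting the next coordinate to 1 -- Theta(d) oracle
    calls in total, mirroring the Theta(d) extra kernel evaluations.
    """
    d = len(actions[0])
    fixed = {}
    for i in range(d):
        consistent = [(a, w) for a, w in zip(actions, weights)
                      if all(a[j] == b for j, b in fixed.items())]
        den = sum(w for _, w in consistent)
        num = sum(w for a, w in consistent if a[i] == 1)
        fixed[i] = 1 if rng.random() < num / den else 0
    return tuple(fixed[i] for i in range(d))
```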
Game-Specific Kernelizations¶
Colonel Blotto games:
- Kernelization via generating functions
- Operations performed directly on the \(\Theta(nk)\) representation, avoiding the \(O(n^2k)\) overhead of DAG-based representations
- Bandit runtime: \(\tilde{O}(n^{2+\omega}k^{6+\omega}/\varepsilon^3)\), where \(\omega \approx 2.372\) is the matrix multiplication exponent
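The generating-function view is concrete: the coefficient of \(x^n\) in \((1 + x + \cdots + x^n)^k\) counts Blotto allocations, and the coefficient table is a \(\Theta(nk)\)-sized object. A minimal unweighted sketch (the paper's kernel additionally carries MWU weights on the coefficients):

```python
def blotto_counts(n, k):
    """Coefficient c[s] of (1 + x + ... + x^n)^k counts the ways to place
    s soldiers on k battlefields. Each truncated convolution is O(n^2),
    and the coefficient vector is the compact Theta(nk)-style object the
    paper operates on directly. Returns the count for exactly n soldiers.
    """
    c = [1] + [0] * n                       # the polynomial "1"
    for _ in range(k):                      # multiply by one battlefield factor
        new = [0] * (n + 1)
        for s in range(n + 1):
            if c[s]:
                for a in range(n + 1 - s):  # soldiers on this battlefield
                    new[s + a] += c[s]
        c = new
    return c[n]
```

As a sanity check, this agrees with the stars-and-bars count \(\binom{n+k-1}{k-1}\), e.g. 10626 for \(n = 20\), \(k = 5\).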
Graphic matroid congestion games:
- Kernel designed via the Matrix-Tree Theorem
- Amortized kernel computation time reduced via rank-1 updates to the LU decomposition of the Laplacian matrix
- Incremental kernelization enables exact sampling
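The unweighted special case of the Matrix-Tree computation is a single determinant; the paper's kernel evaluates weighted versions of this determinant and maintains it under rank-1 updates. A minimal sketch:

```python
import numpy as np

def spanning_tree_count(edges, num_vertices):
    """Matrix-Tree Theorem: the number of spanning trees of a graph equals
    any cofactor of its Laplacian. This unweighted determinant is the
    special case underlying the kernel for graphic matroid congestion games.
    """
    L = np.zeros((num_vertices, num_vertices))
    for u, v in edges:
        L[u, u] += 1   # degree contributions on the diagonal
        L[v, v] += 1
        L[u, v] -= 1   # adjacency off the diagonal
        L[v, u] -= 1
    # delete row/column 0 and take the determinant (any cofactor works)
    return round(np.linalg.det(L[1:, 1:]))
```

For example, the complete graph \(K_4\) has \(4^{4-2} = 16\) spanning trees by Cayley's formula, which this determinant reproduces.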
Network congestion games:
- First efficient implementation of GeometricHedge for general network congestion games beyond shortest-path problems
Loss & Training¶
This is a theoretical work; the core "training strategy" consists of online learning dynamics:
- Each round, an action is sampled from the MWU distribution
- After observing feedback, a loss estimator is constructed
- The MWU weights are updated
- After \(T\) rounds, the time-averaged strategy converges to an \(\varepsilon\)-CCE
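The per-round weight update in these dynamics is the standard multiplicative-weights rule; in the kernelized algorithms it is applied implicitly on the compact representation rather than on the \(N\) explicit weights. A sketch of the explicit form:

```python
import numpy as np

def mwu_round(weights, loss_estimate, eta):
    """One round of multiplicative weights: exponentially down-weight
    actions by their estimated loss at learning rate eta, then normalize.
    The kernelized algorithms carry out this update implicitly, never
    materializing the N-dimensional weight vector.
    """
    w = weights * np.exp(-eta * loss_estimate)
    return w / w.sum()
```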
Key Experimental Results¶
Main Results¶
This is a purely theoretical paper; results are presented as runtime complexity comparison tables.
Table 1: Runtime Comparison for Colonel Blotto Games
| Algorithm | Runtime for \(\varepsilon\)-CCE | Representation | Feedback |
|---|---|---|---|
| [BHK+23] | \(\tilde{O}(nk^4/\varepsilon^2)\) | \(O(k\log n)\) | Full-info |
| Ours | \(\tilde{O}(\vert P\vert nk^3/\varepsilon)\) | \(O(nk)\) | Full-info |
| [ZL22] | \(\tilde{O}(n^{18}k^{11}/\varepsilon^2)\) | \(O(n^2k)\) | Bandit |
| [LE21] | \(\tilde{O}(n^4k^5/\varepsilon^3 \cdot \max\{1/\lambda_{\min}, n^2\}^{3/2})\) | \(O(n^2k)\) | Bandit |
| Ours | \(\tilde{O}(n^{2+\omega}k^{6+\omega}/\varepsilon^3)\) | \(O(nk)\) | Bandit |
| Ours | \(\tilde{O}(n^2k^4/\varepsilon^2)\) | \(O(nk)\) | Semi-bandit |
Comparison with [ZL22]: In the bandit setting, the dependence on \(n\) and \(k\) improves from \(n^{18}k^{11}\) to \(n^{2+\omega}k^{6+\omega} \approx n^{4.4}k^{8.4}\), an improvement factor of approximately \(n^{13}k^2\), though at the cost of a \(1/\varepsilon^3\) rather than \(1/\varepsilon^2\) dependence.
Table 2: Runtime Comparison for Graphic Matroid Congestion Games
| Algorithm | Runtime for \(\varepsilon\)-CCE | Feedback |
|---|---|---|
| [ZL22] | \(\tilde{O}(\vert V\vert^{29}/\varepsilon^2)\) | Bandit |
| Ours | \(\tilde{O}(\vert E\vert^3\vert V\vert^6(\vert V\vert^{\omega-1}+\vert E\vert)/\varepsilon^3)\) | Bandit |
| Ours | \(\tilde{O}(\vert E\vert^2\vert V\vert^{2+\omega}/\varepsilon^2)\) | Semi-bandit |
Table 3: Runtime Comparison for Network Congestion Games
| Algorithm | Runtime for \(\varepsilon\)-CCE | Feedback |
|---|---|---|
| [ZL22] | \(\tilde{O}(\vert E\vert^9K^{10}/\varepsilon^2)\) | Bandit |
| Ours | \(\tilde{O}(\vert E\vert^{2+\omega}K^4/\varepsilon^3)\) | Bandit |
| [GLLO07a] | \(\tilde{O}(\vert E\vert^{1+\omega}K^3/\varepsilon^2)\) | Semi-bandit |
| Ours | \(\tilde{O}(\vert E\vert^{1+\omega}K^2/\varepsilon^2)\) | Semi-bandit |
Ablation Study¶
There are no ablation experiments in the conventional sense; instead, the contribution of each component is demonstrated as follows:
- Kernelized vs. non-kernelized: kernelization reduces per-round complexity from \(\text{poly}(N)\) to \(\text{poly}(d,m)\)
- Second-order vs. first-order moment kernel: the bandit setting requires computing second-order moments of MWU, incurring \(\Theta(d)\) additional kernel evaluations compared to the full-information setting
- Direct Blotto representation vs. DAG representation: operating directly on the \(\Theta(nk)\) representation yields a more compact alternative to the \(\Theta(n^2k)\) DAG representation used in [ZL22]
Key Findings¶
- Kernelization can be successfully extended to partial-information settings, directly answering the paper's core question.
- Runtime improvements are substantial: approximately \(n^{13}k^2\) improvement over [ZL22] in Colonel Blotto games; a reduction from \(|V|^{29}\) to a far lower-degree polynomial in graphic matroid games.
- Multiple open problems resolved: the paper answers the open question of [BHK+23] on \(1/\varepsilon\) convergence for Colonel Blotto under full information, and the open question of [CBL12] on efficiently implementing GeometricHedge over spanning trees.
Highlights & Insights¶
- Generality of the unified framework: the same kernelization paradigm applies to three structurally distinct classes of games—Colonel Blotto, graphic matroid congestion games, and network congestion games.
- Second-order moment kernelization is the core technical contribution: while full-information kernelization requires only first-order moments, this work identifies that the bandit setting requires second-order moments and proves they can be computed efficiently via kernelization.
- Compact representations are critical: operating directly on the game's geometric structure (the \(nk\) representation) rather than an indirect DAG representation is a key source of the runtime improvements.
- Resolution of multiple open problems: significantly amplifies the theoretical impact of the work.
Limitations & Future Work¶
- The regret bound in the bandit setting is \(T^{2/3}\), falling short of the optimal \(\sqrt{T}\) (which is achieved in the semi-bandit setting).
- The runtime dependence on \(\varepsilon\) is \(1/\varepsilon^3\) (bandit); whether this can be improved to \(1/\varepsilon^2\) remains an open problem.
- As a purely theoretical work, numerical experiments are absent.
- The framework requires cost functions to be linear in the actions, and may not apply to games with nonlinear cost functions.
- The practical efficiency of kernel computations depends on the specific game structure, limiting the degree of generalization.
Related Work & Insights¶
- [Farina et al., 2022]: Kernelized MWU framework for the full-information setting; the direct precursor of this work.
- GeometricHedge/ComBand [Dani et al., 2007]: Classical bandit linear optimization algorithms; the target of kernelization in this paper.
- [Beaglehole et al., 2023]: Fast approximate sampling for Colonel Blotto under full information, but incompatible with optimism-based methods.
- [Zimmert & Lattimore, 2022]: Continuous-action-space algorithms applied to polyhedral games, with impractical runtimes.
- Matrix-Tree Theorem [Tutte, 2001]: Mathematical foundation for kernelization in graphic matroid congestion games.
Rating¶
- Novelty: ★★★★★ — First extension of kernelization to partial-information settings in polyhedral games, resolving multiple open problems.
- Technical Depth: ★★★★★ — Deep integration of online learning, combinatorial optimization, and matrix theory.
- Experimental Thoroughness: ★★☆☆☆ — Purely theoretical; no numerical experiments.
- Writing Quality: ★★★★☆ — Theoretical exposition is clear; tabular comparisons are easy to parse; mathematical notation is dense.
- Value: ★★★☆☆ — Major theoretical contribution; empirical validation in practical game-theoretic settings remains to be conducted.