Skip to content

Learning in Structured Stackelberg Games

Conference: ICML 2026
arXiv: 2504.09006
Code: Unreleased
Area: Multi-agent / Game Learning / Learning Theory
Keywords: Stackelberg game, online learning, Littlestone dimension, PAC learning, AI safety

TL;DR

This paper introduces a structural assumption to "contextual Stackelberg games" (where the context → follower type mapping comes from a hypothesis class \(\mathcal{H}\)) and constructs two novel learning-theoretic dimensions: the Stackelberg-Littlestone dimension (SLdim) for online regret bounds and the \(\gamma\)-SG / \(\gamma\)-SN dimensions for PAC sample complexity bounds. It proves these dimensions strictly outperform various Littlestone / Natarajan dimensions and provides instance-optimal online algorithm SSOA and distributed algorithm \(\mathfrak L^*\).

Background & Motivation

Background: Stackelberg games are classical frameworks for "commitment-response" strategic interactions, where the leader commits to a strategy, and the follower best-responds after seeing it. These are widely used in security patrolling, congestion pricing, and AI red-teaming. Harris et al. (2024) extended this to a contextual form, where both leader and follower utilities are influenced by side information \(\mathbf z\).

Limitations of Prior Work: The work by Harris et al. provided a negative conclusion: when the context sequence and follower types are both chosen adversarially, the worst-case regret grows linearly with time \(T\). This is essentially because the problem can be reduced to online classification, where an adversary can encode any hard-to-learn "context → type" mapping. In other words, "online no-regret learning" is impossible in the most general contextual Stackelberg model.

Key Challenge: In reality, the context → follower type mapping is almost always structured—for example, park cameras can predict poaching types, or an AI deployment environment can predict attack types. However, tools to formalize this structure and "apply" it to learnability are missing. Furthermore, forcing the problem into a multiclass classification framework (using Littlestone dimension to measure difficulty) ignores the utility space structure of Stackelberg games: often, even if the classifier predicts incorrectly, the leader's strategy can still be optimal.

Goal: (1) Formalize "structured Stackelberg games" where the context → type mapping belongs to a known hypothesis class \(\mathcal H\); (2) Find new dimensions that simultaneously characterize the utility space and the hypothesis class complexity; (3) Provide instance-optimal algorithms and matching sample/regret bounds for both online and distributed learning paradigms.

Key Insight: Preserve the utility structure of the game (i.e., the leader's piecewise-linear payoff surface) and utilize the classical online learning tools "shattered tree / shattered set." However, within the node weights, the regressor is replaced with the Stackelberg regret itself. This allows the dimension to naturally distinguish between "correct classification but wrong strategy" and "wrong classification but still correct strategy."

Core Idea: Online learning is feasible if and only if the Stackelberg-Littlestone dimension is finite; distributed learning is feasible if and only if the \(\gamma\)-SN dimension is finite. Both can be strictly smaller than their corresponding classical dimensions.

Method

Overall Architecture

The authors fully adopt the notation of contextual Stackelberg games: \(\mathbf z \in \mathcal Z\) is the context, the leader commits to a mixed strategy \(\mathbf x\) over \(\Delta(\mathcal A)\), and the follower best-responds \(b_f(\mathbf z, \mathbf x)\) based on one of \(K\) types \(\{f^{(1)}, \dots, f^{(K)}\}\). The new structural assumption is: there exists an unknown but true mapping \(h^* \in \mathcal H \subseteq [K]^{\mathcal Z}\) such that \(h^*(\mathbf z_t) = f_t\) for all \(t\) (realizable setting). The leader's instantaneous loss is defined as Stackelberg regret \(r(\mathbf z, \hat{\mathbf x}, f^{(h^*(\mathbf z))}) = \sup_{\mathbf x} u(\mathbf z, \mathbf x, b(\mathbf z, \mathbf x)) - u(\mathbf z, \hat{\mathbf x}, b(\mathbf z, \hat{\mathbf x}))\). The entire work focuses on finding complexity measures that match this regret.

Key Designs

  1. Stackelberg-Littlestone (SL) Dimension (Online Setting):

    • Function: Characterizes the instance-optimal online regret. It is proved to be strictly smaller than the multiclass Littlestone dimension, providing non-trivial guarantees for instances that are "hard to classify but easy to play."
    • Mechanism: Defines an SL tree where each internal node carries a context \(\mathbf z_s\), each edge carries a type label \(j \in [K]\), and nodes are assigned recursive weights \(\rho_s = \inf_{\mathbf x \in \Delta(\mathcal A)} \max_{j: sj \in S_d} (r(\mathbf z_s, \mathbf x, f^{(j)}) + \rho_{sj})\), with leaf nodes \(\rho_s = 0\). A tree is shattered by \(\mathcal H\) if every root-to-leaf path corresponds to some \(h \in \mathcal H\). The SL dimension is the supremum of the root weights of all shattered SL trees: \(\mathrm{SLdim}_{\mathcal G}(\mathcal H) := \sup\{\kappa : \exists \text{ shattered SL tree with root weight } \kappa\}\).
    • Design Motivation: The classical multiclass Littlestone dimension is blind to the utility space—it only cares about distinguishing classes. However, the true cost in Stackelberg games is the utility loss, not the classification error. Embedding \(r(\cdot)\) into the tree weights allows the dimension to measure how much utility the leader loses due to lack of knowledge.
  2. SSOA: Instance-Optimal Online Algorithm:

    • Function: Constructively proves that the SL dimension provides matching upper and lower bounds for online regret.
    • Mechanism: Maintains a version space \(V_t \subseteq \mathcal H\) consistent with history. After seeing \(\mathbf z_t\), for each possible type \(j \in V_t(\mathbf z_t)\), it calculates the leader's optimal utility if that follower type were true: \(u_*^{(j)} = \sup_{\mathbf x} u(\mathbf z_t, \mathbf x, b_{f^{(j)}}(\mathbf z_t, \mathbf x))\). It then selects \(\mathbf x_t \in \arg\inf_{\mathbf x} \max_{j \in V_t(\mathbf z_t)} (u_*^{(j)} - u(\mathbf z_t, \mathbf x, b_{f^{(j)}}(\mathbf z_t, \mathbf x)) + \mathrm{SLdim}_{\mathcal G}(V_t^{(\mathbf z_t \to j)}))\).
    • Design Motivation: Mentally similar to the Standard Optimal Algorithm (SOA) in online multiclass classification, but the loss is changed from "misclassification" to Stackelberg regret. This aligns the decision toward "which action minimizes the worst-case remaining regret" rather than "which action correctly classifies the label."
  3. \(\gamma\)-SN / \(\gamma\)-SG Dimensions (Distributed PAC Setting):

    • Function: Provide lower and upper bounds for PAC sample complexity, enabling quantitative characterization of "distributed learnability."
    • Mechanism: \(\gamma\)-SN-shattering an \(n\)-element set \(\{\mathbf z_i\}\) requires two functions \(g_0, g_1\) such that (i) for each \(\mathbf z_i\), no mixed strategy allows the leader to incur \(\le \gamma\) regret for both \(g_0\) and \(g_1\), and (ii) any bit pattern \(b \in \{0, 1\}^n\) is realizable by \(\mathcal H\). The \(\gamma\)-SG-shatter is similar but uses a single baseline \(g\). Algorithm \(\mathfrak L^*\) keeps the perfectly consistent subclass \(\mathcal H|_S\) and performs minimax \(\mathbf x^* = \inf_{\mathbf x} \max_{i \in F} r(\mathbf z, \mathbf x, f^{(i)})\) on the prediction candidate set \(F\) for new contexts.
    • Design Motivation: Applying Natarajan/Graph dimensions directly in utility space fails to capture "harmless disagreement" where different hypotheses lead to the same optimal strategy. Introducing the \(\gamma\) threshold ensures that only "expensive disagreements" contribute to the dimension.

Key Experimental Results

As this is a theoretical paper, results are presented as theorems and constructive counterexamples.

Main Results Comparison

Setting Dimension Controlling Regret/Samples Relation to Classical Dimension Algorithm
Online Regret (Upper, Thm 3.9) \(\mathrm{SLdim}_{\mathcal G}(\mathcal H)\) \(\mathrm{SLdim}_{\mathcal G}(\mathcal H) \le \mathrm{Ldim}(\mathcal H)\) SSOA (Alg. 1)
Online Regret (Lower, Thm 3.8) \(\mathrm{SLdim}_{\mathcal G}(\mathcal H) - \epsilon\) No deterministic algorithm can do better Adversarial construction
PAC Sample Lower (Thm 4.4) \(\Omega\left(\frac{\mathrm{SNdim}^{(\gamma)} + \log(1/\delta)}{\epsilon}\right)\) Adapted from Natarajan with \(\gamma\) threshold
PAC Sample Upper (Thm 4.7) Controlled by \(\mathrm{SGdim}^{(\gamma)}_{\mathcal G}(\mathcal H)\) Corresponds to Graph dim with utility cut-off \(\mathfrak L^*\) (Alg. 2)

Strict Separation from Classical Dimensions

Example \(\mathrm{Ldim}(\mathcal H)\) \(\mathrm{SLdim}_{\mathcal G}(\mathcal H)\) Explanation
Thm 3.5 Construction \(\infty\) \(0\) Two followers induce the same optimal strategy near the threshold; classification is infinitely hard but strategy has zero regret.
Example 3 (\(n\) classes + permutation class + \(U = \mathbf I\)) \(n - 1\) \(n - H_n\) (\(H_n\) is harmonic number) SLdim is smaller by a harmonic factor; the gap diverges with \(n\).
Thm 3.11 Large Small On this instance, SOA incurs continuous utility loss, whereas SSOA does not.

Key Findings

  • The true necessary and sufficient condition for online learnability is not the finite complexity of the classifier, but the finiteness of the utility-aware SL dimension. Many problems that are infinitely hard to classify are trivial in the Stackelberg context.
  • In distributed settings, "game-aware" Natarajan/Graph dimensions with \(\gamma\) thresholds are required to match upper and lower bounds; classical dimensions without \(\gamma\) overestimate difficulty.
  • Applying the classical SOA directly is not optimal. The authors construct instances where SOA's cumulative regret is strictly higher than SSOA's, showing that the natural baseline of "predict type then find strategy" is suboptimal.
  • The analytical solution \(\mathrm{SLdim} = n - H_n\) in Example 3 shows the gap between "classification difficulty" and "game difficulty" is at least an \(\Omega(\log n)\) multiplicative factor that does not vanish as \(n\) increases.
  • The framework extends to bid learning in auctions and Bayesian persuasion, demonstrating that the SL dimension is a unified characterization for commitment-then-response frameworks.

Highlights & Insights

  • Proposes a clean paradigm for utility-aware online learning dimensions: replacing the 0/1 mistake weights in shattered trees/sets with task-specific losses (Stackelberg regret here). This is applicable to strategic classification, Bayesian persuasion, and more.
  • Utilizes counterexamples like "different labels but same strategy" to demonstrate the "overestimation" of classical Littlestone/Natarajan dimensions.
  • The SSOA algorithm only adds an \(\mathrm{SLdim}_{\mathcal G}(V_t^{(\mathbf z_t \to j)})\) term compared to SOA, providing a clear dual structure that is easy to extend from existing SOA implementations.

Limitations & Future Work

  • Algorithms SSOA and \(\mathfrak L^*\) require iterating or optimizing over the version space every round, which is computationally expensive for large hypothesis classes.
  • The work assumes a realizable setting (\(h^* \in \mathcal H\)); whether the SL dimension characterizes the agnostic setting remains open.
  • Lower bounds are limited to deterministic algorithms and PAC cut-off losses; optimality for randomized algorithms or expected-utility losses is still an open question.
  • A gap exists between \(\gamma\)-SN and \(\gamma\)-SG dimensions; instance-optimal distributed sample complexity is not yet closed.
  • vs Harris et al. 2024: They proved \(\Theta(T)\) regret for two-sided adversaries; this paper bypasses learnability issues via the structural assumption \(h^* \in \mathcal H\). SSOA reduces to simpler Hedge-style solutions in their i.i.d.-context setting.
  • vs Balcan et al. 2015 / Harris et al. 2023: These used Hedge over mixed strategy sets for finite follower types; this paper removes the i.i.d. follower or independent context assumptions, replacing them with structural assumptions and SL dimension.
  • vs Ahmadi et al. 2024 (Strategic Littlestone): Both extend Littlestone-like dimensions to commitment games, but they focus on strategic classification (follower modifies features) while this paper focuses on Stackelberg games (follower chooses actions).
  • vs Attias et al. 2023: This paper borrows the idea of "shattered sets with \(\gamma\) thresholds" from real-valued regression but redesigns the minimax cut-off for the mixed structure of discrete follower types and continuous utility functions.

Rating

  • Novelty: TBD
  • Experimental Thoroughness: TBD
  • Writing Quality: TBD
  • Value: TBD