Contextual Dynamic Pricing with Heterogeneous Buyers¶
Conference: NeurIPS 2025 arXiv: 2512.09513 Area: Others Keywords: Dynamic Pricing, Heterogeneous Buyers, Contextual Bandits, Posterior Sampling, Disagreement Coefficient, Lipschitz Bandits
TL;DR¶
This paper presents the first systematic study of contextual dynamic pricing with heterogeneous buyers of \(K_\star\) unknown types. It proposes an Optimistic Posterior Sampling (OPS)-based algorithm achieving an \(\tilde{O}(K_\star\sqrt{dT})\) regret bound (optimal in \(d\) and \(T\)), and further introduces ZoomV—a variance-aware adaptive discretization algorithm—achieving the optimal \(\tilde{O}(\sqrt{K_\star T})\) regret in the non-contextual setting.
Background & Motivation¶
Background: In online contextual pricing, a seller sets price \(p_t\) for a product based on a \(d\)-dimensional feature vector \(u_t\), and a buyer purchases if their valuation \(v_t = \langle \theta^*, u_t \rangle\) meets or exceeds \(p_t\). Existing algorithms—based on multidimensional binary search or regression—uniformly assume homogeneous buyers sharing a single type \(\theta^*\).
Limitations of Prior Work: (a) Binary search methods cannot handle type-varying feedback, as knowledge sets are corrupted by contradictory observations; (b) regression methods assume i.i.d. noise, whereas the randomness induced by heterogeneous types is context-dependent; (c) the regret of naively discretized EXP4 explodes due to the infinite action space.
Key Challenge: The buyer type \(\theta_t\) is unobservable, making it impossible to associate feedback with the uncertainty set of a specific type. Moreover, the revenue function under pricing is discontinuous—a price increase of \(\epsilon\) may cause the buyer to abandon the purchase entirely.
Key Insight: The problem is formulated as learning the type distribution \(D_\star \in \Delta(\Theta)\), exploiting the disagreement coefficient within the OPS framework to carefully control the exploration–exploitation tradeoff.
Method¶
Problem Setup¶
At each round \(t\), the buyer type \(\theta_t \sim D_\star\) with \(|{\rm supp}(D_\star)| = K_\star\) unknown, valuation \(v_t = \langle u_t, \theta_t \rangle\), and purchase decision \(y_t = \mathbb{1}\{v_t \geq p_t\}\). The seller's objective is to minimize pricing regret:
where \(\mathsf{rev}_Q(p) = p \cdot \mathsf{dem}_Q(p)\) is the expected revenue and \(\mathsf{br}_Q\) is the optimal pricing policy.
OPS for Finite Model Classes (Algorithm 1)¶
A posterior \(\mu_t \in \Delta(\mathcal{D})\) over models is maintained. At each round: 1. Sample model \(D_t \sim \mu_t\) and set price \(p_t = \mathsf{br}_{D_t}(u_t)\). 2. Observe \(y_t\) and update the posterior via a composite loss:
The model mismatch term penalizes deviation between predictions and observations, while the optimistic bias encourages exploration of high-potential models.
Controlling the Disagreement Coefficient (Core Technical Contribution)¶
The disagreement coefficient is defined as:
Key Insight: For a fixed context \(u\), the aggregate demand function \(\mathsf{dem}_\star(\cdot, u)\) has at most \(K_\star\) jump points, so \(\mathsf{dem}_D(\cdot, u) - \mathsf{dem}_\star(\cdot, u)\) is monotonically decreasing on each of \(K_\star + 1\) intervals. By: - Lemma 3.4: \(\mathsf{dis} \leq 2\) for the class of monotone decreasing functions; - Lemma 3.5: the disagreement coefficient of an \(N\)-composite function class is amplified by at most a factor of \(N\),
one obtains \(\mathsf{dis}(\mathcal{D}, D_\star) \leq 2(K_\star + 1)\). Combined with the OPS regret bound \(\tilde{O}(\lambda \cdot \mathsf{dis} \cdot T + \log|\mathcal{D}|/\lambda)\), Theorem 3.2 follows.
Perturbed OPS for the General Setting (POPS, Algorithm 2)¶
To handle infinite model classes and unknown \(K_\star\): 1. Smoothed demand: \(\mathsf{dem}_Q^\varepsilon(p) = \mathbb{E}_{\delta \sim U[0,\varepsilon]}[\mathsf{dem}_Q(p - \delta)]\), which smooths discrete jumps via random perturbation. 2. Price discretization with conservative perturbation: \(p_t = \max\{\hat{p}_t - \delta_t, 0\}\) with \(\delta_t \sim U[0,\varepsilon]\), exploiting the one-sided Lipschitz property of revenue (a price increase yields at most \(p' - p\) additional revenue) to bound the perturbation cost. 3. Trajectory coupling: It is shown that \(D_\star\) has a Lévy-metric neighbor within a finite cover \(\mathcal{D}\), ensuring the purchase sequences under POPS agree with high probability. 4. Non-uniform prior for unknown \(K_\star\): Models with larger support sizes are assigned smaller prior weights.
ZoomV for the Non-Contextual Setting (Algorithm 4)¶
When \(d = 1\), variance-aware confidence intervals are combined with adaptive discretization: - Standard zooming dimension \(\text{ZoomDim}(c) \leq 1\) yields \(\tilde{O}(T^{2/3})\). - Variance-aware zooming dimension: using \(\sigma^2(p) = p^2 \mathsf{dem}_\star(p)(1 - \mathsf{dem}_\star(p))\), confidence intervals become extremely narrow in high-price regions where demand is near zero for low-demand types. - It is proven that \(\text{ZoomDimV}(10K_\star) = 0\), yielding \(\tilde{O}(\sqrt{K_\star T})\).
Key Experimental Results¶
Summary of Theoretical Regret Bounds¶
| Setting | Algorithm | Upper Bound | Lower Bound |
|---|---|---|---|
| Contextual (\(d>1\), \(K_\star\) known) | OPS | \(\tilde{O}(K_\star\sqrt{dT})\) | — |
| Contextual (\(d>1\), \(K_\star\) unknown) | POPS | \(\tilde{O}(K_\star\sqrt{dT})\) | \(\Omega(\sqrt{K_\star dT})\) |
| Non-contextual (\(d=1\)) | ZoomV | \(\tilde{O}(\sqrt{K_\star T})\) | \(\Omega(\sqrt{K_\star T})\) |
| Type-observable (identifier) | Alg 5 | \(\tilde{O}(K_\star\sqrt{dT})\) | — |
| Type-observable (full vector) | Interpolation estimator | \(\tilde{O}(\sqrt{\min\{K_\star, d\}T})\) | — |
Key Findings¶
- The contextual regret \(\tilde{O}(K_\star\sqrt{dT})\) is optimal in \(d\) and \(T\) (up to a \(\sqrt{K_\star}\) factor).
- The non-contextual regret \(\tilde{O}(\sqrt{K_\star T})\) is fully optimal, matching the lower bound.
- When types are fully observable, regret drops to \(\tilde{O}(\sqrt{\min\{K_\star, d\}T})\), quantifying the substantial benefit of richer feedback.
- The computational complexity of POPS scales with the model class size (exponentially), constituting a theoretical bottleneck.
Highlights & Insights¶
- The composite disagreement coefficient decomposition is the most elegant technical contribution: it reduces the exploration complexity over an infinite action space to the finite type count \(K_\star\).
- The one-sided Lipschitz property is systematically exploited throughout: since raising prices only reduces demand, the regret cost of conservative pricing is bounded.
- ZoomV's variance-aware dimension reveals a special structure of the pricing problem: high-price intervals, though numerous, have near-zero variance and thus low regret contribution.
- The problem formulation is clean, providing an elegant connection between heterogeneous buyer pricing and contextual bandits.
Limitations & Future Work¶
- The runtime of POPS scales with the size of the discretized model class, making it computationally infeasible for large \(K_\star\) and \(d\).
- A gap remains in the contextual setting regarding the optimal dependence on \(K_\star\) (upper bound \(K_\star\) vs. lower bound \(\sqrt{K_\star}\)).
- Noise robustness is limited: when the Lévy distance from \(D_\star\) to the nearest \(K_\star\)-support distribution is \(\delta\), regret incurs an additional \(\sqrt{\delta} T^2\) term.
- Only linear valuation models are considered; extensions to nonlinear valuation structures are not discussed.
- The paper is purely theoretical with no empirical validation.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ First systematic treatment of heterogeneous contextual pricing, with multiple novel technical tools.
- Theoretical Depth: ⭐⭐⭐⭐⭐ Upper and lower bound analyses across a hierarchy of methods (OPS/POPS/ZoomV).
- Experimental Thoroughness: ⭐⭐ Purely theoretical; no experiments.
- Writing Quality: ⭐⭐⭐⭐ Clear structure with progressive development; notation is heavy but consistent.
- Overall: ⭐⭐⭐⭐ An important theoretical breakthrough with broad implications for online learning and pricing.
Related Work & Insights¶
- vs. Cohen et al. (2020) / Lobel et al. (2018): Classical multidimensional binary search methods, restricted to homogeneous buyers (\(K_\star = 1\)). Contradictory feedback from heterogeneous types corrupts the knowledge set.
- vs. Javanmard & Nazerzadeh (2019): Regression-based approach assuming i.i.d. noise, which cannot accommodate the context-dependent randomness introduced by heterogeneous types.
- vs. Cesa-Bianchi et al. (2019): A pioneering work on non-contextual heterogeneous pricing, but its regret depends on an instance parameter \(V\) that can blow up in the worst case. ZoomV eliminates this dependency via variance awareness.
- vs. Krishnamurthy et al. (2023): Corruption-robust contextual search tolerating \(C = o(T)\) corrupted responses, which does not apply to genuinely large-scale heterogeneous populations.
- The composite disagreement coefficient decomposition is transferable to other structured contextual bandit problems, such as auction design with piecewise linear revenue or heterogeneous user modeling in recommendation systems.