Preference Estimation via Opponent Modeling in Multi-Agent Negotiation¶
Conference: ACL 2026 arXiv: 2604.15687 Code: None Area: Video Understanding Keywords: Opponent Modeling, Bayesian Inference, Preference Estimation, Multi-Party Negotiation, LLM Language Signals
TL;DR¶
This paper proposes a preference estimation method that integrates LLM-extracted natural language preference signals into a Bayesian opponent modeling framework. In multi-party, multi-issue negotiations, it fuses qualitative cues and quantitative bid information via a linguistic likelihood function, improving the full agreement rate (FAR) from 37% to 62%.
Background & Motivation¶
State of the Field: Automated negotiation in multi-party, multi-issue settings relies heavily on accurate opponent modeling. Traditional approaches adopt the BOA architecture (bidding strategy, opponent model, acceptance condition) and estimate opponent utility functions from numerical bid histories via Bayesian learning.
Limitations of Prior Work: (1) Purely numerical methods fail to capture qualitative preference information expressed in natural language dialogue, leading to information incompleteness; (2) Although LLMs can understand semantics, using them directly for preference inference lacks strategic consistency and becomes unstable in prolonged negotiations; (3) Reasoning complexity for LLMs grows exponentially as the amount of information increases.
Root Cause: Rich qualitative information in language (e.g., "Issue A matters more to me") cannot be exploited by conventional numerical models, while LLMs lack a structured belief update mechanism.
Paper Goals: Design a preference estimation method that integrates linguistic signals into a structured Bayesian framework, combining semantic understanding with probabilistic inference.
Starting Point: Use an LLM to extract structured preference signals (target issue/option + stance) from utterances, then convert them into probabilistic likelihood functions via the Luce choice axiom, and fuse them with bid likelihoods for Bayesian updates.
Core Idea: Linguistic likelihood × bid likelihood → Bayesian posterior update, unifying qualitative and quantitative information within a probabilistic framework.
Method¶
Overall Architecture¶
At each negotiation round, an agent receives the opponent's bid \(d_t\) and utterance \(u_t\). An LLM parses the utterance to obtain a preference signal \(z_t\). The bid likelihood \(P(d_t|h_k)\) and linguistic likelihood \(P(z_t|h_k)\) are computed separately and fused via a naive Bayes assumption to update the hypothesis posterior \(P(h_k|d_t, z_t)\).
Key Designs¶
- Linguistic Preference Signal Extraction:
- Function: Converts natural language utterances into structured preference signals.
- Mechanism: An LLM parses utterance \(u_t\) into signal \(z_t\) with two attributes: Target (a single issue/option or a comparison between issues/options) and Stance (preference, opposition, etc.).
- Design Motivation: Provides structured input for probabilistic computation, avoiding the unreliability of having LLMs directly output numerical estimates.
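A minimal sketch of what such a structured signal might look like. The `PreferenceSignal` class and its field names are illustrative assumptions, not the paper's actual interface; in practice the LLM's parse of the utterance would be mapped into this kind of record:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PreferenceSignal:
    """Structured preference signal z_t parsed from an utterance u_t."""
    target: str                         # issue/option referred to, e.g. "issue_A"
    compared_to: Optional[str] = None   # second issue, for comparative signals
    stance: str = "prefer"              # e.g. "prefer" or "oppose"

# Example: the utterance "Issue A matters more to me than Issue B"
# would yield a comparative preference signal:
z_t = PreferenceSignal(target="issue_A", compared_to="issue_B", stance="prefer")
```

The point of the intermediate representation is that downstream probability computations only ever see these discrete fields, never free-form LLM text or LLM-generated numbers.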
- Linguistic Likelihood via the Luce Choice Axiom:
- Function: Converts structured signals into a probability distribution over the hypothesis space.
- Mechanism: For a signal expressing "preference for issue \(i_x\)," the likelihood is \(P(z_t|h_k) = w_x^{(k)} / \sum_m w_m^{(k)}\), i.e., the proportion of that issue's weight relative to the total weight. Comparative and opposition signals are handled analogously.
- Design Motivation: The Luce axiom is a classical probabilistic model in choice theory that naturally maps weights/evaluations to probabilities.
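The simple-preference case can be sketched directly from the formula above; the hypothesis weights below are made-up illustrative values:

```python
def luce_likelihood(issue: str, weights: dict) -> float:
    """P(z_t | h_k) for a signal 'preference for issue i_x' under hypothesis h_k:
    the issue's weight w_x^(k) as a share of the total weight sum_m w_m^(k)."""
    return weights[issue] / sum(weights.values())

# Hypothesis h_k assigns these issue weights (illustrative values)
w_k = {"issue_A": 0.5, "issue_B": 0.3, "issue_C": 0.2}

p = luce_likelihood("issue_A", w_k)  # 0.5 / 1.0 = 0.5
```

A comparative signal such as "A over B" would analogously use \(w_A^{(k)} / (w_A^{(k)} + w_B^{(k)})\), restricting the Luce ratio to the two compared items.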
- Multimodal Bayesian Fusion:
- Function: Jointly updates the posterior belief over opponent preferences.
- Mechanism: Assuming conditional independence between bids and linguistic signals, the posterior is \(P(h_k|d_t, z_t) \propto P(d_t|h_k) \cdot P(z_t|h_k) \cdot P(h_k)\).
- Design Motivation: Although the naive Bayes assumption is a simplification, it makes computation tractable, and bids and language do provide complementary information.
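The fused update can be sketched as a standard normalized product over the hypothesis space; the two-hypothesis likelihood values below are invented for illustration:

```python
def posterior_update(prior, bid_lik, ling_lik):
    """Naive-Bayes fusion over hypotheses h_k:
    P(h_k | d_t, z_t) ∝ P(d_t | h_k) · P(z_t | h_k) · P(h_k)."""
    unnorm = [p * b * l for p, b, l in zip(prior, bid_lik, ling_lik)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

prior    = [0.5, 0.5]   # uniform prior over two hypotheses h_1, h_2
bid_lik  = [0.4, 0.6]   # P(d_t | h_k): bid slightly favors h_2
ling_lik = [0.7, 0.3]   # P(z_t | h_k): utterance strongly favors h_1
post = posterior_update(prior, bid_lik, ling_lik)  # h_1 wins overall
```

In an online setting, `post` simply becomes the prior for the next round, so each new bid-utterance pair refines the belief without any retraining.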
Loss & Training¶
No model training is involved. GPT-4.1 is used as the underlying LLM. Bayesian updates are performed online.
Key Experimental Results¶
Main Results¶
6-party, 5-issue sports facility construction negotiation scenario (averaged over 500 runs):
| Method | FAR (Full Agreement Rate) | PAR (Partial Agreement Rate) | LAR (Latent Agreement Rate) |
|---|---|---|---|
| Base-LLM | 0.37 | 0.76 | 0.97 |
| Base-OM (all) | 0.56 | 0.92 | 0.99 |
| LLM-PE (all) | 0.32 | 0.69 | 0.93 |
| Proposed (all) | 0.62 | 0.89 | 0.98 |
Ablation Study¶
| Method | Preference Estimation MSE (avg, lower is better) | Notes |
|---|---|---|
| Proposed | 159 | Language + numerical fusion |
| Base-OM | 189 | Numerical bids only |
| LLM-PE | 163 | Direct LLM inference |
Key Findings¶
- Mutual modeling (all) substantially outperforms one-sided modeling (p1) (FAR 0.46→0.62), demonstrating multi-party synergy effects.
- Direct LLM-PE inference underperforms the purely numerical baseline (FAR 0.32 < 0.56), validating the necessity of a structured framework.
- Integrating linguistic signals reduces MSE from 189 to 159, yielding more accurate and balanced estimate distributions.
Highlights & Insights¶
- The hybrid paradigm of "LLM extraction + Bayesian inference" is highly instructive: it leverages the semantic capabilities of LLMs without relying on their reasoning consistency, using a mathematical framework to ensure structured belief updates.
- The elegant application of the Luce choice axiom naturally maps preference weights to selection probabilities, providing theoretical grounding for the conversion from linguistic signals to likelihood functions.
Limitations & Future Work¶
- The method assumes opponents communicate sincerely and does not account for deception or bluffing.
- Evaluation is conducted in a single scenario; generalizability to diverse settings remains to be tested.
- The hypothesis space grows factorially with the number of issues, necessitating approximation algorithms.
Related Work & Insights¶
- vs. Base-LLM: Pure LLM-based negotiation lacks structured preference tracking, leading to strategic inconsistency in prolonged negotiations.
- vs. LLM-PE: Direct LLM inference of numerical preferences is unreliable (FAR of only 0.32), underscoring the need for a probabilistic framework as a constraint.
Rating¶
- Novelty: ⭐⭐⭐⭐ The fusion of linguistic signals with a Bayesian framework is an original contribution.
- Experimental Thoroughness: ⭐⭐⭐ Limited to a single scenario with 500 experiments; scenario diversity is insufficient.
- Writing Quality: ⭐⭐⭐⭐ Formalization is clear and illustrations are intuitive.
- Value: ⭐⭐⭐⭐ Provides a valuable paradigm for applying LLMs in structured decision-making.