Preference Estimation via Opponent Modeling in Multi-Agent Negotiation¶
Conference: ACL 2026 arXiv: 2604.15687 Code: None Area: Video Understanding Keywords: Opponent Modeling, Bayesian Inference, Preference Estimation, Multi-Party Negotiation, LLM Language Signals
TL;DR¶
This paper proposes a preference estimation method that integrates LLM-extracted natural language preference signals into a Bayesian opponent modeling framework. In multi-party, multi-issue negotiations, it fuses qualitative cues and quantitative bid information via a linguistic likelihood function, improving the full agreement rate (FAR) from 37% to 62%.
Background & Motivation¶
State of the Field: Automated negotiation in multi-party, multi-issue settings relies heavily on accurate opponent modeling. Traditional approaches adopt the BOA architecture (bidding strategy, opponent model, acceptance condition) and estimate opponent utility functions from numerical bid histories via Bayesian learning.
Limitations of Prior Work: (1) Purely numerical methods fail to capture qualitative preference information expressed in natural language dialogue, leading to information incompleteness; (2) Although LLMs can understand semantics, using them directly for preference inference lacks strategic consistency and becomes unstable in prolonged negotiations; (3) Reasoning complexity for LLMs grows exponentially as the amount of information increases.
Root Cause: Rich qualitative information in language (e.g., "Issue A matters more to me") cannot be exploited by conventional numerical models, while LLMs lack a structured belief update mechanism.
Paper Goals: Design a preference estimation method that integrates linguistic signals into a structured Bayesian framework, combining semantic understanding with probabilistic inference.
Starting Point: Use an LLM to extract structured preference signals (target issue/option + stance) from utterances, then convert them into probabilistic likelihood functions via the Luce choice axiom, and fuse them with bid likelihoods for Bayesian updates.
Core Idea: Linguistic likelihood × bid likelihood → Bayesian posterior update, unifying qualitative and quantitative information within a probabilistic framework.
Method¶
Overall Architecture¶
At each negotiation round, an agent receives the opponent's bid \(d_t\) and utterance \(u_t\). An LLM parses the utterance to obtain a preference signal \(z_t\). The bid likelihood \(P(d_t|h_k)\) and linguistic likelihood \(P(z_t|h_k)\) are computed separately and fused via a naive Bayes assumption to update the hypothesis posterior \(P(h_k|d_t, z_t)\).
Key Designs¶
- Linguistic Preference Signal Extraction:
- Function: Converts natural language utterances into structured preference signals.
- Mechanism: An LLM parses utterance \(u_t\) into signal \(z_t\) with two attributes: Target (a single issue/option or a comparison between issues/options) and Stance (preference, opposition, etc.).
- Design Motivation: Provides structured input for probabilistic computation, avoiding the unreliability of having LLMs directly output numerical estimates.
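A minimal sketch of what such a structured signal might look like. The `PreferenceSignal` class and its field names are illustrative assumptions, not the paper's actual interface; in practice the LLM's parse of the utterance would be mapped into this kind of record:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PreferenceSignal:
    """Structured preference signal z_t parsed from an utterance u_t."""
    target: str                         # issue/option referred to, e.g. "issue_A"
    compared_to: Optional[str] = None   # second issue, for comparative signals
    stance: str = "prefer"              # e.g. "prefer" or "oppose"

# Example: the utterance "Issue A matters more to me than Issue B"
# would yield a comparative preference signal:
z_t = PreferenceSignal(target="issue_A", compared_to="issue_B", stance="prefer")
```

The point of the intermediate representation is that downstream probability computations only ever see these discrete fields, never free-form LLM text or LLM-generated numbers.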
- Linguistic Likelihood via the Luce Choice Axiom:
- Function: Converts structured signals into a probability distribution over the hypothesis space.
- Mechanism: For a signal expressing "preference for issue \(i_x\)," the likelihood is \(P(z_t|h_k) = w_x^{(k)} / \sum_m w_m^{(k)}\), i.e., the proportion of that issue's weight relative to the total weight. Comparative and opposition signals are handled analogously.
- Design Motivation: The Luce axiom is a classical probabilistic model in choice theory that naturally maps weights/evaluations to probabilities.
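The simple-preference case can be sketched directly from the formula above; the hypothesis weights below are made-up illustrative values:

```python
def luce_likelihood(issue: str, weights: dict) -> float:
    """P(z_t | h_k) for a signal 'preference for issue i_x' under hypothesis h_k:
    the issue's weight w_x^(k) as a share of the total weight sum_m w_m^(k)."""
    return weights[issue] / sum(weights.values())

# Hypothesis h_k assigns these issue weights (illustrative values)
w_k = {"issue_A": 0.5, "issue_B": 0.3, "issue_C": 0.2}

p = luce_likelihood("issue_A", w_k)  # 0.5 / 1.0 = 0.5
```

A comparative signal such as "A over B" would analogously use \(w_A^{(k)} / (w_A^{(k)} + w_B^{(k)})\), restricting the Luce ratio to the two compared items.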
- Multimodal Bayesian Fusion:
- Function: Jointly updates the posterior belief over opponent preferences.
- Mechanism: Assuming conditional independence between bids and linguistic signals, the posterior is \(P(h_k|d_t, z_t) \propto P(d_t|h_k) \cdot P(z_t|h_k) \cdot P(h_k)\).
- Design Motivation: Although the naive Bayes assumption is a simplification, it makes computation tractable, and bids and language do provide complementary information.
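The fused update can be sketched as a standard normalized product over the hypothesis space; the two-hypothesis likelihood values below are invented for illustration:

```python
def posterior_update(prior, bid_lik, ling_lik):
    """Naive-Bayes fusion over hypotheses h_k:
    P(h_k | d_t, z_t) ∝ P(d_t | h_k) · P(z_t | h_k) · P(h_k)."""
    unnorm = [p * b * l for p, b, l in zip(prior, bid_lik, ling_lik)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

prior    = [0.5, 0.5]   # uniform prior over two hypotheses h_1, h_2
bid_lik  = [0.4, 0.6]   # P(d_t | h_k): bid slightly favors h_2
ling_lik = [0.7, 0.3]   # P(z_t | h_k): utterance strongly favors h_1
post = posterior_update(prior, bid_lik, ling_lik)  # h_1 wins overall
```

In an online setting, `post` simply becomes the prior for the next round, so each new bid-utterance pair refines the belief without any retraining.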
Loss & Training¶
No model training is involved. GPT-4.1 is used as the underlying LLM. Bayesian updates are performed online.
Key Experimental Results¶
Main Results¶
6-party, 5-issue sports facility construction negotiation scenario (averaged over 500 runs):
| Method | FAR (Full Agreement Rate) | PAR (Partial Agreement Rate) | LAR (Latent Agreement Rate) |
|---|---|---|---|
| Base-LLM | 0.37 | 0.76 | 0.97 |
| Base-OM (all) | 0.56 | 0.92 | 0.99 |
| LLM-PE (all) | 0.32 | 0.69 | 0.93 |
| Proposed (all) | 0.62 | 0.89 | 0.98 |
Ablation Study¶
| Method | Preference Estimation MSE (avg, lower is better) | Notes |
|---|---|---|
| Proposed | 159 | Language + numerical fusion |
| Base-OM | 189 | Numerical bids only |
| LLM-PE | 163 | Direct LLM inference |
Key Findings¶
- Mutual modeling (all) substantially outperforms one-sided modeling (p1) (FAR 0.46→0.62), demonstrating multi-party synergy effects.
- Direct LLM-PE inference underperforms the purely numerical baseline (FAR 0.32 < 0.56), validating the necessity of a structured framework.
- Integrating linguistic signals reduces MSE from 189 to 159, yielding more accurate and balanced estimate distributions.
Highlights & Insights¶
- The hybrid paradigm of "LLM extraction + Bayesian inference" is highly instructive: it leverages the semantic capabilities of LLMs without relying on their reasoning consistency, using a mathematical framework to ensure structured belief updates.
- The elegant application of the Luce choice axiom naturally maps preference weights to selection probabilities, providing theoretical grounding for the conversion from linguistic signals to likelihood functions.
Limitations & Future Work¶
- The method assumes opponents communicate sincerely and does not account for deception or bluffing.
- Evaluation is conducted in a single scenario; generalizability to diverse settings remains to be tested.
- The hypothesis space grows factorially with the number of issues, necessitating approximation algorithms.
Related Work & Insights¶
- vs. Base-LLM: Pure LLM-based negotiation lacks structured preference tracking, leading to strategic inconsistency in prolonged negotiations.
- vs. LLM-PE: Direct LLM inference of numerical preferences is unreliable (FAR of only 0.32), underscoring the need for a probabilistic framework as a constraint.
Rating¶
- Novelty: ⭐⭐⭐⭐ The fusion of linguistic signals with a Bayesian framework is an original contribution.
- Experimental Thoroughness: ⭐⭐⭐ Limited to a single scenario with 500 experiments; scenario diversity is insufficient.
- Writing Quality: ⭐⭐⭐⭐ Formalization is clear and illustrations are intuitive.
- Value: ⭐⭐⭐⭐ Provides a valuable paradigm for applying LLMs in structured decision-making.