
Preference Estimation via Opponent Modeling in Multi-Agent Negotiation

Conference: ACL 2026 arXiv: 2604.15687 Code: None Area: Multi-Agent Systems Keywords: Opponent Modeling, Bayesian Inference, Preference Estimation, Multi-Party Negotiation, LLM Language Signals

TL;DR

This paper proposes a preference estimation method that integrates LLM-extracted natural language preference signals into a Bayesian opponent modeling framework. In multi-party, multi-issue negotiations, it fuses qualitative cues and quantitative bid information via a linguistic likelihood function, improving the full agreement rate (FAR) from 37% to 62%.

Background & Motivation

State of the Field: Automated negotiation in multi-party, multi-issue settings relies heavily on accurate opponent modeling. Traditional approaches adopt the BOA architecture and estimate opponent utility functions from numerical bid histories via Bayesian learning.

Limitations of Prior Work: (1) Purely numerical methods fail to capture qualitative preference information expressed in natural language dialogue, leading to information incompleteness; (2) Although LLMs can understand semantics, using them directly for preference inference lacks strategic consistency and becomes unstable in prolonged negotiations; (3) Reasoning complexity for LLMs grows exponentially as the amount of information increases.

Root Cause: Rich qualitative information in language (e.g., "Issue A matters more to me") cannot be exploited by conventional numerical models, while LLMs lack a structured belief update mechanism.

Paper Goals: Design a preference estimation method that integrates linguistic signals into a structured Bayesian framework, combining semantic understanding with probabilistic inference.

Starting Point: Use an LLM to extract structured preference signals (target issue/option + stance) from utterances, then convert them into probabilistic likelihood functions via the Luce choice axiom, and fuse them with bid likelihoods for Bayesian updates.

Core Idea: Linguistic likelihood × bid likelihood → Bayesian posterior update, unifying qualitative and quantitative information within a probabilistic framework.

Method

Overall Architecture

At each negotiation round, an agent receives the opponent's bid \(d_t\) and utterance \(u_t\). An LLM parses the utterance to obtain a preference signal \(z_t\). The bid likelihood \(P(d_t|h_k)\) and linguistic likelihood \(P(z_t|h_k)\) are computed separately and fused via a naive Bayes assumption to update the hypothesis posterior \(P(h_k|d_t, z_t)\).
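The per-round update above can be sketched in a few lines. This is a minimal illustration of the naive-Bayes fusion, not the paper's implementation; the function name and the example likelihood values are our own.

```python
import numpy as np

def update_posterior(prior, bid_likelihood, ling_likelihood):
    """One round of the fused Bayesian update over hypotheses h_k:
        P(h_k | d_t, z_t) ∝ P(d_t | h_k) * P(z_t | h_k) * P(h_k)
    All arguments are arrays indexed by hypothesis k; the result is
    normalized so it sums to 1."""
    unnorm = bid_likelihood * ling_likelihood * prior
    return unnorm / unnorm.sum()

# Example: three candidate preference profiles, uniform prior.
prior = np.ones(3) / 3
bid_lik = np.array([0.2, 0.5, 0.3])    # P(d_t | h_k) from the bid model
ling_lik = np.array([0.6, 0.3, 0.1])   # P(z_t | h_k) from the Luce rule
posterior = update_posterior(prior, bid_lik, ling_lik)
# Both sources agree h_1 is plausible from bids but h_0 from language;
# the fused posterior balances the two.
```

The posterior from round t then serves as the prior for round t+1, which is what gives the method the strategic consistency that direct LLM inference lacks.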

Key Designs

  1. Linguistic Preference Signal Extraction:

    • Function: Converts natural language utterances into structured preference signals.
    • Mechanism: An LLM parses utterance \(u_t\) into signal \(z_t\) with two attributes: Target (a single issue/option or a comparison between issues/options) and Stance (preference, opposition, etc.).
    • Design Motivation: Provides structured input for probabilistic computation, avoiding the unreliability of having LLMs directly output numerical estimates.
  2. Linguistic Likelihood via the Luce Choice Axiom:

    • Function: Converts structured signals into a probability distribution over the hypothesis space.
    • Mechanism: For a signal expressing "preference for issue \(i_x\)," the likelihood is \(P(z_t|h_k) = w_x^{(k)} / \sum_m w_m^{(k)}\), i.e., the proportion of that issue's weight relative to the total weight. Comparative and opposition signals are handled analogously.
    • Design Motivation: The Luce axiom is a classical probabilistic model in choice theory that naturally maps weights/evaluations to probabilities.
  3. Multimodal Bayesian Fusion:

    • Function: Jointly updates the posterior belief over opponent preferences.
    • Mechanism: Assuming conditional independence between bids and linguistic signals, the posterior is \(P(h_k|d_t, z_t) \propto P(d_t|h_k) \cdot P(z_t|h_k) \cdot P(h_k)\).
    • Design Motivation: Although the naive Bayes assumption is a simplification, it keeps the computation tractable, and bids and language provide complementary information in practice.
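Design 2 can be made concrete with a short sketch. The paper only states the Luce rule for "preference" signals and says comparative and opposition signals are handled analogously, so the exact scoring of those two forms below (complement and pairwise Luce) is an assumption, as is the tuple-based signal encoding.

```python
def luce_likelihood(signal, weights):
    """Linguistic likelihood P(z_t | h_k) for one hypothesis h_k, whose
    issue weights are given as a dict {issue: w}.

    Signal forms (our own encoding, not the paper's exact schema):
      ("prefer", i)         -> w_i / sum_m w_m       (Luce choice rule)
      ("oppose", i)         -> 1 - w_i / sum_m w_m   (assumed complement)
      ("prefer_over", i, j) -> w_i / (w_i + w_j)     (assumed pairwise Luce)
    """
    total = sum(weights.values())
    kind = signal[0]
    if kind == "prefer":
        return weights[signal[1]] / total
    if kind == "oppose":
        return 1 - weights[signal[1]] / total
    if kind == "prefer_over":
        w_i, w_j = weights[signal[1]], weights[signal[2]]
        return w_i / (w_i + w_j)
    raise ValueError(f"unknown signal kind: {kind}")

# Hypothesis: the opponent weights issue A twice as much as B or C.
h = {"A": 2.0, "B": 1.0, "C": 1.0}
luce_likelihood(("prefer", "A"), h)            # 2 / 4 = 0.5
luce_likelihood(("prefer_over", "B", "C"), h)  # 1 / 2 = 0.5
```

Note how a hypothesis that assigns issue A a high weight receives a high likelihood for the utterance "I prefer A", which is exactly the mapping from weights to probabilities the Luce axiom provides.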

Loss & Training

No model training is involved. GPT-4.1 is used as the underlying LLM. Bayesian updates are performed online.

Key Experimental Results

Main Results

6-party, 5-issue sports facility construction negotiation scenario (averaged over 500 experiments):

Method FAR (Full Agreement Rate) PAR (Partial Agreement Rate) LAR (Latent Agreement Rate)
Base-LLM 0.37 0.76 0.97
Base-OM (all) 0.56 0.92 0.99
LLM-PE (all) 0.32 0.69 0.93
Proposed (all) 0.62 0.89 0.98

Ablation Study

Method Preference Estimation MSE (Avg) Notes
Proposed 159 Language + numerical fusion
Base-OM 189 Numerical bids only
LLM-PE 163 Direct LLM inference

Key Findings

  • Mutual modeling by all agents (all) substantially outperforms one-sided modeling by a single agent (p1) (FAR 0.46 → 0.62), demonstrating a multi-party synergy effect.
  • Direct LLM-PE inference underperforms the purely numerical baseline (FAR 0.32 < 0.56), validating the necessity of a structured framework.
  • Integrating linguistic signals reduces MSE from 189 to 159, yielding more accurate and balanced estimate distributions.

Highlights & Insights

  • The hybrid paradigm of "LLM extraction + Bayesian inference" is highly instructive — it leverages the semantic capabilities of LLMs without relying on their reasoning consistency, using a mathematical framework to ensure structured updates.
  • The elegant application of the Luce choice axiom naturally maps preference weights to selection probabilities, providing theoretical grounding for the conversion from linguistic signals to likelihood functions.

Limitations & Future Work

  • The method assumes opponents communicate sincerely and does not account for deception or bluffing.
  • Evaluation is conducted in a single scenario; generalizability to diverse settings remains to be tested.
  • The hypothesis space grows factorially with the number of issues, necessitating approximation algorithms.
Comparison with Baselines

  • vs. Base-LLM: Pure LLM-based negotiation lacks structured preference tracking, leading to strategic inconsistency in prolonged negotiations.
  • vs. LLM-PE: Direct LLM inference of numerical preferences is unreliable (FAR of only 0.32), underscoring the need for a probabilistic framework as a constraint.

Rating

  • Novelty: ⭐⭐⭐⭐ The fusion of linguistic signals with a Bayesian framework is an original contribution.
  • Experimental Thoroughness: ⭐⭐⭐ Limited to a single scenario with 500 experiments; scenario diversity is insufficient.
  • Writing Quality: ⭐⭐⭐⭐ Formalization is clear and illustrations are intuitive.
  • Value: ⭐⭐⭐⭐ Provides a valuable paradigm for applying LLMs in structured decision-making.