ValuePilot: A Two-Phase Framework for Value-Driven Decision-Making
Conference: NeurIPS 2025 · arXiv: 2512.13716 · Code: Not released · Area: Interpretability · Keywords: value-driven decision-making, personalized AI, PROMETHEE, multi-criteria decision-making, human values
TL;DR
This paper proposes ValuePilot, a two-phase framework in which a Dataset Generation Toolkit (DGT) constructs value-annotated decision scenarios and a Decision-Making Module (DMM) performs multi-criteria decision-making conditioned on personalized user value preferences. ValuePilot outperforms strong baselines, including GPT-5, in alignment with human decisions.
Background & Motivation
Personalized decision-making is a central requirement in human–AI interaction: AI agents must adapt their behavior according to users' personal value preferences, not merely task objectives. Human decisions are driven by intrinsic values, as described by Schwartz's basic human values theory and Maslow's hierarchy of needs. However, existing AI decision-making paradigms exhibit clear shortcomings:
- RLHF/DPO methods: rely on aggregated feedback, ignore inter-individual differences, and fail to capture fine-grained value dimensions.
- Structured planning methods (e.g., ReAct, AutoPlan): focus on task efficiency without modeling intrinsic values.
- Dataset deficiency: ALFWorld and InterCode focus only on task completion; WVS and Moral Stories lack explicit associations between decision scenarios and value dimensions.
Two core challenges arise: (1) identifying which value dimensions are relevant in a given scenario; and (2) making trade-offs across multiple value dimensions to select actions aligned with personal preferences.
Method
Overall Architecture
ValuePilot consists of two core components:
- DGT (Dataset Generation Toolkit): automatically generates value-annotated decision scenarios via LLMs.
- DMM (Decision-Making Module): learns to evaluate the alignment between actions and values, and produces action rankings conditioned on personal preferences.
Key Design 1: DGT Data Generation Pipeline
The Task Specifier in DGT adopts a three-stage pipeline:
- Prompt construction and scenario generation: Given a target set of value dimensions, modular prompts are constructed to have GPT-4 generate multi-agent household scenarios that implicitly encode value dimensions without directly mentioning the corresponding keywords.
- Action generation and value scoring: Ten candidate actions are generated per scenario; each action is annotated with continuous scores in \([-1, +1]\) on each value dimension (−1 indicates strong violation, 0 indicates neutrality, and +1 indicates strong alignment).
- Automatic filtering and re-evaluation: An independent GPT-4 session re-infers the value dimensions of each scenario; scenarios whose inferred dimensions are inconsistent with the target set are discarded.
The data undergo human review by a four-person team with backgrounds in AI and psychology, ensuring scenario authenticity, action coherence, and value-alignment quality.
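A minimal sketch of this generate-score-filter loop, assuming an OpenAI-style client; the `chat` helper, all prompt wording, and the JSON formats are illustrative placeholders, not the paper's:

```python
# Minimal sketch of the DGT pipeline: generate a scenario, generate and score
# actions, then re-evaluate in a fresh session and filter mismatches.
# Assumptions (not from the paper): the `chat` helper, prompt wording, and
# JSON response formats are illustrative only.
import json
from openai import OpenAI

client = OpenAI()
VALUES = ["curiosity", "vitality", "safety", "happiness", "intimacy", "fairness"]

def chat(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def generate_sample(target_dims: list[str]) -> dict | None:
    # Stage 1: scenario that implicitly encodes the target dimensions.
    scenario = chat(
        "Write a multi-agent household scenario that implicitly involves the "
        f"values {target_dims} without naming them."
    )
    # Stage 2: ten candidate actions, each scored in [-1, 1] per dimension.
    actions = json.loads(chat(
        f"Scenario: {scenario}\n"
        f"Propose 10 candidate actions. For each, score every dimension in "
        f"{target_dims} on a continuous [-1, 1] scale. "
        'Answer as a JSON list of {"action": ..., "scores": ...} objects.'
    ))
    # Stage 3: an independent session re-infers the dimensions; mismatches
    # with the target set are discarded (automatic filtering).
    inferred = json.loads(chat(
        f"Scenario: {scenario}\nWhich of {VALUES} are relevant? "
        "Answer as a JSON list."
    ))
    if set(inferred) != set(target_dims):
        return None
    return {"scenario": scenario, "actions": actions, "dims": target_dims}
```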
Key Design 2: Value Assessment Network
A T5 encoder processes scenario and action descriptions, with a multi-head self-attention mechanism capturing semantic relationships. The specific procedure is as follows:
- Scenarios and actions are encoded into hidden states of shape \(b \times L \times H\) (batch size \(b\), sequence length \(L\), hidden size \(H\)).
- After 4-head self-attention and average pooling, the representations are passed to a two-layer MLP (hidden dimension 128).
- Value dimension scores in \([-1, 1]\) are output via a tanh activation.
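A minimal PyTorch sketch of this network; the T5 variant, the way scenario and action text are concatenated, and all names are assumptions, not the paper's:

```python
# Sketch of the Value Assessment Network: T5 encoder -> 4-head self-attention
# -> average pooling -> 2-layer MLP (hidden 128) -> tanh scores in [-1, 1].
# Assumption: "t5-base" stands in for the unspecified T5 variant.
import torch
import torch.nn as nn
from transformers import T5EncoderModel, T5Tokenizer

class ValueAssessmentNetwork(nn.Module):
    def __init__(self, n_values: int = 6, mlp_hidden: int = 128):
        super().__init__()
        self.encoder = T5EncoderModel.from_pretrained("t5-base")
        d = self.encoder.config.d_model
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d, mlp_hidden),
            nn.ReLU(),
            nn.Linear(mlp_hidden, n_values),
            nn.Tanh(),  # per-dimension value scores in [-1, 1]
        )

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state  # (b, L, H)
        a, _ = self.attn(h, h, h, key_padding_mask=~attention_mask.bool())
        pooled = a.mean(dim=1)  # average pooling over sequence length
        return self.mlp(pooled)  # (b, n_values)

tok = T5Tokenizer.from_pretrained("t5-base")
batch = tok(["Scenario: ... Action: ..."], return_tensors="pt", padding=True)
scores = ValueAssessmentNetwork()(batch.input_ids, batch.attention_mask)
```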
Key Design 3: Action Selection Module
Personalized decision-making is accomplished in two steps:
Step 1: Contextualized Scoring
The raw user preference vector \(\mathbf{p}\) is first transformed via a sigmoid to counteract the central-tendency bias in human ratings. Preference-difference scores between the transformed preferences and the predicted value scores are then computed and integrated with the objective value scores, where a weight \(w = 0.3\) balances subjective preference against objective value. Finally, the combined score is scaled by each dimension's scenario relevance, as sketched below.
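A hedged reconstruction of this scoring pipeline, under assumed notation (\(v_{ij}\): predicted score of action \(i\) on dimension \(j\); \(r_j\): scenario relevance of dimension \(j\); the paper's exact forms may differ):

\[
\tilde{p}_j = \sigma(p_j), \qquad
d_{ij} = 1 - \bigl|\tilde{p}_j - v_{ij}\bigr|, \qquad
s_{ij} = (1 - w)\, v_{ij} + w\, d_{ij}, \qquad
\hat{s}_{ij} = r_j \, s_{ij},
\]

with \(w = 0.3\).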
Step 2: PROMETHEE Multi-Criteria Decision-Making
Action selection is formulated as a multi-criteria decision problem. For each pair of actions \(i\) and \(i'\), a preference degree is computed from their score difference on each dimension \(j\); weighted aggregation with the user's preferences yields a global preference index, and the final ranking score of each action is its net outranking flow, as sketched below.
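In the standard PROMETHEE II formulation, using a linear preference function on the score difference as an illustrative choice (the paper may use a different preference function), the quantities are:

\[
P_j(i, i') = \max\bigl(0,\; \hat{s}_{ij} - \hat{s}_{i'j}\bigr), \qquad
\pi(i, i') = \frac{\sum_j \tilde{p}_j \, P_j(i, i')}{\sum_j \tilde{p}_j},
\]

\[
\phi(i) = \frac{1}{N - 1} \sum_{i' \neq i} \bigl[\pi(i, i') - \pi(i', i)\bigr],
\]

where \(N\) is the number of candidate actions; actions are ranked by descending net flow \(\phi(i)\).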
Loss & Training
- Six core value dimensions are selected: curiosity, vitality, safety, happiness, intimacy, and fairness.
- After DGT generation and human filtering, the dataset comprises 11,938 scenarios and 100,255 actions.
- A six-level hierarchical structure (1D to 6D) is adopted, progressively increasing the complexity of value dimension combinations.
- Automatic filtering removes 12%–25% of samples; human review further refines the data.
Key Experimental Results
Main Results 1: Value Recognition
| Model | AvgAcc (t=0.2) | AvgAcc (t=0.05) | MAE |
|---|---|---|---|
| Llama-3.1-70b | 40.90% | 17.74% | 0.30 |
| Llama-3.1-405b | 41.62% | 18.00% | 0.29 |
| Mixtral-8x22b | 42.71% | 18.39% | 0.29 |
| Gemini-1.5-Flash | 51.61% | 25.64% | 0.24 |
| Value Assessment Network | 66.70% | 40.00% | 0.19 |
At threshold \(t=0.2\), the Value Assessment Network surpasses the strongest baseline (Gemini-1.5-Flash) by 15.09 percentage points in AvgAcc; its MAE of 0.19 is the lowest among all models, a 36.7% relative reduction from the weakest baseline's 0.30.
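These metrics admit a straightforward reading: AvgAcc counts a per-dimension prediction as correct when it lies within \(t\) of the annotated score, and MAE is the mean absolute error. A small sketch under that assumed reading (not verified against the paper):

```python
# Hedged sketch of the value-recognition metrics; the within-t correctness
# criterion for AvgAcc is an assumption.
import numpy as np

def avg_acc(pred: np.ndarray, gold: np.ndarray, t: float) -> float:
    return float((np.abs(pred - gold) <= t).mean())

def mae(pred: np.ndarray, gold: np.ndarray) -> float:
    return float(np.abs(pred - gold).mean())

pred = np.array([0.80, -0.10, 0.35])
gold = np.array([0.70, 0.05, 0.30])
print(avg_acc(pred, gold, t=0.2), mae(pred, gold))  # ≈1.0 0.1
```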
Main Results 2: Value-Driven Decision-Making (Human Alignment)
| Model | OS-Sim | First-Acc |
|---|---|---|
| Llama-3.1-70b | ~65% | ~35% |
| GPT-4o-mini | ~67% | ~36% |
| Claude-Sonnet-4 | ~68% | ~37% |
| GPT-5 | 69.23% | 38.01% |
| DMM (Ours) | 73.16% | 46.14% |
DMM outperforms GPT-5 by 3.93 percentage points on OS-Sim and by 8.13 percentage points on First-Action Accuracy (First-Acc).
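A hedged sketch of these alignment metrics, assuming OS-Sim compares model and human per-action scores and First-Acc checks agreement on the single top-ranked action; both readings are assumptions, not the paper's definitions:

```python
# Hedged sketch of the human-alignment metrics. Assumptions (not verified
# against the paper): OS-Sim is one minus the mean absolute score gap, and
# First-Acc is top-1 agreement with the human's first choice.
import numpy as np

def os_sim(model_scores: np.ndarray, human_scores: np.ndarray) -> float:
    return float(1.0 - np.abs(model_scores - human_scores).mean())

def first_acc(model_scores: np.ndarray, human_scores: np.ndarray) -> float:
    return float(np.argmax(model_scores) == np.argmax(human_scores))

m = np.array([0.9, 0.4, 0.1])
h = np.array([0.8, 0.5, 0.2])
print(os_sim(m, h), first_acc(m, h))  # ≈0.9 1.0
```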
Ablation Study
| Variant | OS-Sim | First-Acc |
|---|---|---|
| Only Action (no scenario, no preference) | 60.23% | 32.27% |
| w/o Preference (no personal preference) | 61.07% | 31.82% |
| w/o Subjective (no subjective adjustment) | 68.93% | 43.45% |
| w/o Scenario (no scenario scaling) | 69.99% | 43.64% |
| DMM (Full) | 73.16% | 46.14% |
Key Findings
- Personal preferences are critical: Removing preferences causes a drop of over 12 percentage points in OS-Sim, confirming the necessity of personalized modeling.
- Scenario context is effective: Removing scenario scaling costs roughly 3 percentage points of OS-Sim, indicating that contextualized scoring contributes positively to decision quality.
- Explicit value modeling outperforms implicit learning: Through explicit value-dimension modeling, DMM surpasses LLMs that rely on implicit pattern learning, with a particularly clear advantage in First-Acc.
- Advantage amplifies under stricter thresholds: At \(t=0.05\), the Value Assessment Network exceeds the strongest baseline by 14.36 percentage points, demonstrating superior ability to capture subtle value distinctions.
Highlights & Insights
- Values as stable, transferable signals: Compared to task-oriented paradigms, value-driven approaches exhibit greater generalizability and interpretability in novel scenarios.
- Elegant integration of PROMETHEE: Combining a classical multi-criteria decision method with deep learning provides a theoretically grounded ranking mechanism for value trade-offs.
- Human–machine collaborative data generation: The three-stage DGT pipeline—automatic generation, automatic filtering, and human review—balances data scale and quality.
- Continuous bipolar scoring system: The \([-1, +1]\) scoring design captures the nuanced relationship between actions and values more expressively than binary labels.
- Sigmoid preference transformation: Elegantly addresses the central-tendency bias in human ratings.
Limitations & Future Work
- Limited value dimensions: Only 6 dimensions are used, whereas real human value systems are far more complex (e.g., privacy, creativity, autonomy).
- Domain restriction: Experiments are confined to household scenarios; performance in high-stakes domains such as workplace or healthcare settings remains unvalidated.
- Dependence on GPT-4-generated data: Synthetic data may introduce distributional biases with potential impact on model robustness.
- Small-scale human evaluation: Only 40 participants across 11 formal scenarios limits statistical power.
- Static preference model: The framework does not account for the dynamic nature of user preferences as they evolve over time and context.
- Scalability concerns: PROMETHEE's pairwise comparisons cost \(O(N^2 \times m)\) for \(N\) candidate actions and \(m\) value dimensions, which may become a bottleneck when the candidate action set is large (see the sketch below).
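To make the pairwise cost concrete, a vectorized net-flow computation looks like the following; the \((N, N, m)\) difference tensor is exactly the \(O(N^2 \times m)\) term. The linear preference function and all names are illustrative assumptions:

```python
# Sketch of PROMETHEE II net-flow ranking, illustrating the O(N^2 * m) cost.
# `scores` is N actions x m dimensions; `weights` are the user's preference
# weights. The linear preference function is an assumed choice.
import numpy as np

def net_flows(scores: np.ndarray, weights: np.ndarray) -> np.ndarray:
    n = scores.shape[0]
    diff = scores[:, None, :] - scores[None, :, :]      # (N, N, m) pairwise diffs
    pref = np.maximum(diff, 0.0)                         # linear preference degree
    pi = (pref * weights).sum(axis=-1) / weights.sum()   # aggregated index (N, N)
    return (pi - pi.T).sum(axis=1) / (n - 1)             # net outranking flow

scores = np.array([[0.8, -0.2], [0.1, 0.6], [0.4, 0.4]])
weights = np.array([0.7, 0.3])
print(np.argsort(-net_flows(scores, weights)))  # ranking: [0 2 1]
```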
Related Work & Insights
- Distinction from RLHF/DPO: The latter aligns with collective preferences, whereas ValuePilot models individualized value dimensions.
- Distinction from RPLA (role-playing agents): RPLA focuses on conversational consistency, while ValuePilot targets autonomous, value-driven action selection.
- MCDM methodological perspective: Introducing PROMETHEE from traditional operations research into AI personalized decision-making establishes a novel methodological bridge.
- Inspiration: Future work may explore extending value modeling to dynamic preference tracking across multi-turn dialogues, or integrating with RLHF for more fine-grained personalized alignment.
Rating
- Novelty: ⭐⭐⭐⭐ — Novel perspective combining psychological value theory with MCDM methods for AI personalized decision-making.
- Experimental Thoroughness: ⭐⭐⭐ — Evaluation across value recognition and human alignment dimensions is reasonable, but the human study is small in scale.
- Writing Quality: ⭐⭐⭐⭐ — Framework description is clear, mathematical derivations are complete, and motivation is well articulated.
- Value: ⭐⭐⭐⭐ — Offers a new path beyond RLHF for AI personalized alignment with strong methodological extensibility.