Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models

Conference: ACL 2025
arXiv: 2502.02444
Code: None
Area: LLM Values / AI Safety
Keywords: value system, psycho-lexical approach, LLM alignment, safety prediction, generative psychometrics

TL;DR

Proposes the Generative Psycho-Lexical Approach (GPLA), which automatically constructs a five-factor value system for LLMs (Social Responsibility, Risk-Taking, Rule-Following, Self-Competence, and Rationality). The resulting system outperforms the classical Schwartz human value system in structural validity, safety prediction, and value alignment.

Background & Motivation

Background: Values are core beliefs that drive individual and collective behavior. Evaluating, understanding, and aligning LLM values have become active research topics in AI safety.

Limitations of Prior Work: Existing research relies primarily on the Schwartz value system, which was designed for humans; a psychologically grounded value system built specifically for LLMs is lacking. The 10-dimension Schwartz system fits LLM data poorly under confirmatory factor analysis (CFI = 0.23).

Key Challenge: Human value systems may not adequately capture the unique value dimensions of LLMs, and traditional psycho-lexical approaches rely heavily on extensive human labeling and self-reporting, which suffer from high bias and lack scalability.

Goal: To construct a data-driven, psychologically grounded value system tailored for LLMs and provide standardized evaluation tasks.

Key Insight: Combine the traditional psycho-lexical approach with the generative capabilities of LLMs to achieve a fully automated value system construction pipeline, bypassing manual annotation.

Core Idea: Use LLMs to automatically extract perceptions, recognize values, filter redundant values, and perform non-reactive profiling, then apply statistical modeling to the resulting profiles to construct an LLM-specific value system.

Method

Overall Architecture

GPLA adopts an agent-based framework consisting of three LLM agents (Perception Parser \(M_P\), Value Generator \(M_G\), and Value Evaluator \(M_E\)) and five steps: corpus-based perception extraction \(\to\) value recognition \(\to\) value filtering \(\to\) non-reactive profiling \(\to\) PCA modeling. Corpus sources include ValueBench, GPV, BeaverTails, and ValueLex, covering diverse value-rich LLM outputs. The entire pipeline is automated without human intervention.

Key Designs

Perception Extraction and Value Recognition: Use \(M_P\) to extract value-rich expressions (perceptions) from corpora such as ValueBench and BeaverTails, then map them to underlying values using \(M_G\) (the Kaleido model) and record frequencies.
Value Filtering: Use ROUGE scores and embedding similarity for deduplication, prioritizing high-frequency values to ensure a concise and representative vocabulary.
Non-reactive Value Profiling: Adopt the GPV method to profile 693 LLM agents (33 LLMs \(\times\) 21 profiling prompts) to avoid self-reporting bias, followed by PCA to extract latent factors.
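The filtering and PCA steps above can be illustrated with a minimal sketch. It is not the authors' implementation: `difflib.SequenceMatcher` stands in for the ROUGE and embedding similarity used in the paper, the 0.8 threshold, the toy frequency table, and the function names `filter_values` and `pca` are all assumptions made for illustration, and the score matrix is random placeholder data rather than real profiling output.

```python
from difflib import SequenceMatcher

import numpy as np


def filter_values(freq, threshold=0.8):
    """Greedy dedup: visit values from most to least frequent and keep a
    value only if it is not too similar to any value already kept."""
    kept = []
    for value in sorted(freq, key=freq.get, reverse=True):
        # Stand-in similarity; the paper uses ROUGE and embedding similarity.
        if all(SequenceMatcher(None, value, k).ratio() < threshold for k in kept):
            kept.append(value)
    return kept


def pca(scores, n_factors):
    """PCA via SVD on an (agents x values) score matrix.
    Returns loadings (values x factors) and explained-variance ratios."""
    centered = scores - scores.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return vt[:n_factors].T, explained[:n_factors]


# Hypothetical value frequencies recorded during value recognition.
freq = {"honesty": 42, "honest": 7, "risk-taking": 19,
        "risk taking": 3, "obedience": 11}
lexicon = filter_values(freq)

# Placeholder profiling matrix: 693 agents x len(lexicon) values.
rng = np.random.default_rng(0)
scores = rng.normal(size=(693, len(lexicon)))
loadings, explained = pca(scores, n_factors=2)
```

Visiting values in descending frequency is what makes the high-frequency wording survive when two candidates are near-duplicates. In the real pipeline the rows of `scores` would come from GPV profiling of the 693 agents, and the number of retained factors would be chosen from the data rather than fixed in advance.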

Loss & Training

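The alignment task uses PPO, with the objective of minimizing \(|\mathbf{x}_V^* - M_E(p, r, V)|\), the gap between the target value vector \(\mathbf{x}_V^*\) and the Value Evaluator's measurement of response \(r\) to prompt \(p\) under value system \(V\).
Safety prediction fits a linear probe under the Bradley-Terry model by minimizing a pairwise cross-entropy loss.
Confirmatory Factor Analysis (CFA) follows the standard psychometric validation procedure.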
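To make the safety-prediction objective concrete, the sketch below fits a linear probe with a Bradley-Terry pairwise loss, \(-\log \sigma(\mathbf{w}^\top(\mathbf{x}_{a} - \mathbf{x}_{b}))\), where \(\mathbf{x}_a\) is the value profile of the safer model in a pair. Everything beyond that loss is an assumption: the data are synthetic, the "true" weight vector is made up, and the plain gradient-descent loop and the names `train_bt_probe` and `pairwise_accuracy` are illustrative rather than taken from the paper.

```python
import numpy as np


def train_bt_probe(x_safer, x_riskier, lr=0.1, epochs=500):
    """Fit w so that sigmoid(w . (x_safer - x_riskier)) is high.
    The loss is the pairwise cross-entropy -log sigmoid(score gap)."""
    diff = x_safer - x_riskier
    w = np.zeros(diff.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(diff @ w)))  # P(safer model wins)
        grad = -((1.0 - p)[:, None] * diff).mean(axis=0)
        w -= lr * grad
    return w


def pairwise_accuracy(w, x_safer, x_riskier):
    """Fraction of pairs where the probe ranks the safer profile higher."""
    return float(np.mean((x_safer - x_riskier) @ w > 0))


# Synthetic five-factor profiles; true_w is a made-up safety direction.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -1.0, 0.8, -0.5, 0.6])
a = rng.normal(size=(400, 5))
b = rng.normal(size=(400, 5))
swap = (a @ true_w) < (b @ true_w)
x_safer = np.where(swap[:, None], b, a)
x_riskier = np.where(swap[:, None], a, b)

w = train_bt_probe(x_safer, x_riskier)
acc = pairwise_accuracy(w, x_safer, x_riskier)
```

Because the probe is linear, the sign of each learned weight can be read directly as whether that value factor pushes the predicted safety ranking up or down.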

Key Experimental Results

Main Results (CFA Structural Validity)

Value System	#Values	CFI↑	GFI↑	AIC↓	BIC↓
Schwartz (H)	4	0.56	0.52	340	1484
Schwartz (L)	10	0.23	0.22	324	1464
Ours	5	0.68	0.65	265	1145
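
For reference, the textbook definitions of three of these indices are given below. This is a generic reminder, not the paper's exact computation: SEM packages differ in constants and conventions, and the symbols are introduced here for illustration (\(k\) is the number of free parameters, \(n\) the sample size, \(\hat{L}\) the maximized likelihood, and subscripts \(M\) and \(B\) denote the fitted model and the baseline independence model).

\[
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
\]

\[
\mathrm{CFI} = 1 - \frac{\max\left(\chi^2_M - df_M,\ 0\right)}{\max\left(\chi^2_M - df_M,\ \chi^2_B - df_B,\ 0\right)}
\]

Higher CFI and lower AIC/BIC indicate better fit, which matches the arrows in the table header. BIC penalizes each extra parameter by \(\ln n\) rather than 2, so it favors more compact models as the sample grows.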

Ablation Study (Safety Prediction and Alignment)

Value System	Safety Prediction Accuracy	Alignment-Harmlessness↓	Alignment-Helpfulness↑
Schwartz (H)	81±15%	-1.52	2.15
Schwartz (L)	74±16%	-1.40	2.13
Ours	87±9%	-1.26	2.16

Key Findings

The proposed value system outperforms both Schwartz variants on every CFA metric; against Schwartz (H), the gaps are CFI 0.68 vs 0.56, GFI 0.65 vs 0.52, AIC 265 vs 340, and BIC 1145 vs 1484.
Safety prediction accuracy is higher (87% vs 81% for Schwartz (H)) and its standard deviation is smaller (9% vs 15%), indicating more stable predictions.
The five-factor system consists of: Social Responsibility (\(\alpha=0.957\)), Risk-Taking (\(\alpha=0.919\)), Rule-Following (\(\alpha=0.842\)), Self-Competence (\(\alpha=0.761\)), and Rationality (\(\alpha=0.722\)), all exceeding the standard psychometric threshold of 0.7.
Social Responsibility, Rule-Following, and Rationality promote safety, whereas Risk-Taking and Self-Competence undermine safety.
LLM value consistency is highly correlated with safety scores (\(r=0.73\)).
Consistency of value profiling across different datasets reaches 0.87.
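
The reliability figures above are Cronbach's alpha values, \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_{\text{total}}^2}\right)\) for a factor with \(k\) items. The sketch below shows how such a coefficient is computed; the data are synthetic (five noisy copies of one latent score), and the function name and noise level are assumptions for illustration, not the paper's data or code.

```python
import numpy as np


def cronbach_alpha(items):
    """items: (respondents x k) matrix of scores on the k items of one factor."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed score
    return k / (k - 1) * (1.0 - item_var / total_var)


# Synthetic factor: 693 respondents, 5 items sharing one latent score.
rng = np.random.default_rng(0)
latent = rng.normal(size=(693, 1))
items = latent + 0.5 * rng.normal(size=(693, 5))
alpha = cronbach_alpha(items)  # high, since all items track the same latent score
```

Alpha rises with both the number of items and their average inter-item correlation, which is one reason a factor built from fewer or more heterogeneous atomic values can land closer to the 0.7 threshold.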

Highlights & Insights

The first systematic methodology for constructing a value system tailored specifically to LLMs, grounded in an established theoretical foundation (the lexical hypothesis).
The fully automated pipeline of GPLA addresses the issues of manual labor costs and biases inherent in traditional methods.
Three benchmark tasks (CFA, safety prediction, and value alignment) constitute a comprehensive evaluation framework.
Compared with the Schwartz value system, CFA fit and safety prediction improve clearly, while the helpfulness gain in alignment is marginal (2.16 vs 2.15).
The five-factor structure is clear and highly interpretable: Social Responsibility vs. Risk-Taking forms opposing axes (validated by circumplex analysis).
Non-reactive profiling avoids self-reporting bias, yielding more reliable measurement results.
LLM value consistency correlates positively with safety (\(r=0.73\)), offering a new perspective for safety evaluation.
Large-scale profiling across 693 LLM agents (33 models \(\times\) 21 profiling prompts) supports the statistical reliability of the factor analysis.

Limitations & Future Work

The Cronbach's alpha of some factors is close to the 0.7 threshold (e.g., Rationality at 0.722), suggesting room to further optimize the selection of atomic values.
The corpus sources are limited (e.g., ValueBench, BeaverTails) and could be extended to more diverse LLM output scenarios.
The dynamic evolution of the value system (such as updates across model iterations) has not been considered.
Cross-cultural validation is insufficient: the value system may lean toward Western values due to cultural biases in the training data.
GPLA relies heavily on the quality of the three LLM agents; alternative model choices may affect the resulting value system.
The manifestation of value conflicts (e.g., Social Responsibility vs. Self-Competence) in specific tasks has not been explored.

Related Work & Insights

A modern adaptation of the traditional psycho-lexical approach (Allport & Odbert, 1936), replacing manual annotation pipelines with LLMs.
GPV (Ye et al., 2025b) provides the foundation for non-reactive value profiling, acting as a core component of GPLA.
Comparisons with the Schwartz value theory support the case that LLMs benefit from a dedicated value system.
Safety prediction results have practical implications for evaluating LLM deployment risks.
ValueLex (Biedma et al., 2024) was a prior attempt but suffered from flaws in psychological grounding; this work provides a detailed comparison with it.
This work extends the value alignment framework of BaseAlign (Yao et al., 2024a) to arbitrary value systems.

Rating

Novelty: ⭐⭐⭐⭐⭐ First systematic proposal for an LLM value system construction methodology, integrating psychology and AI.
Experimental Thoroughness: ⭐⭐⭐⭐ Comprehensive validation across three benchmarks, though experiments with larger-scale models are lacking.
Writing Quality: ⭐⭐⭐⭐ Clear structure, close integration of theory and experiments, and intuitive figures.
Value: ⭐⭐⭐⭐⭐ Significant theoretical and practical contributions to the field of LLM safety and alignment.
Overall Rating: Pioneering work that provides new infrastructure (methodology + value system + evaluation tasks) for LLM value research.
Practicality: The value system can be directly applied to safety evaluations prior to LLM deployment and can predict safety when combined with a linear probe.
Reproducibility: The methodology pipeline is clear, but depends on multiple specialized models (e.g., Kaleido, ValueLlama).
Extensibility: GPLA can be applied to other psychological constructs (e.g., AI personality systems, attitude frameworks).
Open Questions: Should the value system be dynamically updated with model iterations? How should emerging values (value outliers) be handled?
Interdisciplinary Value: Bridges psychometric theory and AI safety practices, opening up new directions for interdisciplinary research.
Key Numbers: 693 LLM agents, 33 models, 21 prompts, 5 value factors, 25 atomic values.
Methodological Contribution: The five-step pipeline of GPLA can be reused to construct psychological construct systems in other domains.
Practical Impact: Helps model developers evaluate and adjust the intrinsic value orientations of LLMs prior to deployment.