Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models
Conference: ACL 2025
arXiv: 2502.02444
Code: None
Area: LLM Values / AI Safety
Keywords: value system, psycho-lexical approach, LLM alignment, safety prediction, generative psychometrics
TL;DR
The paper proposes the Generative Psycho-Lexical Approach (GPLA), which automatically constructs a five-factor value system for LLMs (Social Responsibility, Risk-Taking, Rule-Following, Self-Competence, and Rationality). The authors report that this system outperforms the classical Schwartz human value system on three benchmark tasks: structural validity, safety prediction, and value alignment.
Background & Motivation
Background: Values are core beliefs that drive individual and collective behavior. Evaluating, understanding, and aligning LLM values have become active research topics in AI safety.
Limitations of Prior Work: Existing research relies primarily on the Schwartz value system, which was designed for humans, so the field lacks a psychologically grounded value system built specifically for LLMs. The 10 lower-order dimensions of the Schwartz system fit LLM data poorly under CFA (CFI of 0.23).
Key Challenge: Human value systems may not adequately capture the unique value dimensions of LLMs, and traditional psycho-lexical approaches rely heavily on extensive human labeling and self-reporting, which suffer from high bias and lack scalability.
Goal: To construct a data-driven, psychologically grounded value system tailored for LLMs and provide standardized evaluation tasks.
Key Insight: Combine the traditional psycho-lexical approach with the generative capabilities of LLMs to achieve a fully automated value system construction pipeline, bypassing manual annotation.
Core Idea: Automatically extract perceptions, recognize values, filter redundancy, perform non-reactive profiling, and conduct statistical modeling using LLMs to construct an LLM-specific value system.
Method
Overall Architecture
GPLA adopts an agent-based framework consisting of three LLM agents (Perception Parser \(M_P\), Value Generator \(M_G\), and Value Evaluator \(M_E\)) and five steps: corpus-based perception extraction \(\to\) value recognition \(\to\) value filtering \(\to\) non-reactive profiling \(\to\) PCA modeling. Corpus sources include ValueBench, GPV, BeaverTails, and ValueLex, covering diverse value-rich LLM outputs. The entire pipeline is automated without human intervention.
Key Designs
- Perception Extraction and Value Recognition: Use \(M_P\) to extract value-rich expressions (perceptions) from corpora such as ValueBench and BeaverTails, then map them to underlying values using \(M_G\) (the Kaleido model) and record frequencies.
- Value Filtering: Use ROUGE scores and embedding similarity for deduplication, prioritizing high-frequency values to ensure a concise and representative vocabulary.
- Non-reactive Value Profiling: Adopt the GPV method to profile 693 LLM agents (33 LLMs \(\times\) 21 profiling prompts) to avoid self-reporting bias, followed by PCA to extract latent factors.
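
The filtering and factor-extraction steps above can be illustrated with a small sketch. It is not the authors' implementation: token-overlap (Jaccard) similarity stands in for the ROUGE and embedding similarity the paper uses, the `threshold` value and all example data are hypothetical, and only the first principal component is extracted (via power iteration) to keep the example dependency-free.

```python
import math
import re


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a crude stand-in for ROUGE / embedding similarity."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def filter_values(freqs: dict[str, int], threshold: float = 0.5) -> list[str]:
    """Greedy deduplication: visit values from most to least frequent and keep
    a value only if it is not too similar to any value already kept."""
    kept: list[str] = []
    for value in sorted(freqs, key=lambda v: (-freqs[v], v)):
        if all(jaccard(value, k) < threshold for k in kept):
            kept.append(value)
    return kept


def top_component(rows: list[list[float]], iters: int = 200) -> list[float]:
    """First principal component of `rows` (one row per profiled agent, one
    column per value) via power iteration on the sample covariance matrix.
    Assumes the data has nonzero variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Asymmetric start vector, so it is unlikely to be orthogonal to the answer.
    vec = [1.0 + 0.1 * j for j in range(d)]
    for _ in range(iters):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


# Hypothetical value frequencies and a tiny hypothetical profile matrix.
candidate_values = {"respect for rules": 9, "risk taking": 7, "honesty": 5,
                    "respect rules": 4, "taking risk": 3}
vocabulary = filter_values(candidate_values)
profiles = [[1.0, 1.2], [2.0, 1.9], [3.0, 3.1], [4.0, 3.8]]
loading = top_component(profiles)
```

In the paper's setting the profile matrix would have 693 rows (one per LLM agent) and one column per retained value, and several components would be kept rather than one.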
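Loss & Training
- The alignment task uses PPO, with the objective of minimizing \(|\mathbf{x}_V^* - M_E(p, r, V)|\), where \(\mathbf{x}_V^*\) is the target value vector under value system \(V\) and \(M_E(p, r, V)\) is the Value Evaluator's measurement of response \(r\) to prompt \(p\).
- Safety prediction uses a linear probe trained under the Bradley-Terry model, optimizing a pairwise cross-entropy loss.
- Confirmatory Factor Analysis (CFA) follows the standard psychometric validation procedure.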
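
To make the safety-prediction objective concrete, here is a minimal sketch of a linear probe trained with the Bradley-Terry pairwise cross-entropy loss. It assumes each LLM is represented by a five-dimensional value profile and that each training pair is ordered as (safer model, less safe model). The profiles, learning rate, and step count are hypothetical, and plain gradient descent is used for simplicity; the paper's actual optimizer and data are not reproduced here.

```python
import math


def bt_loss(w: list[float], pairs) -> float:
    """Mean pairwise cross-entropy: -log sigmoid(w.x_safer - w.x_less_safe)."""
    total = 0.0
    for safer, less_safe in pairs:
        margin = sum(wi * (a - b) for wi, a, b in zip(w, safer, less_safe))
        total += math.log1p(math.exp(-margin))  # fine for small toy margins
    return total / len(pairs)


def fit_probe(pairs, dim: int, lr: float = 0.5, steps: int = 200) -> list[float]:
    """Fit linear probe weights by batch gradient descent on bt_loss."""
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for safer, less_safe in pairs:
            diff = [a - b for a, b in zip(safer, less_safe)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Derivative of log(1 + exp(-margin)) with respect to margin.
            coeff = -1.0 / (1.0 + math.exp(margin))
            for j in range(dim):
                grad[j] += coeff * diff[j] / len(pairs)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


# Hypothetical profiles, ordered as: Social Responsibility, Risk-Taking,
# Rule-Following, Self-Competence, Rationality.
pairs = [
    ([0.9, 0.1, 0.8, 0.3, 0.7], [0.4, 0.6, 0.3, 0.6, 0.5]),
    ([0.7, 0.2, 0.9, 0.4, 0.6], [0.5, 0.7, 0.4, 0.5, 0.4]),
    ([0.8, 0.3, 0.6, 0.2, 0.8], [0.3, 0.5, 0.5, 0.7, 0.3]),
]
weights = fit_probe(pairs, dim=5)
```

The sign of each learned weight indicates whether a higher score on that value factor pushes the probe toward predicting the model as safer or less safe.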
Key Experimental Results
Main Results (CFA Structural Validity)
| Value System | #Values | CFI↑ | GFI↑ | AIC↓ | BIC↓ |
|---|---|---|---|---|---|
| Schwartz (H) | 4 | 0.56 | 0.52 | 340 | 1484 |
| Schwartz (L) | 10 | 0.23 | 0.22 | 324 | 1464 |
| Ours | 5 | 0.68 | 0.65 | 265 | 1145 |
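
For readers unfamiliar with the fit indices in this table, the sketch below gives the textbook definitions of CFI, AIC, and BIC. The numeric inputs are hypothetical, and SEM packages differ in the exact AIC/BIC convention they report, so these functions are not expected to reproduce the table values.

```python
import math


def cfi(chi2_model: float, df_model: float,
        chi2_base: float, df_base: float) -> float:
    """Comparative Fit Index: 1 minus the ratio of model misfit to baseline
    misfit, where misfit is chi-square minus degrees of freedom, floored at 0."""
    num = max(chi2_model - df_model, 0.0)
    den = max(chi2_model - df_model, chi2_base - df_base, 0.0)
    return 1.0 if den == 0.0 else 1.0 - num / den


def aic(n_params: int, log_lik: float) -> float:
    """Akaike Information Criterion (lower is better)."""
    return 2 * n_params - 2 * log_lik


def bic(n_params: int, log_lik: float, n_obs: int) -> float:
    """Bayesian Information Criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_lik


# Hypothetical chi-square statistics for a fitted model and its baseline.
example_cfi = cfi(chi2_model=150.0, df_model=80.0, chi2_base=1000.0, df_base=105.0)
```

BIC penalizes each parameter by the log of the sample size, so with hundreds of observations it penalizes model complexity more heavily than AIC does.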
Ablation Study (Safety Prediction and Alignment)
| Value System | Safety Prediction Accuracy | Alignment-Harmlessness↓ | Alignment-Helpfulness↑ |
|---|---|---|---|
| Schwartz (H) | 81±15% | -1.52 | 2.15 |
| Schwartz (L) | 74±16% | -1.40 | 2.13 |
| Ours | 87±9% | -1.26 | 2.16 |
Key Findings
- The proposed value system outperforms Schwartz (H) in CFI (0.68 vs. 0.56), GFI (0.65 vs. 0.52), and BIC (1145 vs. 1484).
- Safety prediction accuracy is higher than with Schwartz (H) (87% vs. 81%) and its standard deviation is smaller (9% vs. 15%), indicating that the proposed system is more stable and reliable.
- The five-factor system consists of: Social Responsibility (\(\alpha=0.957\)), Risk-Taking (\(\alpha=0.919\)), Rule-Following (\(\alpha=0.842\)), Self-Competence (\(\alpha=0.761\)), and Rationality (\(\alpha=0.722\)), all exceeding the standard psychometric threshold of 0.7.
- Social Responsibility, Rule-Following, and Rationality promote safety, whereas Risk-Taking and Self-Competence undermine safety.
- LLM value consistency is highly correlated with safety scores (\(r=0.73\)).
- Consistency of value profiling across different datasets reaches 0.87.
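
The reliability figures above are Cronbach's alpha values. As a reference, here is a minimal sketch of the standard formula, \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_{\text{total}}^2}\right)\), applied to hypothetical scores. In the paper's setting the "items" would be the atomic values loading on one factor and the "subjects" would be the profiled LLM agents.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha, where items[i][s] is the score of subject s on item i.
    Requires at least two items and two subjects."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-subject sum over items
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores: 3 items measured on 5 subjects.
scores = [
    [2, 4, 3, 5, 1],
    [3, 5, 3, 4, 1],
    [2, 5, 4, 5, 2],
]
alpha = cronbach_alpha(scores)
```

Alpha approaches 1 when the items move together across subjects, which is why values above 0.7 are usually read as acceptable internal consistency.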
Highlights & Insights
- First systematically proposed methodology for constructing a value system tailored specifically for LLMs, with a solid theoretical foundation (the lexical hypothesis).
- The fully automated pipeline of GPLA addresses the issues of manual labor costs and biases inherent in traditional methods.
- Three benchmark tasks (CFA, safety prediction, and value alignment) constitute a comprehensive evaluation framework.
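- Compared with the Schwartz value system, the proposed system improves CFA fit (CFI 0.68 vs. 0.56) and safety prediction accuracy (87% vs. 81%), while alignment helpfulness is roughly on par (2.16 vs. 2.15).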
- The five-factor structure is clear and highly interpretable: Social Responsibility vs. Risk-Taking forms opposing axes (validated by circumplex analysis).
- Non-reactive profiling avoids self-reporting bias, yielding more reliable measurement results.
- Found that LLM value consistency is positively correlated with safety (\(r=0.73\)), providing a new perspective for safety evaluation.
- Large-scale profiling across 693 LLM agents (33 models \(\times\) 21 profiling prompts) ensures statistical reliability.
Limitations & Future Work
- Cronbach's alpha for some factors (e.g., Rationality at 0.722) is close to the 0.7 threshold, suggesting room to further optimize the selection of atomic values.
- The corpus sources are limited (e.g., ValueBench, BeaverTails), and can be extended to more diverse LLM output scenarios.
- The dynamic evolution of the value system (such as updates across model iterations) has not been considered.
- Cross-cultural validation is insufficient: the value system may lean toward Western values due to cultural biases in the training data.
- GPLA relies heavily on the quality of the three LLM agents; alternative model choices may affect the resulting value system.
- The manifestation of value conflicts (e.g., Social Responsibility vs. Self-Competence) in specific tasks has not been explored.
Related Work & Insights
- A modern adaptation of the traditional psycho-lexical approach (Allport & Odbert, 1936), replacing manual annotation pipelines with LLMs.
- GPV (Ye et al., 2025b) provides the foundation for non-reactive value profiling, acting as a core component of GPLA.
- Comparisons with the Schwartz value theory validate that LLMs indeed require a dedicated value system.
- Safety prediction results have practical implications for evaluating LLM deployment risks.
- ValueLex (Biedma et al., 2024) was a prior attempt but suffered from flaws in psychological grounding; this work provides a detailed comparison with it.
- The value alignment framework of BaseAlign (Yao et al., 2024a) was extended to arbitrary value systems.
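
Because the alignment objective is stated in terms of a target value vector and the evaluator's measured vector, it extends to any value system of any dimensionality. A minimal sketch of the corresponding reward follows, assuming the distance is Euclidean (the notes do not specify the norm) and using plain lists in place of real Value Evaluator outputs; the function name `alignment_reward` is hypothetical.

```python
import math


def alignment_reward(target: list[float], measured: list[float]) -> float:
    """Negative Euclidean distance between the target value vector and the
    measured one. PPO maximizes reward, so this minimizes the distance."""
    if len(target) != len(measured):
        raise ValueError("target and measured vectors must have the same length")
    return -math.sqrt(sum((t - m) ** 2 for t, m in zip(target, measured)))


# Hypothetical five-factor target and a hypothetical measured response profile.
target_profile = [1.0, -1.0, 1.0, 0.0, 1.0]
measured_profile = [0.8, -0.5, 0.9, 0.2, 0.7]
reward = alignment_reward(target_profile, measured_profile)
```

Swapping in a different value system only changes the length and meaning of the two vectors, not the reward function itself.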
Rating
- Novelty: ⭐⭐⭐⭐⭐ First systematic proposal for an LLM value system construction methodology, integrating psychology and AI.
- Experimental Thoroughness: ⭐⭐⭐⭐ Comprehensive validation across three benchmarks, though experiments with larger-scale models are lacking.
- Writing Quality: ⭐⭐⭐⭐ Clear structure, close integration of theory and experiments, and intuitive figures.
- Value: ⭐⭐⭐⭐⭐ Significant theoretical and practical contributions to the field of LLM safety and alignment.
- Overall Rating: Pioneering work that provides new infrastructure (methodology + value system + evaluation tasks) for LLM value research.
- Practicality: The value system can be directly applied to safety evaluations prior to LLM deployment and can predict safety when combined with a linear probe.
- Reproducibility: The methodology pipeline is clear, but depends on multiple specialized models (e.g., Kaleido, ValueLlama).
- Extensibility: GPLA can be applied to other psychological constructs (e.g., AI personality systems, attitude frameworks).
- Open Questions: Should the value system be dynamically updated with model iterations? How should emerging values (value outliers) be handled?
- Interdisciplinary Value: Bridges psychometric theory and AI safety practices, opening up new directions for interdisciplinary research.
- Key Numbers: 693 LLM agents, 33 models, 21 prompts, 5 value factors, 25 atomic values.
- Methodological Contribution: The five-step pipeline of GPLA can be reused to construct psychological construct systems in other domains.
- Practical Impact: Helps model developers evaluate and adjust the intrinsic value orientations of LLMs before deployment.