Cultural Learning-Based Culture Adaptation of Language Models

Conference: ACL 2025
arXiv: 2504.02953
Code: https://github.com/UKPLab/CLCA
Area: LLM/NLP
Keywords: cultural adaptation, cultural learning, role-playing, social interaction, value alignment

TL;DR

This paper proposes CLCA, a framework that draws on cultural learning theory to generate culturally adapted dialogue data through simulated social interactions. Training on this data jointly with an intention-understanding task (multi-task training) significantly improves the cultural value alignment of several LLMs, as measured on the World Values Survey (WVS).

Background & Motivation

Background: LLMs align by default with WEIRD (Western, Educated, Industrialized, Rich, Democratic) values, which limits their cross-cultural adaptability. Existing adaptation methods rely heavily on prompt engineering.

Limitations of Prior Work: Prompting can only draw on cultural knowledge the model has already learned, and direct fine-tuning on generic corpora does not achieve controlled cultural adaptation (Choenni et al. 2024).

Key Challenge: How can LLMs internalize the implicit values of a specific culture rather than merely performing surface-level imitation?

Goal: Use simulated social interactions to generate training data that carries implicit cultural signals.

Key Insight: Cultural learning theory (Tomasello) holds that humans acquire culture through social interaction and by understanding the intentions of others.

Core Idea: Use LLM role-playing to generate culturally adapted social-interaction dialogues and combine them with intention-understanding training, so that values are adapted through behavior (behavior-driven value adaptation).

Method

Overall Architecture

Culturally adapted social scenario generation -> LLM role-playing dialogue -> Quality filtering -> Intention labeling -> Multi-task training (dialogue + intention understanding) -> WVS evaluation.
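The following sketch chains the stages above as plain Python functions to show the data flow. It is a hypothetical skeleton, not the code in the UKPLab/CLCA repository. The `Episode` record, the stage function names, and the stub outputs are assumptions. Each stub stands in for an LLM call: GPT-4 localization, role-play, LLM-as-Judge filtering, and intention labeling. WVS evaluation is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """One simulated social interaction (hypothetical record layout)."""
    culture: str
    scenario: str
    turns: list[str] = field(default_factory=list)
    intentions: list[str] = field(default_factory=list)
    passed_filter: bool = False


def generate_scenario(culture: str) -> Episode:
    # Stub for culturally adapted scenario generation (GPT-4 localization).
    return Episode(culture=culture, scenario=f"A neighbor visits for dinner in {culture}")


def role_play(ep: Episode) -> Episode:
    # Stub for two LLM agents role-playing with private social goals.
    ep.turns = ["Agent A: greeting", "Agent B: reply"]
    return ep


def quality_filter(ep: Episode) -> Episode:
    # Stub for the two-stage LLM-as-Judge filter; here any non-empty dialogue passes.
    ep.passed_filter = len(ep.turns) > 0
    return ep


def label_intentions(ep: Episode) -> Episode:
    # Stub for per-turn free-text intention labeling.
    ep.intentions = [f"intention for: {turn}" for turn in ep.turns]
    return ep


def build_training_data(cultures: list[str]) -> tuple[list[Episode], list[tuple[str, str]]]:
    """Return episodes for the dialogue task and (turn, intention) pairs for the intention task."""
    episodes = [quality_filter(role_play(generate_scenario(c))) for c in cultures]
    # Label intentions only for dialogues that survived filtering.
    kept = [label_intentions(ep) for ep in episodes if ep.passed_filter]
    pairs = [pair for ep in kept for pair in zip(ep.turns, ep.intentions)]
    return kept, pairs
```

For example, `build_training_data(["China", "Japan"])` yields two filtered episodes for the dialogue task and four (turn, intention) pairs for the intention task. Placing filtering before intention labeling, as in the pipeline above, avoids spending labeling calls on dialogues that would be discarded anyway.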

Key Designs

Culturally Adapted Scenario Generation
- Builds on the Sotopia framework and uses GPT-4 to localize scenarios to the target culture (names, locations, customs).
- Incorporates Hofstede's cultural dimensions and the Inglehart-Welzel evolutionary cultural map.
- Design Motivation: Convey implicit cultural norms through concrete scenarios.
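This is a minimal sketch of how a localization request might be assembled. It assumes a Sotopia-style scenario with one private goal per agent. The `Scenario` fields, the prompt wording, and `build_localization_prompt` are hypothetical. Only the dimension names are standard: Hofstede's six dimensions and the two axes of the Inglehart-Welzel map. In the setup described above, the resulting string would be sent to GPT-4.

```python
from dataclasses import dataclass

# Hofstede's six cultural dimensions and the two axes of the Inglehart-Welzel map.
HOFSTEDE_DIMENSIONS = (
    "power distance",
    "individualism vs. collectivism",
    "masculinity vs. femininity",
    "uncertainty avoidance",
    "long-term vs. short-term orientation",
    "indulgence vs. restraint",
)
INGLEHART_WELZEL_AXES = (
    "traditional vs. secular-rational values",
    "survival vs. self-expression values",
)


@dataclass
class Scenario:
    """A social scenario with one private goal per agent (hypothetical field names)."""
    setting: str
    goal_a: str
    goal_b: str


def build_localization_prompt(base: Scenario, culture: str) -> str:
    """Hypothetical prompt asking an LLM to localize `base` to `culture`."""
    cues = "; ".join(HOFSTEDE_DIMENSIONS + INGLEHART_WELZEL_AXES)
    return (
        f"Rewrite the following social scenario so that it takes place in {culture}.\n"
        "Adapt names, locations, and customs, and keep each agent's goal.\n"
        f"Where relevant, reflect how {culture} is positioned on: {cues}.\n\n"
        f"Setting: {base.setting}\n"
        f"Agent A goal: {base.goal_a}\n"
        f"Agent B goal: {base.goal_b}"
    )
```

Passing the dimension names as cues, rather than numeric scores, keeps the sketch free of country data. A real setup might instead supply per-culture dimension values.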
Dialogue Generation and Filtering
- Two LLMs role-play the interaction, each with its own private social goal.
- Two-stage LLM-as-Judge filtering checks generation quality and cultural consistency.
- Design Motivation: Ensure that the retained data reflects the target culture's values.
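The two-stage filter can be sketched as two successive threshold checks, assuming a judge that returns a numeric score. The question wordings, the 1 to 10 scale, the thresholds of 7, and the quality-then-culture order are illustrative assumptions, not values reported for CLCA. In practice, `judge` would wrap an LLM call.

```python
from typing import Callable

# A judge maps (dialogue, question) to a score; in CLCA this role is played by an LLM.
Judge = Callable[[str, str], float]

QUALITY_QUESTION = "Rate the fluency and goal-directedness of this dialogue from 1 to 10."
CULTURE_QUESTION = "Rate how consistent this dialogue is with the norms of {culture} from 1 to 10."


def two_stage_filter(dialogues: list[str], culture: str, judge: Judge,
                     quality_min: float = 7.0, culture_min: float = 7.0) -> list[str]:
    """Keep dialogues that pass a quality check and then a cultural-consistency check.

    Prompts, scale, and thresholds are illustrative assumptions.
    """
    # Stage 1: generation quality.
    stage1 = [d for d in dialogues if judge(d, QUALITY_QUESTION) >= quality_min]
    # Stage 2: cultural consistency, only for dialogues that survived stage 1.
    question = CULTURE_QUESTION.format(culture=culture)
    return [d for d in stage1 if judge(d, question) >= culture_min]
```

Running the cheaper or more general check first means the culture-specific judgment is only requested for dialogues that are already well-formed.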
Intention Understanding Training
- Generate a free-text intention for each dialogue turn and label it as generic or culturally relevant.
- Culturally relevant intentions serve as "instructed learning" signals.
- Design Motivation: Help the model distinguish culturally essential behaviors from incidental ones.
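This sketch shows one way per-turn intention labels might be stored and split. It assumes each label carries a boolean flag for generic versus culturally relevant. `TurnIntention` and `split_intentions` are hypothetical names, not part of the CLCA code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnIntention:
    """Free-text intention attached to one dialogue turn (hypothetical layout)."""
    turn: str
    intention: str
    culturally_relevant: bool  # False means a generic intention


def split_intentions(
    labels: list[TurnIntention],
) -> tuple[list[TurnIntention], list[TurnIntention]]:
    """Separate culturally relevant intentions (instructed-learning signals) from generic ones."""
    cultural = [x for x in labels if x.culturally_relevant]
    generic = [x for x in labels if not x.culturally_relevant]
    return cultural, generic
```

As an invented illustration, an intention such as "confirm the meeting time" would be generic. An intention such as "show respect to an older guest" would be culturally relevant.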
Multi-Task Training (CLCA)
- Task 1: Multi-turn dialogue (imitative learning)
- Task 2: Intention understanding (instructed learning)
- Design Motivation: Mirror two of the three forms of cultural learning (imitative and instructed learning).
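One way to turn the filtered dialogues and their intentions into the two training tasks is to emit prompt/target pairs for each task and shuffle them into a single mixed dataset. This is an assumed formulation, and it may differ from the paper's training setup. The prompt templates, the `Example` record, and uniform shuffling as the mixing strategy are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    task: str    # "dialogue" (imitative learning) or "intention" (instructed learning)
    prompt: str
    target: str


def dialogue_examples(scenario: str, turns: list[str]) -> list[Example]:
    """Task 1: predict each turn from the scenario and the preceding turns."""
    out = []
    for i, turn in enumerate(turns):
        context = "\n".join([scenario, *turns[:i]])
        out.append(Example("dialogue", context, turn))
    return out


def intention_examples(turns: list[str], intentions: list[str]) -> list[Example]:
    """Task 2: predict the free-text intention behind each turn (hypothetical template)."""
    return [Example("intention", f"Why did the speaker say: {t}", it)
            for t, it in zip(turns, intentions)]


def mix_tasks(*task_sets: list[Example], seed: int = 0) -> list[Example]:
    """Concatenate all task sets and shuffle them (assumed mixing strategy)."""
    mixed = [ex for task_set in task_sets for ex in task_set]
    random.Random(seed).shuffle(mixed)
    return mixed
```

Shuffling with a fixed seed keeps the mixture reproducible. It also lets each batch contain both imitative and instructed examples.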

Key Experimental Results

Main Results: WVS Cultural Value Alignment

Model	Method	US	China	Japan	Saudi Arabia	Average Gain
Llama-3	Baseline	Baseline	Baseline	Baseline	Baseline	-
Llama-3	CLCA	+5%	+8%	+6%	+10%	+7.3%
Mistral	CLCA	+4%	+7%	+5%	+9%	+6.3%
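The Average Gain column is consistent with an unweighted mean over the four cultures. For Llama-3, (5 + 8 + 6 + 10) / 4 = 7.25. For Mistral, (4 + 7 + 5 + 9) / 4 = 6.25. These are reported as 7.3% and 6.3%, which corresponds to rounding half-up. The check below reproduces this; treating the column as a plain mean is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-culture gains from the table above (US, China, Japan, Saudi Arabia), in %.
GAINS = {
    "Llama-3": [5, 8, 6, 10],
    "Mistral": [4, 7, 5, 9],
}


def average_gain(gains: list[int]) -> Decimal:
    """Unweighted mean of the per-culture gains, rounded half-up to one decimal."""
    mean = Decimal(sum(gains)) / Decimal(len(gains))
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


for model, gains in GAINS.items():
    print(model, average_gain(gains))  # prints "Llama-3 7.3", then "Mistral 6.3"
```

The code uses `decimal` with `ROUND_HALF_UP` because Python's built-in `round()` rounds exact ties to even and would give 7.2 and 6.2 here.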

Ablation Study

Configuration	WVS Alignment	Description
Dialogue only	+4%	Imitative learning is effective but insufficient
Intention only	+2%	Instructed learning alone has limited effectiveness
Full CLCA	+7%	The combination of both works best
Random dialogue	+1%	Cultural adaptation is key

Key Findings

CLCA is effective across models: Both Llama-3 and Mistral improve in every evaluated culture.
Intention understanding is key: Removing it (dialogue only) lowers the gain from +7% to +4%.
Greater gains in non-Western cultures: The largest gains are for Saudi Arabia (+10% / +9%) and China (+8% / +7%), compared with +5% / +4% for the US.

Highlights & Insights

Transfers cultural learning theory to AI by applying "imitation + instruction + intention understanding" to LLM cultural adaptation.
The distinction between behavior-driven and value-driven adaptation is important: CLCA shifts a model's values by training it on simulated behavior.
The culturally adapted social scenario generation could be transferred to other behavioral adaptation tasks.

Limitations & Future Work

Evaluation relies solely on the WVS questionnaire.
Synthetic data quality depends on GPT-4.
Collaborative learning (the third form of cultural learning) is not addressed.

Comparison with Related Work

vs. Prompt Engineering: Prompting adapts behavior at inference time, whereas CLCA internalizes values during training.
vs. Choenni et al. (2024): They found fine-tuning on generic corpora ineffective, whereas CLCA shows that social interaction data is effective.

Rating

Novelty: ⭐⭐⭐⭐⭐ Interdisciplinary innovation combining cultural learning theory and AI.
Experimental Thoroughness: ⭐⭐⭐⭐ Multi-model and multi-cultural validation with detailed ablations.
Writing Quality: ⭐⭐⭐⭐ Solid theoretical foundation.
Value: ⭐⭐⭐⭐ Highly significant for global LLM deployment.