From Neurons to Semantics: Evaluating Cross-Linguistic Alignment Capabilities of Large Language Models via Neurons Alignment
Conference: ACL 2025
arXiv: 2507.14900
Code: None
Area: LLM/NLP
Keywords: Cross-lingual alignment, neuron states, multilingual LLMs, FFN analysis, semantic retrieval
TL;DR
NeuronXA is a framework for evaluating cross-lingual alignment from neuron activation states. It treats FFN-layer neuron states as translation-invariant internal representations of multilingual LLMs and uses them to measure cross-lingual alignment capability. With only 100 parallel sentence pairs, the resulting alignment score reaches a Pearson correlation of 0.9556 with downstream task performance.
Background & Motivation
Large language models (LLMs) demonstrate strong multilingual capabilities, but evaluating cross-lingual alignment remains under-investigated. Existing alignment evaluation methods mainly rely on similarity in sentence embedding spaces (such as cosine similarity). However, they suffer from a fundamental issue: neural network models (e.g., BERT, GPT) tend to produce anisotropic representation spaces, leading to representation collapse and diminishing the semantic expression capability of low-resource languages, which limits the reliability of embedding-based cross-lingual alignment evaluation.
The key inspiration of this study comes from a neuroscience finding: similar information activates overlapping neural regions. The authors hypothesize that neuron activations in FFN layers can serve as an intrinsic representation of multilingual inputs, providing a more structured and robust means of capturing cross-lingual knowledge. Prior research indicates that neurons in FFN modules encode various forms of knowledge (factual knowledge, positional information, syntactic triggers, etc.), which provides a theoretical foundation for using neuron states to estimate cross-lingual alignment.
Method
Overall Architecture
The NeuronXA framework consists of three core steps:
1. Neuron State Detection: extract neuron activation information from the FFN layers.
2. Sentence Representation Construction: obtain sentence-level neuron states via position-weighted averaging.
3. Alignment Score Calculation: compute the weak-alignment ratio from the cosine similarity matrix.
Key Designs
- Neuron State Detection: Two detection methods are proposed:
    - NAS (Neuron Activation State): Binarized activation states, where activations \(> 0\) are marked active (\(1\)) and activations \(\le 0\) inactive (\(0\)). This reflects the immediate response of neurons to inputs.
    - NAV (Neuron Activation Value): Uses the absolute value of each neuron activation to reflect how much the neuron contributes to the FFN layer output, serving as a finer-grained functional metric.
- Sentence Representation: Because decoder-only LLMs use causal self-attention, position-weighted averaging is used instead of simple averaging: \(N_l = \sum_{t=1}^{T} w_t n_{lt}\), where \(n_{lt}\) is the neuron state of token \(t\) at layer \(l\) and \(w_t = \frac{t}{\sum_{k=1}^{T}k}\). Tokens in later positions receive higher weights to mitigate the over-representation of early tokens under causal attention.
- NeuronXA Alignment Score: For each layer \(l\), a cosine similarity matrix \(C(l)\) is built, where \(c_{ij}\) is the similarity between source sentence \(i\) and target sentence \(j\). The score is the proportion of parallel sentences satisfying weak alignment: \(\mu_{C(l)} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\left(c_{ii} > \max_{j \neq i}\{c_{ij}, c_{ji}\}\right)\). This checks whether each pair of parallel sentences is a pair of mutual nearest neighbors. The per-layer scores are then averaged across layers to obtain the final alignment score.
- Two Alignment Evaluation Methods:
    - NASCA: Calculates alignment scores based on binarized neuron activation states.
    - NAVCA: Calculates alignment scores based on the absolute values of neuron activation.
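The key designs above can be sketched in plain Python for a single layer. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`neuron_states`, `position_weighted_mean`, `weak_alignment_score`, `sentence_vector`) and the toy activations are hypothetical, and real activations would have to be read from the FFN layers of an actual LLM. For intuition about the weights, a three-token sentence gets \(w = (1/6, 2/6, 3/6)\).

```python
import math


def neuron_states(token_activations, mode="NAS"):
    """Map raw FFN activations (one list per token) to NAS or NAV states."""
    if mode == "NAS":  # binarized: 1 if activation > 0, else 0
        return [[1.0 if a > 0 else 0.0 for a in tok] for tok in token_activations]
    if mode == "NAV":  # magnitude: absolute activation value
        return [[abs(a) for a in tok] for tok in token_activations]
    raise ValueError(f"unknown mode: {mode}")


def position_weighted_mean(token_states):
    """N_l = sum_t w_t * n_lt with w_t = t / sum_k k (t is 1-based)."""
    T = len(token_states)
    total = T * (T + 1) / 2
    dim = len(token_states[0])
    return [
        sum((t + 1) / total * token_states[t][d] for t in range(T))
        for d in range(dim)
    ]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def weak_alignment_score(src, tgt):
    """Fraction of pairs i whose c_ii beats every other entry in row i and column i."""
    n = len(src)
    C = [[cosine(s, t) for t in tgt] for s in src]
    hits = 0
    for i in range(n):
        rivals = [C[i][j] for j in range(n) if j != i]
        rivals += [C[j][i] for j in range(n) if j != i]
        if all(C[i][i] > r for r in rivals):
            hits += 1
    return hits / n


def sentence_vector(token_activations, mode="NAS"):
    return position_weighted_mean(neuron_states(token_activations, mode))


# Toy single-layer activations (made up): 2 parallel sentences, 4 neurons each.
en_activations = [
    [[0.5, -0.2, 0.1, -0.3], [0.7, -0.1, 0.2, -0.4]],
    [[-0.3, 0.4, -0.1, 0.6], [-0.2, 0.5, -0.3, 0.2], [-0.1, 0.3, -0.2, 0.9]],
]
xx_activations = [
    [[0.4, -0.3, 0.2, -0.1], [0.6, -0.2, 0.3, -0.2], [0.2, -0.1, 0.1, -0.5]],
    [[-0.5, 0.2, -0.2, 0.3], [-0.1, 0.6, -0.4, 0.1]],
]

src = [sentence_vector(s) for s in en_activations]
tgt = [sentence_vector(s) for s in xx_activations]
score = weak_alignment_score(src, tgt)
print(score)
```

In this toy case each translation pair activates the same neurons, so both pairs are mutual nearest neighbors and the score is 1.0. Passing `mode="NAV"` to `sentence_vector` gives the NAVCA-style variant of the same computation.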
Loss & Training
This paper introduces an evaluation methodology and does not involve training. Evaluation utilizes off-the-shelf pre-trained LLMs (LLaMA, Qwen, Mistral, GLM, OLMo series); the alignment evaluation can be completed using only 100 parallel sentence pairs.
Key Experimental Results
Main Results
Parallel Sentence Retrieval
| Representation Method | Direction | FLORES-200 (Head) | FLORES-200 (Long-tail) | Tatoeba (Head) | Tatoeba (Long-tail) |
|---|---|---|---|---|---|
| Embedding | En⇔xx | 83.78 | 40.95 | 16.86 | 10.12 |
| NAS | En⇔xx | 87.07 | 42.20 | 57.78 | 32.47 |
NAS outperforms traditional sentence embeddings on bidirectional retrieval in every column, with the largest gain on the Tatoeba head languages (\(16.86 \rightarrow 57.78\), +40.92 points).
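The retrieval setup behind this table can be illustrated with a short sketch. It assumes top-1 nearest-neighbor retrieval by cosine similarity in each direction, which is the usual protocol for parallel sentence retrieval; the paper's exact protocol, the function names (`retrieval_accuracy`, `bidirectional_report`), and the toy vectors are assumptions made here for illustration.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieval_accuracy(queries, candidates):
    """Top-1 accuracy: query i counts as correct if candidate i is its nearest neighbor."""
    correct = 0
    for i, q in enumerate(queries):
        sims = [cosine(q, c) for c in candidates]
        best = max(range(len(sims)), key=sims.__getitem__)
        correct += int(best == i)
    return correct / len(queries)


def bidirectional_report(en, xx):
    """Accuracy in both directions plus the absolute gap between them."""
    fwd = retrieval_accuracy(en, xx)
    bwd = retrieval_accuracy(xx, en)
    return {"en->xx": fwd, "xx->en": bwd, "gap": abs(fwd - bwd)}


# Toy sentence vectors (made up) chosen so the two directions disagree:
# both xx vectors sit close to en[0], so xx[1] retrieves the wrong English sentence.
en = [[1.0, 0.0], [1.0, 1.0]]
xx = [[1.0, 0.1], [1.0, 0.3]]

report = bidirectional_report(en, xx)
print(report)
```

Here En→xx retrieval is perfect while xx→En gets only one of two sentences right, which is the kind of directional gap that crowded, anisotropic embedding spaces tend to produce.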
Alignment-Downstream Task Correlation
| Method | XNLI Correlation | BMLAMA-53 Correlation | Multilingual Benchmark Avg Correlation |
|---|---|---|---|
| MEXA | 0.8370 | 0.7463 | 0.8291 |
| NASCA | 0.9326 | 0.7701 | 0.8312 |
| NAVCA | 0.8937 | 0.8065 | 0.8191 |
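NASCA reaches a Pearson correlation of 0.9326 with XNLI zero-shot transfer performance, the highest in the table, and also has the highest average correlation (0.8312). On BMLAMA-53, NASCA reaches 0.7701, while NAVCA is highest at 0.8065.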
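For readers who want to reproduce this kind of analysis on their own models, the correlation itself is straightforward to compute. The sketch below assumes the correlation is taken between per-language alignment scores and per-language task accuracy for one model; that pairing, the function name `pearson`, and the numbers are illustrative assumptions, not values from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


# Placeholder numbers only: one alignment score and one accuracy per language.
alignment_scores = [0.2, 0.4, 0.6, 0.8]
task_accuracy = [0.35, 0.45, 0.50, 0.70]

r = pearson(alignment_scores, task_accuracy)
print(round(r, 4))
```

A value near 1 means languages with higher alignment scores also tend to have higher task accuracy, which is what makes the alignment score useful as a cheap proxy for downstream evaluation.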
Ablation Study
| Configuration | Key Metric | Description |
|---|---|---|
| NAS vs Embedding Retrieval | +40.92 points (Tatoeba head, En⇔xx) | NAS representation far outperforms embedding representation |
| Directional Symmetry | NAS is almost symmetric | Embedding directional difference reaches 30.73% |
| 100 pairs vs More sentences | Stably high correlation | 100 parallel sentence pairs are sufficient |
Key Findings
- Mitigation of Directional Asymmetry: Traditional embeddings exhibit differences as high as 30.73% between the En \(\rightarrow\) xx and xx \(\rightarrow\) En directions on Tatoeba. In contrast, the NAS representation almost entirely removes this asymmetry, indicating that it captures cross-lingual semantics more effectively.
- Layer-wise Alignment Dynamics: Alignment scores peak in the middle layers and are lowest in the earliest and final layers. Lower layers primarily map different languages into a shared semantic space centered around high-resource languages, while higher layers project the semantic content back onto language-specific vocabularies.
- High-to-Low Resource Gap: High-resource language pairs (e.g., Italian \(\rightarrow\) French, NASCA = 0.8372) perform far better than low-resource language pairs (e.g., Gujarati \(\rightarrow\) Banjar, NASCA = 0.2191). However, the improvement brought by the NAS representation is more pronounced for low-resource languages.
- Consistency Across Models: NeuronXA evaluation remains consistent and effective across multiple models including LLaMA, Qwen, Mistral, GLM, and OLMo.
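The layer-wise finding can be inspected with a small helper once per-layer alignment scores are available. This is a sketch only: `layer_profile` is a hypothetical name, the scores are invented to show the shape of a profile that peaks mid-network, and the final score is taken as the plain mean over layers, following the average pooling described in the method.

```python
def layer_profile(layer_scores):
    """Summarize per-layer alignment scores: layer mean, peak layer, relative depth."""
    if not layer_scores:
        raise ValueError("need at least one layer score")
    n = len(layer_scores)
    peak = max(range(n), key=layer_scores.__getitem__)  # first index of the maximum
    return {
        "final_score": sum(layer_scores) / n,  # average pooling across layers
        "peak_layer": peak,  # 0-based layer index
        "relative_depth": peak / (n - 1) if n > 1 else 0.0,  # 0 = first, 1 = last
    }


# Invented per-layer scores for a 5-layer toy model, for illustration only.
scores = [0.10, 0.40, 0.70, 0.60, 0.20]
profile = layer_profile(scores)
print(profile["peak_layer"], profile["relative_depth"])
```

A relative depth around 0.5 corresponds to the middle-layer peak described above; values near 0 or 1 would indicate that alignment is strongest at the input or output end instead.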
Highlights & Insights
- Interdisciplinary Inspiration: Borrows the neuroscience finding that "similar stimuli activate overlapping neural circuits", presenting a novel perspective for evaluating cross-lingual alignment in NLP.
- Exceptional Efficiency: High-quality evaluation is achieved with only 100 parallel sentence pairs, significantly reducing evaluation cost.
- Reduction of Directional Bias: The NAS representation achieves near-symmetrical retrieval performance, addressing a prominent flaw of embedding-based methods.
- Beneficial for Low-Resource Languages: The NAS representation space is smoother, mitigating the negative impact of representation collapse on low-resource languages.
- Insights from Layer-wise Analysis: Unveils the inner multilingual processing mechanism of LLMs, demonstrating that middle layers are critical for semantic alignment.
Limitations & Future Work
- Currently, the study focuses exclusively on neurons in FFN layers, leaving the contributions of attention layers unexplored.
- English is used solely as the pivot language; the effectiveness of utilizing other high-resource languages as pivots remains uninvestigated.
- There is a lack of deep theoretical explanations as to why the NAS representation space is smoother.
- The evaluation only covers models within the 7B-14B scale; the effectiveness on larger or smaller models remains to be verified.
- The position-weighted averaging strategy is heuristic and may not be the optimal sentence representation method.
Related Work & Insights
- Cross-lingual Alignment: Embedding similarity-based methods like MEXA suffer from anisotropy issues.
- Neuron Analysis: Prior works such as Dai et al. 2022 demonstrate that FFN neurons encode diverse types of knowledge.
- Multilingual LLM Inner Mechanisms: Wendler et al. 2024 discover the existence of latent languages.
- The neuron-state representation paradigm proposed in this paper can be extended to other scenarios requiring intrinsic representation, such as model interpretability and knowledge probing.
Rating
- Novelty: ⭐⭐⭐⭐ Assessing cross-lingual alignment through neuron activation states is a completely new perspective, though the methodology itself is relatively straightforward.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Covers multiple models, datasets, and languages. The correlation analysis is comprehensive, including both retrieval and transfer tasks.
- Writing Quality: ⭐⭐⭐⭐ Clear structure, natural motivation, and rich visualizations.
- Value: ⭐⭐⭐⭐ Provides an efficient and effective evaluation tool for cross-lingual alignment, offering practical guidance for multilingual LLM research.