🌐 Multilingual & Translation
🔬 ICLR2026 · 8 paper notes
📌 Same area in other venues: 💬 ACL2026 (64) · 🧪 ICML2026 (3) · 🤖 AAAI2026 (9) · 🧠 NeurIPS2025 (11) · 📹 ICCV2025 (1)
🔥 Top topics: Translation ×2
- ASSESS: A Semantic and Structural Evaluation Framework for Statement Similarity
This paper proposes the ASSESS framework, centered on the TransTED Similarity metric. It parses formal mathematical statements into Operator Trees (OPT) and integrates semantic transformations driven by Lean proof tactics into the standard Tree Edit Distance (TED). The method achieves SOTA performance on the EPLA benchmark, with 70.16% accuracy and a Kappa score of 0.35, and it remains reproducible using only CPU resources.
- ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
This paper proposes the Adaptive Transfer Scaling Law (ATLAS), which decomposes the effective data volume into three terms: target language, transfer languages, and other languages, while introducing a data repetition saturation function. Validated across 774 multilingual training experiments (10M–8B parameters, 400+ languages), ATLAS significantly outperforms existing scaling laws (multilingual \(R^2\) improved from 0.67 to 0.98). It systematically quantifies the cross-lingual transfer matrix, the capacity constraints of the "curse of multilinguality," and the compute crossover point between pretraining and finetuning.
- DiscoX: Benchmarking Discourse-Level Translation in Expert Domains
DiscoX constructs the first benchmark for discourse-level, expert-domain ZH-EN translation (200 articles averaging 1,712 tokens, 7 domains, 1,330 person-hours of manual refinement) and proposes Metric-S, a multi-agent, reference-free evaluation system. The results reveal a significant gap: even the strongest LLM (GPT-5-high: 76.66) still lags behind human experts (80.16).
- From Utterance to Vividity: Training Expressive Subtitle Translation LLM via Adaptive Local Preference Optimization
This paper proposes ALPO (Adaptive Local Preference Optimization) for training expressive subtitle translation LLMs. Empirical findings show that subtitle translation favors free translation and that reasoning-based LLMs outperform chat-based LLMs in paraphrasing capability. After verifying that LLMs as translation evaluators are highly consistent with humans, the authors propose a fine-grained process-supervised preference alignment method (adaptive weighting + dynamic beta + prefix mixing). The 14B model exceeds SOTA models like GPT-4o and DeepSeek-R1 in translation vividness across multiple language directions.
- Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation
This paper proposes the Language Confusion Gate (LCG): a lightweight two-layer MLP that masks tokens from incorrect language families on-demand during decoding without modifying the base LLM. Trained via "norm-calibrated self-distillation," it reduces language confusion rates by approximately an order of magnitude across multiple models without sacrificing task performance.
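The masking step described above can be illustrated with a toy example. This is a minimal sketch, not the paper's implementation: the function names, the `token_family` table, and the hand-set `gate_open` flag are all hypothetical. In LCG the decision to intervene comes from the learned two-layer MLP, which is not modeled here.

```python
import math

def mask_disallowed_families(logits, token_family, allowed_families, gate_open=True):
    """Return a copy of `logits` with tokens from disallowed families set to -inf.

    `gate_open` stands in for the learned gate's decision (assumption: here it
    is simply passed in by hand). When the gate is closed, logits pass through.
    """
    if not gate_open:
        return list(logits)
    return [
        score if token_family[i] in allowed_families else -math.inf
        for i, score in enumerate(logits)
    ]

def greedy_pick(logits):
    """Index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Toy vocabulary with a hypothetical family label per token.
vocab = ["hello", "你好", "world", "。", "привет"]
token_family = ["latin", "cjk", "latin", "cjk", "cyrillic"]
logits = [1.2, 2.5, 0.3, 0.1, 2.0]

unmasked_choice = vocab[greedy_pick(logits)]
masked_choice = vocab[
    greedy_pick(mask_disallowed_families(logits, token_family, {"latin"}))
]
```

With these toy scores, unmasked greedy decoding selects `你好`, while masking to the `latin` family selects `hello`. The base model's logits are only filtered, never retrained, which matches the summary's point that the base LLM is left unmodified.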
- LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?
Using logit-lens and hidden-state similarity analysis, this work localizes "language control" in mLLMs to the final few layers. Fine-tuning only these layers (3-5% of parameters) raises language consistency across six languages from below 20% to over 98%, nearly matching full fine-tuning.
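Restricting fine-tuning to the final layers amounts to choosing a subset of parameters by layer index. The sketch below assumes a hypothetical naming scheme (`layers.<index>.<name>`) and a toy parameter table; the layer count and sizes are invented for illustration and do not reflect the models studied in the paper.

```python
def select_trainable(param_counts, num_layers, last_n):
    """Names of parameters that belong to the last `last_n` layers.

    Assumes parameter names of the form 'layers.<index>.<name>'
    (a hypothetical convention for this sketch).
    """
    first_trainable = num_layers - last_n
    trainable = set()
    for name in param_counts:
        parts = name.split(".")
        if parts[0] == "layers" and int(parts[1]) >= first_trainable:
            trainable.add(name)
    return trainable

def trainable_fraction(param_counts, trainable):
    """Share of all parameters that would receive gradient updates."""
    total = sum(param_counts.values())
    return sum(param_counts[name] for name in trainable) / total

# Toy model: one embedding table plus four equally sized layers.
param_counts = {"embed.weight": 200}
for i in range(4):
    param_counts[f"layers.{i}.weight"] = 200

trainable = select_trainable(param_counts, num_layers=4, last_n=1)
fraction = trainable_fraction(param_counts, trainable)
```

In this toy setup the last layer holds 20% of the parameters. In a deep model the final few layers are a much smaller share, which is consistent with the 3-5% figure quoted above.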
- Multilingual Routing in Mixture-of-Experts
This paper systematically analyzes multilingual routing patterns in MoE large language models, discovering that middle layers contain cross-lingually shared experts and that linguistic performance is strongly correlated with alignment to English routing. Based on this, an inference-time routing intervention method is proposed to activate English task-specific experts in middle layers, consistently improving multilingual performance by 1-2% across 3 models, 2 tasks, and 15+ languages.
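One simple way to picture an inference-time routing intervention is as a bias added to the router scores of chosen experts before top-k selection. The sketch below is an illustration under that assumption only: the additive `boost`, the function names, and the toy scores are hypothetical, and the paper's actual steering rule may differ.

```python
def top_k_experts(router_logits, k):
    """Indices of the k highest-scoring experts, in ascending index order."""
    order = sorted(
        range(len(router_logits)), key=lambda e: router_logits[e], reverse=True
    )
    return sorted(order[:k])

def steer_router(router_logits, boosted_experts, boost):
    """Add `boost` to the scores of selected experts (assumed intervention form)."""
    return [
        score + boost if expert in boosted_experts else score
        for expert, score in enumerate(router_logits)
    ]

# Toy router scores for one token in a middle layer with six experts.
router_logits = [0.1, 1.4, 0.9, -0.2, 1.1, 0.3]
english_task_experts = {2, 5}  # hypothetical set identified from English inputs

before = top_k_experts(router_logits, k=2)
after = top_k_experts(
    steer_router(router_logits, english_task_experts, boost=1.0), k=2
)
```

Here the token is routed to experts 1 and 4 before steering and to experts 1 and 2 afterwards. Model weights are untouched; only the routing decision changes, so the intervention can be applied at inference time.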
- SASFT: Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs
Using Sparse Autoencoders (SAEs), the authors find that unexpected code-switching in LLMs is correlated with abnormally high pre-activation values of target language features. They propose SASFT, which constrains these pre-activations during SFT and reduces unexpected code-switching by more than 50%.
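A constraint of this kind can be pictured as an auxiliary penalty added to the SFT loss. The sketch below assumes a hinge-style penalty on the selected SAE feature pre-activations; the penalty form, `weight`, `threshold`, and all names are hypothetical and are not taken from the paper.

```python
def preactivation_penalty(preacts, feature_ids, threshold=0.0):
    """Mean amount by which selected feature pre-activations exceed `threshold`."""
    excess = [max(0.0, preacts[f] - threshold) for f in feature_ids]
    return sum(excess) / len(excess) if excess else 0.0

def combined_loss(sft_loss, preacts, feature_ids, weight=0.1, threshold=0.0):
    """SFT loss plus a weighted penalty (assumed form of the constraint)."""
    return sft_loss + weight * preactivation_penalty(preacts, feature_ids, threshold)

# Toy SAE pre-activations for four features; features 1 and 2 are the
# (hypothetical) language features selected for the constraint.
preacts = [0.5, 3.0, -1.0, 2.0]
language_features = [1, 2]

penalty = preactivation_penalty(preacts, language_features)
loss = combined_loss(2.0, preacts, language_features)
```

Only feature 1 exceeds the threshold, so the penalty is 1.5 and the combined loss is about 2.15. Features outside the selected set (such as feature 3 here) contribute nothing to the penalty.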