
🌐 Multilingual & Translation

🔬 ICLR2026 · 8 paper notes


🔥 Top topics: Translation ×2

ASSESS: A Semantic and Structural Evaluation Framework for Statement Similarity

This paper proposes the ASSESS framework, centered on the TransTED Similarity metric. It parses formal mathematical statements into Operator Trees (OPTs) and augments the standard Tree Edit Distance (TED) with semantic transformations driven by Lean proof tactics. The method achieves SOTA results on the EPLA benchmark (70.16% accuracy, Kappa 0.35) while remaining reproducible using only CPU resources.
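To make the TED part concrete, here is a minimal sketch of unit-cost ordered tree edit distance on toy operator trees. The tree encoding, the normalization by the sum of tree sizes, and the `canonicalize` helper (which sorts operands of a hypothetical `COMMUTATIVE` set, standing in for a single commutativity rewrite) are all illustrative assumptions, not the TransTED implementation.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
def node(label, *children):
    return (label, tuple(children))

def size(tree):
    return 1 + sum(size(child) for child in tree[1])

@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost ordered tree edit distance between two forests (tuples of trees)."""
    if not f1 and not f2:
        return 0
    if not f1:
        return sum(size(t) for t in f2)  # insert every remaining node
    if not f2:
        return sum(size(t) for t in f1)  # delete every remaining node
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]  # rightmost roots
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    match = (forest_distance(kids1, kids2)
             + forest_distance(f1[:-1], f2[:-1])
             + (0 if label1 == label2 else 1))  # relabel cost
    return min(delete, insert, match)

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

def ted_similarity(t1, t2):
    # Assumed normalization: TED <= |t1| + |t2|, so the score lies in [0, 1].
    return 1 - tree_edit_distance(t1, t2) / (size(t1) + size(t2))

# Hypothetical stand-in for one tactic-driven rewrite: commutativity (cf. add_comm, mul_comm).
COMMUTATIVE = {"+", "*"}

def canonicalize(tree):
    label, children = tree
    children = tuple(canonicalize(c) for c in children)
    if label in COMMUTATIVE:
        children = tuple(sorted(children, key=repr))
    return (label, children)

p = node("+", node("a"), node("b"))  # a + b
q = node("+", node("b"), node("a"))  # b + a
print(tree_edit_distance(p, q))  # 2
print(ted_similarity(canonicalize(p), canonicalize(q)))  # 1.0
```

Plain TED treats `a + b` and `b + a` as two edits apart; folding in even one semantics-preserving rewrite closes that gap, which is the intuition behind combining TED with tactic-driven transformations.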

ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality

This paper proposes the Adaptive Transfer Scaling Law (ATLAS), which decomposes the effective data volume into three terms (target language, transfer languages, and other languages) and introduces a saturation function for repeated data. Validated across 774 multilingual training experiments (10M–8B parameters, 400+ languages), ATLAS significantly outperforms existing scaling laws (multilingual \(R^2\) rises from 0.67 to 0.98). It also systematically quantifies the cross-lingual transfer matrix, the capacity constraints behind the "curse of multilinguality," and the compute crossover point between pretraining and finetuning.
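The three-term decomposition can be illustrated with a small sketch. The functional forms here are assumptions, not the fitted ATLAS law: the saturation function borrows a Muennighoff-style exponential decay for repeated epochs, the transfer weights `tau` are hypothetical inputs, and the loss uses a Chinchilla-style shape with placeholder constants.

```python
import math

def saturated_tokens(tokens, unique, r_star=15.0):
    """Effective tokens when `tokens` are drawn from `unique` distinct tokens.
    Assumption: each epoch beyond the first is worth exponentially less,
    so the value is capped at unique * (1 + r_star)."""
    if tokens <= unique:
        return float(tokens)
    repeats = tokens / unique - 1
    return unique * (1 + r_star * (1 - math.exp(-repeats / r_star)))

def effective_data(target, transfer, other):
    """target: (tokens, unique); transfer: list of (tau, tokens, unique);
    other: (tau, tokens, unique). Each tau is a hypothetical transfer weight."""
    d = saturated_tokens(*target)
    d += sum(tau * saturated_tokens(t, u) for tau, t, u in transfer)
    tau_o, t_o, u_o = other
    return d + tau_o * saturated_tokens(t_o, u_o)

def loss(n_params, d_eff, E=1.7, A=400.0, B=400.0, alpha=0.34, beta=0.28):
    # Placeholder Chinchilla-style form; constants are illustrative, not ATLAS fits.
    return E + A / n_params**alpha + B / d_eff**beta

d = effective_data(target=(2e9, 5e8), transfer=[(0.4, 1e10, 1e10)], other=(0.05, 5e10, 5e10))
print(d, loss(1e9, d))
```

Splitting the data term this way lets one ask how much a transfer language is worth relative to extra target-language tokens, and the saturation cap shows why repeating a small target corpus stops helping.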

DiscoX: Benchmarking Discourse-Level Translation in Expert Domains

DiscoX builds the first benchmark for discourse-level, expert-domain ZH-EN translation (200 articles averaging 1,712 tokens across 7 domains, refined with 1,330 person-hours of manual work) and proposes Metric-S, a reference-free, multi-agent evaluation system. The results reveal a significant gap: even the strongest LLM (GPT-5-high, 76.66) still lags behind human experts (80.16).

From Utterance to Vividity: Training Expressive Subtitle Translation LLM via Adaptive Local Preference Optimization

This paper proposes ALPO (Adaptive Local Preference Optimization) for training expressive subtitle translation LLMs. Its empirical findings show that subtitle translation favors free translation and that reasoning-based LLMs paraphrase better than chat-based LLMs. After verifying that LLM-based translation evaluators agree closely with human judgments, the authors propose a fine-grained, process-supervised preference alignment method (adaptive weighting, dynamic beta, and prefix mixing). The resulting 14B model surpasses SOTA models such as GPT-4o and DeepSeek-R1 in translation vividness across multiple language directions.
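A minimal sketch of what a locally weighted, DPO-style objective might look like, assuming each subtitle line is a separate preference pair scored by an LLM evaluator. The weighting rule (normalized evaluator margins) and the `beta` schedule are hypothetical choices for illustration, and prefix mixing is not modeled; this is not the authors' ALPO loss.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def local_preference_loss(segments, base_beta=0.1):
    """Weighted DPO-style loss over local segments (e.g. single subtitle lines).
    Each segment holds summed log-probs of the chosen (w) and rejected (l)
    translations under the policy (pi_*) and frozen reference (ref_*), plus an
    evaluator score margin in [0, 1]. Weights and beta schedule are hypothetical."""
    margins = [max(s["margin"], 0.0) for s in segments]
    total = sum(margins)
    if total > 0:
        weights = [m / total for m in margins]
    else:
        weights = [1 / len(segments)] * len(segments)  # fall back to uniform
    loss = 0.0
    for s, w, m in zip(segments, weights, margins):
        beta = base_beta * (1 + m)  # hypothetical "dynamic beta": sharper for clear-cut pairs
        logit = beta * ((s["pi_w"] - s["ref_w"]) - (s["pi_l"] - s["ref_l"]))
        loss -= w * log_sigmoid(logit)
    return loss

segs = [
    {"pi_w": -10.0, "ref_w": -10.0, "pi_l": -12.0, "ref_l": -12.0, "margin": 0.8},
    {"pi_w": -8.0, "ref_w": -9.0, "pi_l": -9.0, "ref_l": -8.5, "margin": 0.2},
]
print(local_preference_loss(segs))
```

When the policy equals the reference, every segment contributes log 2, as in standard DPO; the local weights only change how much each line's preference signal counts.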

Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation

This paper proposes the Language Confusion Gate (LCG), a lightweight two-layer MLP that masks tokens from the wrong language family during decoding, intervening only when needed and leaving the base LLM unmodified. Trained via "norm-calibrated self-distillation," it reduces language confusion rates by roughly an order of magnitude across multiple models without sacrificing task performance.
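The gating step can be sketched as follows, assuming the gate emits one sigmoid score per language family and each vocabulary token has a known family (with `None` for language-neutral tokens such as digits and punctuation). The token-to-family map, the 0.5 threshold, and the "intervene only if the top token is rejected" rule are illustrative assumptions; the self-distillation training is not shown.

```python
import math

def mlp_gate(hidden, w1, b1, w2, b2):
    """Two-layer MLP: hidden state -> ReLU layer -> one sigmoid score per language family."""
    h = [max(0.0, sum(x * w for x, w in zip(hidden, row)) + b) for row, b in zip(w1, b1)]
    return [1 / (1 + math.exp(-(sum(x * w for x, w in zip(h, row)) + b)))
            for row, b in zip(w2, b2)]

def gated_logits(logits, token_family, gate, threshold=0.5):
    """Leave the base model's logits alone unless its top token belongs to a
    rejected family; only then mask every rejected-family token to -inf."""
    def allowed(fam):
        return fam is None or gate[fam] >= threshold
    top = max(range(len(logits)), key=logits.__getitem__)
    if allowed(token_family[top]):
        return list(logits)  # no confusion predicted: no intervention
    return [x if allowed(token_family[i]) else float("-inf") for i, x in enumerate(logits)]

hidden = [1.0, 0.0]
gate = mlp_gate(hidden, w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
                w2=[[5.0, 0.0], [-5.0, 0.0]], b2=[0.0, 0.0])
print(gated_logits([2.0, 3.0, 1.0], token_family=[0, 1, None], gate=gate))
# [2.0, -inf, 1.0]
```

Because the gate only rewrites logits after the forward pass, the base model's weights and its output on already-correct steps stay untouched.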

LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?

Using the logit lens and hidden-state similarity analysis, this work identifies the final few layers of multilingual LLMs (mLLMs) as responsible for "language control." Fine-tuning only these layers, about 3-5% of the parameters, raises language consistency across six languages from under 20% to over 98%, nearly matching full fine-tuning.
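Selecting the final layers for partial fine-tuning is mostly bookkeeping over parameter names. This sketch assumes Llama-style names (`model.layers.{i}.…`) and a toy, hypothetical model of 32 equal-size blocks; it reports which parameters would be trained and what share of the model they represent.

```python
import re

LAYER = re.compile(r"layers\.(\d+)\.")

def select_final_layers(param_sizes, num_layers, k):
    """Names of parameters in the last k transformer blocks and their share of all parameters."""
    chosen = [name for name in param_sizes
              if (m := LAYER.search(name)) and int(m.group(1)) >= num_layers - k]
    share = sum(param_sizes[n] for n in chosen) / sum(param_sizes.values())
    return chosen, share

# Hypothetical toy model: 32 equal-size blocks plus embeddings and an output head.
sizes = {"model.embed_tokens.weight": 400, "lm_head.weight": 400}
for i in range(32):
    sizes[f"model.layers.{i}.mlp.weight"] = 100
chosen, share = select_final_layers(sizes, num_layers=32, k=1)
print(chosen, share)  # ['model.layers.31.mlp.weight'] 0.025
```

In PyTorch, the remaining parameters could then be frozen with something like `for name, p in model.named_parameters(): p.requires_grad_(name in chosen_set)`, where `chosen_set = set(chosen)`.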

Multilingual Routing in Mixture-of-Experts

This paper systematically analyzes multilingual routing patterns in MoE large language models. It finds that middle layers contain experts shared across languages, and that a model's performance in a language correlates strongly with how closely that language's routing aligns with English routing. Building on this, the authors propose an inference-time routing intervention that activates, in the middle layers, the task-specific experts identified from English, consistently improving multilingual performance by 1-2% across 3 models, 2 tasks, and 15+ languages.
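One simple way to realize such an intervention is an additive boost on router logits. The sketch below assumes the English task-specific experts are already known (`english_experts`), applies a fixed boost `delta` only within a hypothetical middle band of layers, and renormalizes over the selected top-k experts. These choices are illustrative, not the paper's exact procedure.

```python
import math

def route(router_logits, k, boosted=frozenset(), delta=0.0):
    """Top-k routing; experts in `boosted` get +delta on their router logit."""
    logits = [x + (delta if i in boosted else 0.0) for i, x in enumerate(router_logits)]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = logits[top[0]]
    exps = {i: math.exp(logits[i] - peak) for i in top}  # softmax over the selected experts
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def route_with_intervention(layer, num_layers, router_logits, k, english_experts,
                            delta=1.0, middle=(0.25, 0.75)):
    """Apply the boost only in a (hypothetical) middle band of layers."""
    lo, hi = middle
    if lo * num_layers <= layer < hi * num_layers:
        return route(router_logits, k, english_experts, delta)
    return route(router_logits, k)

logits = [2.0, 1.0, 0.5, 0.0]
print(route_with_intervention(8, 16, logits, k=2, english_experts={3}, delta=3.0))
print(route_with_intervention(0, 16, logits, k=2, english_experts={3}, delta=3.0))
```

In the middle layer, expert 3 displaces expert 1 in the top-2; outside the band, routing is unchanged, which keeps the early and late layers' language-specific behavior intact.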

SASFT: Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs

Using Sparse Autoencoders (SAEs), the authors find that unexpected code-switching in LLMs correlates with abnormally high pre-activation values of target-language features. They propose SASFT, which constrains these pre-activations during supervised fine-tuning (SFT), reducing unexpected code-switching by more than 50%.
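A minimal sketch of an SAE-guided auxiliary loss, assuming a standard SAE encoder and a hypothetical list `language_features` of SAE feature indices whose pre-activations should be kept low. The hinge penalty, `threshold`, and weight `lam` are illustrative assumptions rather than the SASFT formulation.

```python
def sae_preactivations(x, w_enc, b_enc, b_dec):
    """Pre-activations of a standard SAE encoder, W_enc (x - b_dec) + b_enc,
    taken before the ReLU/JumpReLU nonlinearity."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    return [sum(c * w for c, w in zip(centered, row)) + b for row, b in zip(w_enc, b_enc)]

def preactivation_penalty(preacts, language_features, threshold):
    """Hinge penalty on monitored language features that rise above the threshold."""
    return sum(max(0.0, preacts[f] - threshold) for f in language_features)

def sasft_loss(sft_loss, preacts, language_features, threshold=0.0, lam=0.1):
    # Hypothetical combination: standard SFT loss plus a weighted auxiliary term.
    return sft_loss + lam * preactivation_penalty(preacts, language_features, threshold)

z = sae_preactivations([1.0, 2.0], w_enc=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                       b_enc=[0.0, 0.0, -1.0], b_dec=[0.0, 0.0])
print(z)  # [1.0, 2.0, 2.0]
print(sasft_loss(2.0, z, language_features=[1, 2], threshold=1.5))
```

The penalty is zero while monitored features stay below the threshold, so the auxiliary term only pushes back on the abnormally high pre-activations associated with code-switching.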