
🌐 Multilingual & Translation

🔬 ICLR2026 · 7 paper notes

ASSESS: A Semantic and Structural Evaluation Framework for Statement Similarity

This paper proposes the ASSESS framework, whose core contribution is the TransTED Similarity metric. By parsing formal mathematical statements into Operator Trees (OPTs) and augmenting standard Tree Edit Distance (TED) with semantic transformations driven by Lean proof tactics, ASSESS achieves state-of-the-art performance of 70.16% accuracy and a Kappa score of 0.35 on the EPLA benchmark, while requiring only CPU resources for reproduction.

ASSESS: A Semantic and Structural Evaluation Framework for Statement Similarity

This paper proposes the ASSESS framework and the TransTED Similarity metric, which parses formal statements into operator trees and incorporates semantic transformations into tree edit distance computation, achieving state-of-the-art evaluation of autoformalization statement similarity (70.16% accuracy, 0.35 Kappa). The paper also releases the EPLA benchmark comprising 1,247 expert-annotated statement pairs.
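The core idea, a tree edit distance computed over operator trees, can be illustrated with a toy distance. Everything below is an illustrative sketch: the `Node` type, the sibling-alignment recursion, and the size-based normalization are assumptions rather than the paper's implementation, and the tactic-driven semantic transformations that distinguish TransTED from plain TED are omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """One operator-tree node: an operator/symbol label plus ordered children."""
    label: str
    children: tuple = ()

def size(t: Node) -> int:
    # number of nodes in the subtree rooted at t
    return 1 + sum(size(c) for c in t.children)

def dist(a: Node, b: Node) -> int:
    # cost of relabeling the roots, plus the cost of aligning child sequences
    cost = 0 if a.label == b.label else 1
    return cost + seq_dist(a.children, b.children)

def seq_dist(xs, ys) -> int:
    # classic edit-distance DP over the two child sequences, where deleting
    # or inserting a child costs the size of its whole subtree
    n, m = len(xs), len(ys)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + size(xs[i - 1])
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + size(ys[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(xs[i - 1]),                 # delete subtree
                dp[i][j - 1] + size(ys[j - 1]),                 # insert subtree
                dp[i - 1][j - 1] + dist(xs[i - 1], ys[j - 1]),  # match/relabel
            )
    return dp[n][m]

def transted_similarity(a: Node, b: Node) -> float:
    # normalize the distance into a [0, 1] similarity (normalization is an assumption)
    return 1.0 - dist(a, b) / max(size(a), size(b))

# "a + b" vs "b + a": identical structure, swapped leaves
t1 = Node("+", (Node("a"), Node("b")))
t2 = Node("+", (Node("b"), Node("a")))
```

A full TED algorithm such as Zhang–Shasha also allows splicing a node between a parent and its children; the recursion above is a cheaper approximation that only matches subtrees level by level. A semantic extension in the spirit of TransTED would additionally treat tactic-equivalent subtrees (e.g. commuted arguments) as zero-cost matches.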

ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality

This paper proposes the Adaptive Transfer Scaling Law (ATLAS), which decomposes effective data volume into three components—target language, transfer languages, and other languages—and introduces a data repetition saturation function. Evaluated across 774 multilingual training experiments (10M–8B parameters, 400+ languages), ATLAS substantially outperforms existing scaling laws, improving multilingual \(R^2\) from 0.67 to 0.98, and systematically quantifies the cross-lingual transfer matrix, capacity constraints underlying the curse of multilinguality, and the computational crossover point between pretraining and finetuning.
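The two ingredients of the note above — a three-way decomposition of effective data volume and a repetition saturation function — can be sketched as follows. The exponential saturation form, the additive decomposition weights, and every parameter name here are illustrative assumptions; ATLAS fits its own functional forms to the 774 experiments.

```python
import math

def saturated_tokens(raw_tokens: float, epochs: float, half_life: float) -> float:
    # repetition saturation: extra epochs over the same data add diminishing
    # signal, asymptotically capped at `half_life` effective passes
    # (the exponential form is an assumed stand-in for the fitted function)
    return raw_tokens * half_life * (1.0 - math.exp(-epochs / half_life))

def effective_data(d_target: float, transfer: dict, d_other: float,
                   alpha: float, epochs: float, half_life: float) -> float:
    """Toy ATLAS-style effective data volume.

    transfer maps language -> (tokens, transfer_rate in [0, 1]), mirroring a
    cross-lingual transfer matrix row; `alpha` down-weights unrelated
    languages. All weights here are hypothetical."""
    d_xfer = sum(tok * rate for tok, rate in transfer.values())
    raw = d_target + d_xfer + alpha * d_other
    return saturated_tokens(raw, epochs, half_life)
```

Under this form, more epochs always help but with diminishing returns, which is the qualitative behavior a saturation function is meant to capture.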

Multilingual Routing in Mixture-of-Experts

This paper systematically analyzes multilingual routing patterns in MoE large language models, finding that middle layers contain cross-lingually shared experts and that language performance is strongly correlated with alignment to English routing. Based on these findings, the authors propose an inference-time routing intervention that activates English task experts in middle layers, consistently improving multilingual performance by 1–2% across 3 models × 2 tasks × 15+ languages.
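The intervention described above can be sketched as a bias on router logits. The additive-bias form, the `boost` magnitude, and the function names are assumptions for illustration, not the paper's exact procedure.

```python
def route(logits, top_k=2):
    # indices of the top-k experts for one token's router logits
    return sorted(range(len(logits)), key=lambda i: -logits[i])[:top_k]

def intervene(logits, english_experts, layer, mid_layers, boost=1.0, top_k=2):
    """Illustrative inference-time routing intervention: in middle layers,
    bias the router toward the experts that English prompts activate for the
    same task, then route as usual. `english_experts` would be identified
    offline from English-prompt routing statistics."""
    if layer in mid_layers:
        logits = [l + (boost if i in english_experts else 0.0)
                  for i, l in enumerate(logits)]
    return route(logits, top_k)
```

Outside the designated middle layers the router is left untouched, matching the finding that cross-lingually shared experts concentrate in middle layers.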

Prior-based Noisy Text Data Filtering: Fast and Strong Alternative for Perplexity

This paper proposes a text data filtering method based on token term frequency priors, detecting anomalous documents by computing the mean and standard deviation of token priors within each document. The approach achieves over 1000× speedup compared to PPL-based filtering while delivering superior downstream performance.

Prior-based Noisy Text Data Filtering: Fast and Strong Alternative for Perplexity

This paper proposes a text data filtering method based on token priors (token frequency statistics), using the mean and standard deviation of in-document token priors as a proxy for perplexity (PPL). The method achieves the highest average performance across 20 downstream benchmarks while being over 1000× faster than PPL-based filtering.
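The filtering rule is simple enough to sketch end-to-end: estimate unigram token priors over the corpus, score each document by the mean and standard deviation of its tokens' priors, and drop outliers. The log transform, the out-of-vocabulary fallback, and the z-score cutoff below are assumptions; the paper's exact scoring may differ.

```python
from collections import Counter
import math
import statistics

def token_priors(corpus):
    # unigram prior: corpus-level relative frequency of each token
    counts = Counter(tok for doc in corpus for tok in doc.split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def doc_stats(doc, priors):
    # log-priors make rare tokens dominate, loosely mirroring how perplexity
    # penalizes surprising tokens; unseen tokens fall back to the rarest prior
    floor = min(priors.values())
    vals = [math.log(priors.get(t, floor)) for t in doc.split()]
    return statistics.mean(vals), statistics.pstdev(vals)

def filter_docs(corpus, z=2.0):
    # keep documents whose mean log-prior is within z standard deviations
    # of the corpus-level mean (a hypothetical anomaly criterion)
    priors = token_priors(corpus)
    means = [doc_stats(d, priors)[0] for d in corpus]
    mu, sigma = statistics.mean(means), statistics.pstdev(means)
    return [d for d, m in zip(corpus, means) if abs(m - mu) <= z * sigma + 1e-12]
```

Unlike perplexity filtering, no forward pass through a language model is needed — one counting pass plus per-document arithmetic — which is where the 1000×-plus speedup comes from.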

SASFT: Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs

This paper uses Sparse Autoencoders (SAEs) to show that unexpected code-switching in LLMs is associated with abnormally high pre-activation values of target-language features. It then proposes SASFT, a method that constrains these language-feature pre-activation values during supervised finetuning, reducing unexpected code-switching by over 50%.
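The constraint described above can be sketched as an auxiliary training penalty on SAE encoder pre-activations. The hinge-squared form, the `margin` parameter, and the function names are illustrative assumptions; the paper's exact loss may differ.

```python
def sae_preacts(h, W_enc, b_enc):
    # SAE encoder pre-activations z = h @ W_enc + b_enc for one hidden state h
    return [sum(hi * W_enc[i][j] for i, hi in enumerate(h)) + b_enc[j]
            for j in range(len(b_enc))]

def codeswitch_penalty(pre, lang_feature_ids, margin):
    # hinge-squared penalty on the SAE features tied to the intruding
    # language: only pre-activations exceeding `margin` are penalized,
    # so ordinary use of the language is not suppressed outright
    viol = [max(pre[j] - margin, 0.0) for j in lang_feature_ids]
    return sum(v * v for v in viol) / max(len(viol), 1)
```

During SASFT-style training this penalty would be added, with some weight, to the usual SFT cross-entropy loss; `lang_feature_ids` would come from the SAE analysis that identified the code-switching features.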