💬 LLM (Other)

🧪 ICML2025 · 28 paper notes

🔥 Top topics: LLM ×6 · Few-/Zero-Shot Learning ×2 · Time-Series Forecasting ×2

B-score: Detecting biases in large language models using response history: The paper proposes B-score, a metric that detects bias by comparing the probability of each LLM response in single-turn dialogues with its probability in multi-turn dialogues. It finds that LLMs can "self-debias" in multi-turn dialogues and uses B-score to improve answer verification accuracy.
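A minimal sketch of the comparison behind B-score, assuming the score for an answer is its single-turn probability minus its multi-turn probability, with both estimated from sampled responses. The function names, the sign convention, and the sample data are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def answer_probs(responses):
    """Empirical probability of each answer in a list of sampled responses."""
    counts = Counter(responses)
    total = len(responses)
    return {answer: n / total for answer, n in counts.items()}


def b_score(answer, single_turn, multi_turn):
    """Assumed convention: P_single(answer) - P_multi(answer)."""
    p_single = answer_probs(single_turn).get(answer, 0.0)
    p_multi = answer_probs(multi_turn).get(answer, 0.0)
    return p_single - p_multi


# Hypothetical samples for "pick a random digit": independent single-turn
# queries keep returning "7", while one multi-turn dialogue that can see
# its own history spreads its answers out.
single_turn = ["7"] * 8 + ["3", "5"]
multi_turn = ["7", "3", "5", "1", "9", "2", "7", "4", "6", "8"]

score = b_score("7", single_turn, multi_turn)
print(f"B-score for '7': {score:.2f}")
```

Under this convention, a large positive score flags an answer the model favors only when it cannot see its earlier responses.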
BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute: The paper proposes BEST-Route (Best-of-$n$ Enhanced Sampling and Test-time Route Optimization). It introduces the best-of-$n$ sampling strategy into traditional query routing, allowing the router to not only select the model but also adaptively decide the sampling number $n$. By replacing a single invocation of a large model with multiple samplings and selections from a small model, it reduces inference costs by up to 60% with less than 1% performance loss.
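The routing idea can be sketched as choosing among (model, $n$) options and then keeping the best of the $n$ samples. This is a simplified illustration, not the paper's router: the predicted quality values, the costs, the tolerance rule, and the names `pick_option` and `best_of_n` are all hypothetical.

```python
def pick_option(predicted_quality, costs, reference, tolerance):
    """Return the cheapest option whose predicted quality is within
    `tolerance` of the reference option's predicted quality."""
    threshold = predicted_quality[reference] - tolerance
    eligible = [opt for opt, q in predicted_quality.items() if q >= threshold]
    return min(eligible, key=lambda opt: costs[opt])


def best_of_n(candidates, score):
    """Keep the candidate that the scoring function rates highest."""
    return max(candidates, key=score)


# Options are (model, n) pairs; all numbers below are made up for illustration.
predicted_quality = {("small", 1): 0.62, ("small", 4): 0.78, ("large", 1): 0.80}
costs = {("small", 1): 1.0, ("small", 4): 4.0, ("large", 1): 10.0}

choice = pick_option(predicted_quality, costs, reference=("large", 1), tolerance=0.03)
print(choice)

# Stand-in for a reward model: here the score is simply the response length.
winner = best_of_n(["a", "bbb", "cc"], score=len)
print(winner)
```

In this toy setting the router prefers four samples from the small model over one call to the large model, because the predicted quality gap falls inside the tolerance at less than half the cost.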
Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence: This paper designs an In-Context Meta-Learning (ICML) experimental setup to reveal that the internal circuits of Transformers undergo three distinct phases of emergence (Bigram $\rightarrow$ Label Attention $\rightarrow$ Chunk Example) during the training process of acquiring in-context meta-learning capabilities, rather than the single-stage sudden jump observed in prior induction head studies. This provides a new perspective on understanding the deep mechanisms of ICL.
Binary Hypothesis Testing for Softmax Models and Leverage Score Models: This work investigates the binary hypothesis testing problem for Softmax and Leverage Score models from a theoretical perspective. It establishes tight bounds on the number of queries required to distinguish between two parameterized models under an energy constraint, which is relevant to understanding the discriminative capabilities of LLMs across different domains.
Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting: Proposes TimeFuse—a sample-level adaptive model fusion framework. It characterizes input time series features using meta-features and trains a learnable fuser to predict the optimal model combination weights, achieving near-universal improvements (outperforming the best single model on 95.1% of samples) across multiple forecasting benchmarks.
Build Agent Advocates, Not Platform Agents: A position paper arguing that language model agents (LMAs), if controlled by platform companies, will become "platform agents" that exacerbate surveillance, lock-in, and attention manipulation. The authors propose developing user-controlled "agent advocates" to protect individual autonomy, recommending three key interventions: open models/compute, interoperability standards, and market regulation.
Defending LVLMs Against Vision Attacks through Partial-Perception Supervision: Proposes DPS (Defense through Partial-Perception Supervision), which utilizes responses from cropped images as "weak supervision" to guide the full-image model for self-correction during inference. This achieves training-free, black-box visual attack defense for LVLMs, reducing the average attack success rate by 76.3%.
Expert Evaluation of LLM World Models: A High-Tc Superconductivity Case Study: Using the field of high-temperature superconductivity (HTS) as a case study, an expert-level dataset (1,726 papers + 67 expert questions) is constructed to systematically evaluate the scientific literature understanding capabilities of six LLM systems. The evaluation reveals that RAG systems based on curated literature significantly outperform general closed-source models in terms of factual completeness and evidentiary support.
Generalized Interpolating Discrete Diffusion: The Generalized Interpolating Discrete Diffusion (GIDD) framework is proposed, which generalizes Masked Diffusion Models (MDM) to a family of diffusion processes supporting arbitrary time-varying mixture distributions. By combining mask and uniform noise, GIDD equips the model with self-correction capabilities and achieves compute-matched SOTA in diffusion language modeling.
Generative Social Choice: The Next Generation: Extends the generative social choice framework to scenarios with cost/budget constraints and approximate queries, proposes the DemocraticProcess algorithm with near-optimal approximate proportional representation guarantees, and implements a practical system PROSE (based on GPT-4o) validated on drug review and urban governance datasets.
Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence: Proposes a dual-part token embedding strategy (a shared learnable part + a random distinguishing part), enabling language models to generalize to larger vocabularies post-training and possess inherent robustness to alpha-equivalent transformations.
LaRoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation: LaRoSA proposes a training-free activation sparsification method. By applying layerwise orthogonal rotation matrices, it transforms input activations into a space better suited for sparsification, and combines this with Top-K selection to achieve consistent model-level sparsity and reliable inference acceleration.
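A toy illustration of why rotating before Top-K selection can help, assuming a hand-picked 2D rotation rather than the paper's layerwise construction. The helper names and the example vector are hypothetical.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def top_k_sparsify(vec, k):
    """Keep the k largest-magnitude entries and zero out the rest."""
    keep = set(sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


# Orthogonal rotation by 45 degrees (an assumed example, not a learned matrix).
c = math.sqrt(0.5)
Q = [[c, -c], [c, c]]

x = [1.0, 1.0]
rotated = matvec(Q, x)                     # energy concentrates in one coordinate
sparse_rotated = top_k_sparsify(rotated, k=1)
recovered = matvec(transpose(Q), sparse_rotated)

direct = top_k_sparsify(x, k=1)            # Top-1 in the original space drops half of x
print(recovered, direct)
```

Here Top-1 in the rotated space reconstructs `x` exactly, while Top-1 in the original space discards one of its two equal entries.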
LASER: Attention with Exponential Transformation: By analyzing the gradient backpropagation bottleneck of softmax in the attention mechanism, this paper proposes LASER attention—performing attention computation in the exponentially transformed Value space (i.e., applying attention to $\exp(V)$ and then taking the logarithm), thereby obtaining larger Jacobian signals and improving parameter learning efficiency.
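The transformation described above, $\log(\mathrm{softmax}(\cdot)\,\exp(V))$, can be sketched for a single query as follows. This is a minimal pure-Python illustration; the max-subtraction used for numerical stability and all function names are assumptions of this sketch, not details confirmed by the summary.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def standard_attention_row(weights, values):
    """Usual attention output: weighted average of the value vectors."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def laser_attention_row(weights, values):
    """log(sum_j weights[j] * exp(values[j][d])) for each dimension d,
    computed with max subtraction to avoid overflow."""
    dim = len(values[0])
    out = []
    for d in range(dim):
        col = [v[d] for v in values]
        m = max(col)
        out.append(m + math.log(sum(w * math.exp(c - m) for w, c in zip(weights, col))))
    return out


weights = softmax([1.0, 0.0])              # attention weights for one query
values = [[0.0, 2.0], [1.0, -1.0]]         # two value vectors of dimension 2

standard = standard_attention_row(weights, values)
laser = laser_attention_row(weights, values)
print(standard, laser)
```

By Jensen's inequality the log-of-weighted-exponentials output is never smaller than the plain weighted average, so larger values contribute more strongly than in standard attention.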
LLM Social Simulations Are a Promising Research Method: As a position paper, this work synthesizes 36 empirical studies to argue that LLM social simulation (using LLMs to simulate human research subjects) is a promising research methodology. It identifies five addressable challenges (diversity, bias, sycophancy, alienness, generalization) and proposes promising directions for each.
MERIT: Maximum-normalized Element-wise Ratio for Language Model Large-batch Training: Proposes the MERIT optimizer, which extends LAMB with maximum-norm normalization and element-wise trust ratios to effectively resolve the performance degradation caused by attention logit explosion during large-batch training.
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding: This work establishes the first approximation rate theory of Looped Transformers regarding the number of loops and the modulus of continuity of target functions. It reveals the loop-architecture-specific approximation error sources (contextual continuity and token continuity) and proposes the Timestep-Modulated Looped Transformer (TMLT) to eliminate this limitation via timestep encoding, achieving consistent improvements across reasoning, in-context learning, and language modeling tasks.
Product of Experts with LLMs: Boosting Performance on ARC Is a Matter of Perspective: By employing the LLM simultaneously as a candidate generator and a scorer, this work leverages a DFS-based search algorithm to generate high-probability candidate solutions and subsequently utilizes Product of Experts (PoE) scoring under multi-perspective augmentation to select the optimal answer. This approach achieves an open-source SOTA accuracy of 71.6% on the ARC-AGI public evaluation set, surpassing the average human performance (60.2%) with a single-task inference cost of only around $0.02.
QuEst: Enhancing Estimates of Quantile-Based Distributional Measures Using Model Predictions: The QuEst framework is proposed to combine a small amount of high-quality observed data with a large amount of model-predicted (imputed) data. This provides more accurate point estimates and rigorous confidence intervals for quantile-based distributional measures (QBDMs), covering classical metrics such as CVaR and Interval-VaR.
Random Registers for Cross-Domain Few-Shot Learning: This work discovers that in cross-domain few-shot learning (CDFSL), learnable prompts impair generalization in the target domain, whereas replacing them with random noise (i.e., random registers) consistently improves performance. Based on this observation, the REAP method is proposed, which enhances attention perturbation by introducing random registers to semantic image regions, enabling efficient domain-agnostic feature learning.
Regress, Don't Guess — A Regression-like Loss on Number Tokens for Language Models: Proposes Number Token Loss (NTL), a pure token-level regression-like loss function that injects a numerical proximity inductive bias into LLMs by minimizing the $L_p$ norm or Wasserstein distance between target and predicted numerical tokens.
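A minimal sketch of the two loss variants named above, assuming the model's probabilities have already been restricted to the digit tokens 0 to 9 and renormalized. The function names and example distributions are illustrative assumptions.

```python
def expected_value(probs, values):
    """Expected numeric value under the predicted token distribution."""
    return sum(p * v for p, v in zip(probs, values))


def ntl_lp(probs, values, target, p=2):
    """L_p-style loss: distance between the expected value and the target."""
    return abs(expected_value(probs, values) - target) ** p


def ntl_wasserstein(probs, values, target):
    """Wasserstein-1 distance to a one-hot target, which reduces to the
    expected absolute distance from the target value."""
    return sum(q * abs(v - target) for q, v in zip(probs, values))


digits = list(range(10))
target = 5

# Two predictions that both assign zero probability to the correct digit:
near = [0, 0, 0, 0, 0.5, 0, 0.5, 0, 0, 0]   # mass on 4 and 6
far = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1.0]      # all mass on 9

print(ntl_wasserstein(near, digits, target), ntl_wasserstein(far, digits, target))
print(ntl_lp(near, digits, target), ntl_lp(far, digits, target))
```

Both losses penalize the prediction centered on 9 more than the one split between 4 and 6, which is the numerical-proximity bias the summary describes. The expected-value variant scores the split prediction as zero, a case the Wasserstein variant still penalizes.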
RULEBREAKERS: Challenging LLMs at the Crossroads between Formal Logic and Human-like Reasoning: Constructs the first large-scale "rulebreaker" dataset, RULEBREAKERS (25,600 instances), to systematically evaluate the performance of 7 LLMs when formal logical reasoning conflicts with factual knowledge. The study reveals that models generally tend to apply logical rules over-rigidly while ignoring common sense, deviating significantly from human reasoning behavior.
Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets: Safe Delta proposes a safety-aware post-training defense method that consistently preserves LLM safety across diverse fine-tuning datasets (of varying scales and task types) without sacrificing utility. This is achieved by estimating safety degradation, selectively retaining delta parameters to maximize utility while limiting safety loss, and applying a safety compensation vector to mitigate residual safety loss.
Star Attention: Efficient LLM Inference over Long Sequences: Proposes Star Attention, a two-phase block-sparse attention mechanism: Phase 1 encodes the context in partitioned blocks via local attention across multiple hosts, and Phase 2 processes the query and generates the response using global attention over the cached context. It is compatible with existing LLMs without fine-tuning, achieving up to an 11x speedup in inference while retaining 97-100% accuracy.
TabFlex: Scaling Tabular Learning to Millions with Linear Attention: Replaces the softmax attention in TabPFN with linear attention to scale the in-context learning (ICL) method for tabular classification from small datasets to millions of samples, achieving a speedup of more than 2× with no performance degradation.
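The scaling benefit of linear attention comes from reordering the computation so that keys and values are summarized once, independent of the number of queries. The sketch below uses the generic kernel formulation with an `elu(x) + 1` feature map, which is an assumption here; the summary does not specify which linear attention variant TabFlex uses.

```python
import math


def phi(x):
    """Positive feature map elu(x) + 1, applied elementwise (an assumed choice)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(queries, keys, values):
    """Cost grows linearly with the number of keys: accumulate
    sum_j phi(k_j) v_j^T and sum_j phi(k_j) once, then reuse for each query."""
    d_k, d_v = len(keys[0]), len(values[0])
    kv = [[0.0] * d_v for _ in range(d_k)]
    k_sum = [0.0] * d_k
    for k, v in zip(keys, values):
        fk = phi(k)
        for a in range(d_k):
            k_sum[a] += fk[a]
            for b in range(d_v):
                kv[a][b] += fk[a] * v[b]
    out = []
    for q in queries:
        fq = phi(q)
        norm = sum(fq[a] * k_sum[a] for a in range(d_k))
        out.append([sum(fq[a] * kv[a][b] for a in range(d_k)) / norm for b in range(d_v)])
    return out


def quadratic_reference(queries, keys, values):
    """Same result computed the quadratic way, via explicit pairwise weights."""
    out = []
    for q in queries:
        fq = phi(q)
        w = [sum(a * b for a, b in zip(fq, phi(k))) for k in keys]
        total = sum(w)
        out.append([sum(wi * v[b] for wi, v in zip(w, values)) / total
                    for b in range(len(values[0]))])
    return out


queries = [[0.5, -1.0], [1.5, 0.2]]
keys = [[0.1, 0.4], [-0.3, 0.8], [1.0, -2.0]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

fast = linear_attention(queries, keys, values)
slow = quadratic_reference(queries, keys, values)
print(fast)
```

Both functions return the same outputs up to floating-point error, but only the first avoids forming a weight for every query-key pair.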
The Lock-in Hypothesis: Stagnation by Algorithm: This paper proposes and formalizes "The Lock-in Hypothesis": the human-AI feedback loop formed during LLM training and deployment solidifies users' pre-existing beliefs, leading to an irreversible loss of collective viewpoint diversity and potentially locking the population into incorrect beliefs.
Theoretical Limitations of Ensembles in the Age of Overparameterization: Under overparameterization conditions, infinite ensembles are pointwise equivalent to a single infinitely wide model. The ensemble variance no longer reflects traditional Bayesian uncertainty but instead measures the expected effect of increasing model capacity, providing a theoretical explanation for empirical observations that deep ensembles offer no fundamental generalization advantage over large models.
Towards Universal Offline Black-Box Optimization via Learning Language Model Embeddings: This work proposes the UniSO framework, which encodes optimization variables of different types and dimensions into unified JSON strings before feeding them into language models. It trains universal regressors using two modeling paradigms: token prediction (UniSO-T) and numerical regression (UniSO-N). The embedding space quality is improved via metadata-guided contrastive learning and Lipschitz smoothness regularization, achieving universal offline black-box optimization across domains and dimensions.
When Will It Fail?: Anomaly to Prompt for Forecasting Future Anomalies in Time Series: This work proposes the Anomaly to Prompt (A2P) framework. Via two core modules, Anomaly-Aware Forecasting (AAF) and Synthetic Anomaly Prompting (SAP), it effectively solves the new task of "Anomaly Prediction" (AP) in time series for the first time—not only predicting future signal trends but also accurately locating which exact timestamps in the future will experience anomalies.