💬 LLM (Other)

🤖 AAAI2026 · 29 paper notes

🔥 Top topics: LLM ×14 · Reasoning ×2 · Alignment/RLHF ×2

A Content-Preserving Secure Linguistic Steganography

This paper proposes CLstega, the first content-preserving linguistic steganography paradigm, which embeds secret information into an unmodified cover text by fine-tuning a masked language model (MLM) to controllably transform its prediction distribution. The approach achieves a 100% extraction success rate and near-perfect security, with steganalysis detection accuracy approaching the random-guess baseline of 0.5.

An Invariant Latent Space Perspective on Language Model Inversion

This paper proposes the Invariant Latent Space Hypothesis (ILSH), which reframes LLM inversion as a problem of reusing the LLM's own latent space. The Inv²A framework uses a lightweight inverse encoder to map outputs to denoised pseudo-representations, which a frozen LLM then decodes to recover the hidden prompts. Inv²A improves BLEU by an average of 4.77% across 9 datasets and attains comparable performance with only 20% of the training data.

Blue Teaming Function-Calling Agents

This paper systematically evaluates the robustness of four open-source function-calling LLMs against three attack types, and assesses the effectiveness of eight defense mechanisms, revealing that current models are insecure by default and that existing defenses remain difficult to deploy in practice.

CoEvo: Continual Evolution of Symbolic Solutions Using Large Language Models

This paper proposes CoEvo, a framework that integrates LLMs with evolutionary search methodology to achieve continual open-ended evolution of symbolic solutions through a dynamic knowledge library and multi-representation spaces (natural language / mathematical formulas / code), significantly outperforming existing symbolic regression methods on the AI Feynman benchmark.

Collaborative LLM Numerical Reasoning with Local Data Protection

This paper proposes a large-small model collaboration framework that protects sensitive local data through a two-stage anonymization pipeline — topic shifting followed by numerical substitution — applied to local queries. The remote GPT-4 returns reasoning solutions as executable Python code (plug-and-play tools), and the local model only needs to perform numerical back-substitution to obtain the final answer. The framework achieves 16–44% accuracy improvements on FinQA and MultiHiertt while reducing data leakage by 2–45%.
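
The back-substitution idea can be illustrated with a minimal sketch. It rests on several assumptions: numbers are masked with symbolic placeholders (the paper's actual substitution scheme is not described here), `mask_numbers` is a hypothetical helper, and `remote_tool` is a hand-written stand-in for the kind of code a remote model might return.

```python
import re

# Matches integers and simple decimals; a deliberately simple, assumed pattern.
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def mask_numbers(query):
    """Replace each number with a placeholder x0, x1, ... and keep the
    real values locally (hypothetical helper, not the paper's pipeline)."""
    values = []

    def repl(match):
        values.append(float(match.group()))
        return f"x{len(values) - 1}"

    return NUMBER.sub(repl, query), values

def remote_tool(x0, x1):
    """Stand-in for code returned by the remote model: percentage change."""
    return (x1 - x0) / x0 * 100

# Only the masked text would leave the local machine.
masked, values = mask_numbers("Revenue rose from 120 to 150.")
# Back-substitution: the real values are plugged in locally.
answer = remote_tool(*values)
```

In this sketch the remote side only ever sees `masked`, while the sensitive values stay in `values` on the local side.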

Control Illusion: The Failure of Instruction Hierarchies in Large Language Models

This paper systematically demonstrates that the system/user prompt separation mechanism in current LLMs fails to establish reliable instruction priority, and finds that social hierarchy priors acquired during pretraining (authority, expertise, consensus) exert stronger control over model behavior than explicit system/user role markers.

Guess or Recall? Training CNNs to Classify and Localize Memorization in LLMs

This paper trains CNNs on LLM attention weights to evaluate how well memorization taxonomies align with actual attention mechanisms. It proposes a new three-class taxonomy (Guess/Recall/Non-Memorized) that improves the minimum F1 from 64.7% to 89.0%, and it localizes the two memorization types to different depths: Guess relies on low-layer attention, while Recall relies on high-layer attention.

ICL-Router: In-Context Learned Model Representations for LLM Routing

This paper proposes ICL-Router, a two-stage training framework (query reconstruction + ICL model routing) that encodes LLM capability profiles as in-context vectors, enabling scalable dynamic model routing. New models can be incorporated without retraining the router, achieving state-of-the-art performance on both in-distribution and out-of-distribution tasks.

Identifying and Analyzing Performance-Critical Tokens in Large Language Models

Through representation-level and token-level ablation experiments, this paper identifies the "performance-critical tokens" that LLMs directly rely on during ICL as template and stopword tokens (e.g., "Answer:"), rather than the content tokens that humans would attend to (e.g., actual text). It further reveals that LLMs indirectly exploit content by aggregating content information into the representations of these critical tokens.

IROTE: Human-like Traits Elicitation of Large Language Model via In-Context Self-Reflective Optimization

This paper proposes IROTE, an in-context self-reflective optimization method grounded in information bottleneck theory. By iteratively generating and refining compact yet evocative textual "self-reflections," IROTE stably elicits target human traits (values, morality, personality) from LLMs across diverse downstream tasks without any fine-tuning, consistently outperforming existing baselines in trait consistency.

Learning Spatial Decay for Vision Transformers

This paper proposes the Spatial Decay Transformer (SDT), which for the first time adapts data-dependent spatial decay mechanisms from 1D sequence modeling to 2D vision Transformers. Through a Context-Aware Gating (CAG) module that generates dynamic, content-dependent decay intensities for patch interactions, SDT consistently outperforms strong baselines such as RMT on ImageNet-1K classification and generation tasks.

LILAD: Learning In-context Lyapunov-stable Adaptive Dynamics Models

This paper proposes LILAD, a framework that leverages the in-context learning (ICL) capability of GPT-2 to jointly learn a dynamics model and a Lyapunov function, achieving adaptive identification of non-stationary parametric dynamical systems while guaranteeing global exponential stability. LILAD outperforms baselines such as ICL and MAML on multiple benchmark systems.

LoKI: Low-damage Knowledge Implanting of Large Language Models

This paper proposes LoKI, a parameter-efficient fine-tuning method grounded in the mechanistic understanding of knowledge storage in Transformers. It introduces Knowledge Vector Attribution (KVA) to quantify the contribution of each knowledge vector in FFN layers, and applies a layer-balanced strategy to select low-contribution vectors for targeted knowledge implanting. The approach achieves strong task performance while substantially mitigating catastrophic forgetting.

LoopLLM: Transferable Energy-Latency Attacks in LLMs via Repetitive Generation

This paper proposes LoopLLM, a framework that launches energy-latency attacks by inducing LLMs into repetitive generation modes. Through repetition-inducing prompt optimization and token-aligned ensemble optimization, LoopLLM drives generations to over 90% of the maximum output length across 12 open-source and 2 commercial LLMs, and improves cross-model transferability by approximately 40%.

ParetoHqD: Fast Offline Multiobjective Alignment of Large Language Models Using Pareto High-Quality Data

This paper proposes ParetoHqD, which represents human preferences as preference directions in objective space (rather than linear scalarization), and performs two-stage SFT on high-quality data selected near the Pareto front. Using only 42% of the GPU time, it achieves multi-objective LLM alignment performance superior to five baselines.
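
As a rough illustration of what "on the Pareto front" means for multi-objective data selection, the sketch below filters non-dominated samples from hypothetical two-objective reward scores. It shows only the standard dominance test; the paper's preference directions and its notion of "near" the front are not modeled here.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b on every objective
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (helpfulness, harmlessness) scores for five samples.
scores = [(1, 5), (3, 3), (5, 1), (2, 2), (1, 1)]
front = pareto_front(scores)
```

This quadratic-time filter is fine for a sketch; larger datasets would usually call for a sort-based method.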

PERSIST: Persistent Instability in LLM's Personality Measurements

The PERSIST framework systematically evaluates the stability of personality measurements across 29 LLMs (1B–685B) using over 2 million responses. It reveals a "reasoning paradox" in which CoT reasoning increases variability while reducing perplexity, and a scale-dependent effect whereby conversational history influences large and small models in opposite directions. Together, these findings indicate that current LLMs lack the architectural foundation for behavioral consistency.

Position on LLM-Assisted Peer Review: Addressing Reviewer Gap through Mentoring and Feedback

This position paper proposes shifting the role of LLMs in peer review from "automatically generating reviews" to "augmenting human reviewer capabilities" — via an LLM-driven mentoring system (three-phase training + certification) and a feedback system (violation detection + evidence-based feedback + reliability testing) to close the reviewer quality gap.

ProFuser: Progressive Fusion of Large Language Models

This paper proposes ProFuser, which identifies the strengths of each source model through a dual-mode advantage assessment: Min-CE in training mode and Reward Model voting in inference mode. It then integrates the complementary capabilities of heterogeneous LLMs into a single target model with a progressive fusion strategy that applies inference mode first and training mode second, forming an easy-to-hard curriculum. ProFuser achieves an average improvement of 1.65% across 6 benchmarks covering knowledge, reasoning, and safety.

Quantifying Conversational Reliability of Large Language Models under Multi-Turn Interaction

This paper systematically quantifies the reliability degradation of LLMs in multi-turn conversations through three deterministically evaluable representative tasks—instruction following, tool selection, and entity extraction—revealing failure modes such as instruction drift, intent confusion, and context overwriting in extended dialogues.

Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with Prompts

From a unified distortion rectification perspective, this paper proposes the UniRect framework, which employs Residual Progressive TPS for geometric deformation correction and Residual Mamba Blocks for degradation compensation. UniRect uses Sparse MoE to learn four tasks jointly in a single model: portrait correction, wide-angle rectangling, stitching rectangling, and rotation correction. It achieves PSNR gains of 3.82 dB on stitching rectangling and 0.87 dB on rotation correction.

Scaling Equitable Reflection Assessment in Education via Large Language Models and Role-Based Feedback Agents

This paper proposes a zero-shot multi-agent pipeline comprising five role-based GPT-4o agents that assess learner reflection texts using a rubric-based scoring scheme and generate bias-aware conversational feedback. Evaluated on 336 reflections, the system achieves MAE=0.467, QWK=0.459 in scoring agreement, and a feedback quality score of Q(g)=3.967.

Soft Filtering: Guiding Zero-Shot Composed Image Retrieval with Prescriptive and Proscriptive Prompts

This paper proposes SoFT, a training-free plug-and-play reranking module that leverages a multimodal LLM to extract dual textual constraints — "must include" (prescriptive) and "must avoid" (proscriptive) — from a reference image and modification text, and applies soft-filtering reranking over candidate results in zero-shot composed image retrieval. A multi-target triplet dataset construction pipeline is also introduced to improve evaluation.
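
One way to picture soft-filtering reranking is as a score adjustment rather than a hard filter. The sketch below is an assumption-heavy illustration: the additive scoring rule, the weights `alpha` and `beta`, and the candidate fields `base`, `include`, and `avoid` are all hypothetical and not taken from the paper.

```python
def soft_filter_rerank(candidates, alpha=0.5, beta=0.5):
    """Rerank candidates by boosting matches to the "must include"
    constraint and penalizing matches to the "must avoid" constraint.
    Assumed rule: base + alpha * include - beta * avoid."""
    def adjusted(c):
        return c["base"] + alpha * c["include"] - beta * c["avoid"]

    return sorted(candidates, key=adjusted, reverse=True)

# Hypothetical similarity scores from an upstream retriever.
candidates = [
    {"id": "a", "base": 0.8, "include": 0.2, "avoid": 0.9},
    {"id": "b", "base": 0.7, "include": 0.9, "avoid": 0.1},
]
ranked_ids = [c["id"] for c in soft_filter_rerank(candidates)]
```

Because no candidate is discarded outright, an item that weakly violates the proscriptive constraint can still rank well if its other scores are strong.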

STEM: Efficient Relative Capability Evaluation of LLMs through Structured Transitive Evaluation Model

This paper proposes STEM, a framework that identifies "Significant Transition Samples" (STS) across models of the same architecture but varying scales to construct a lightweight evaluation subset, which makes it efficient to locate an unknown LLM's capability relative to those reference models. STEM achieves 100% localization accuracy with only 100 samples, substantially outperforming random sampling and Bayesian methods.

TEMPLE: Incentivizing Temporal Understanding of Video LLMs via Progressive Pre-SFT Alignment

This paper proposes TEMPLE, which significantly enhances the temporal reasoning capabilities of Video LLMs through an automated video temporal preference data generation pipeline (video filtering → temporal perturbation → contrastive response generation) and a novel Progressive Pre-SFT Alignment strategy (curriculum learning + DPO prior to SFT), using a small amount of self-generated DPO data. Consistent improvements are achieved across multiple benchmarks including VideoMME, MLVU, and Vinoground.

TransMamba: A Sequence-Level Hybrid Transformer-Mamba Language Model

This paper proposes TransMamba, a sequence-level Transformer-Mamba hybrid architecture that dynamically switches between Attention and SSM computation at different token positions via shared QKV/CBx parameters and a Memory Converter, achieving efficiency advantages for both short and long sequences.

Uncertainty Under the Curve: A Sequence-Level Entropy Area Metric for Reasoning LLMs

This paper proposes the Entropy Area Score (EAS)—a method that quantifies uncertainty in reasoning LLMs by integrating token-level predictive entropy via a single forward pass. EAS requires neither external models nor repeated sampling, achieves strong correlation with answer entropy (Pearson \(r=0.82\)), and when applied to training data selection outperforms Pass Rate filtering by 1.2–2.3% Pass@1, making it an efficient and interpretable uncertainty estimation tool for LLMs.
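
To make "integrating token-level predictive entropy" concrete, here is a minimal sketch that assumes the area is a plain sum of per-token Shannon entropies (in nats) over the generated sequence. The paper's exact definition and normalization are not given here, so `entropy_area` is an illustrative stand-in rather than the published metric.

```python
import math

def token_entropy(probs):
    """Shannon entropy (nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_area(step_distributions):
    """Assumed area score: sum of per-token entropies along the sequence,
    all obtainable from the distributions of a single forward pass."""
    return sum(token_entropy(p) for p in step_distributions)

# Toy distributions over a 2-token vocabulary at each generation step.
confident = [[1.0, 0.0], [1.0, 0.0]]
uncertain = [[0.5, 0.5], [0.5, 0.5]]
low, high = entropy_area(confident), entropy_area(uncertain)
```

A fully confident sequence scores zero under this assumption, and the score grows as the per-token distributions flatten.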

Vision Transformers are Circulant Attention Learners

This paper discovers that self-attention matrices in ViTs inherently learn Block Circulant with Circulant Blocks (BCCB) patterns, and proposes Circulant Attention, which achieves \(O(N\log N)\) complexity via 2D FFT, yielding consistent improvements on ImageNet classification, COCO detection, and ADE20K segmentation.
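
The \(O(N\log N)\) cost comes from a standard identity: multiplying by a BCCB matrix is a 2D circular convolution, which the 2D FFT diagonalizes. The sketch below checks that identity with NumPy on random data. It is not the paper's Circulant Attention module, and the function names are hypothetical.

```python
import numpy as np

def bccb_matvec_fft(kernel, x):
    """Apply the BCCB matrix defined by `kernel` to `x` via the 2D FFT."""
    return np.fft.ifft2(np.fft.fft2(kernel) * np.fft.fft2(x)).real

def bccb_matvec_dense(kernel, x):
    """Reference: explicit 2D circular convolution, O(N^2) for N = H * W."""
    H, W = kernel.shape
    y = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            for k in range(H):
                for l in range(W):
                    y[i, j] += kernel[(i - k) % H, (j - l) % W] * x[k, l]
    return y

rng = np.random.default_rng(0)
kernel = rng.normal(size=(4, 6))  # one row of weights defines the whole matrix
x = rng.normal(size=(4, 6))       # a 4x6 grid of patch features (one channel)
fast, slow = bccb_matvec_fft(kernel, x), bccb_matvec_dense(kernel, x)
```

Both functions compute the same product; only the FFT version avoids materializing the full \(N \times N\) interaction.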

VSPO: Validating Semantic Pitfalls in Ontology via LLM-Based CQ Generation

This paper proposes the VSPO framework, which constructs a definition–axiom misalignment dataset and fine-tunes LLaMA-3.1-8B-Instruct to generate competency questions (CQs) capable of validating semantic pitfalls in ontologies (e.g., misuse of allValuesFrom). The approach surpasses GPT-4.1 by 26% in precision and 28.2% in recall.

Whispering Agents: An Event-Driven Covert Communication Protocol for the Internet of Agents

This paper presents the first formal definition of a "Covert Event Channel" in the Internet of Agents (IoA) and proposes the ΠCCAP protocol, which embeds secret data across the storage, timing, and behavioral dimensions of agent conversations, achieving high-capacity, high-robustness covert communication that is imperceptible to LLM-based censors.