
🔬 ICLR2026 · 15 paper notes

Activation Steering for Masked Diffusion Language Models

This work is the first to apply activation steering to Masked Diffusion Language Models (MDLMs), demonstrating that refusal behavior in MDLMs, as in autoregressive LLMs, is governed by a single low-dimensional direction. Globally projecting out this direction at every denoising step completely bypasses safety alignment. Unlike in autoregressive models, effective steering directions can also be extracted from pre-instruction tokens, reflecting the non-causal, parallel processing nature of diffusion models.
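
A minimal sketch of the ablation operation, assuming PyTorch tensors of shape (batch, seq_len, d_model) and hypothetical forward_hidden/head helpers standing in for the model's internals: at every denoising step, the component of each hidden state along the refusal direction is subtracted out.

```python
import torch

def project_out(hidden, direction):
    """Remove the component of every hidden state along a refusal direction."""
    d = direction / direction.norm()
    return hidden - (hidden @ d).unsqueeze(-1) * d

# Hypothetical use inside the denoising loop: ablate the direction at every step,
# at every token position (forward_hidden / head are assumed helpers, not the paper's API).
def ablated_step(model, x_t, refusal_dir):
    h = model.forward_hidden(x_t)      # (batch, seq_len, d_model) hidden states
    h = project_out(h, refusal_dir)    # remove the refusal component globally
    return model.head(h)               # predict tokens for still-masked positions
```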

AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size

Through statistical analysis of token confidence dynamics during the denoising process of diffusion language models (dLLMs), this work identifies a "Volatility Band" (VB) region that encodes local semantic structure in text. Building on this observation, it proposes AdaBlock-dLLM—a training-free, plug-and-play adaptive block size scheduler that aligns block boundaries in semi-autoregressive decoding with natural semantic steps, achieving up to 5.3% accuracy improvement at the same throughput.
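
A toy version of the boundary rule described above, with assumed thresholds and a hypothetical per-token confidence input (the paper's statistics and scheduler are more involved): the block is cut at the first token whose confidence falls inside the volatility band.

```python
def adaptive_block_size(confidences, band=(0.3, 0.7), max_block=32):
    """Grow the current block until a token's confidence enters the 'volatility band',
    treated here as a proxy for the next semantic step boundary (thresholds assumed)."""
    lo, hi = band
    for i, c in enumerate(confidences[:max_block]):
        if lo <= c <= hi:
            return max(i, 1)   # close the block just before the volatile token
    return max_block
```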

Are Deep Speech Denoising Models Robust to Adversarial Noise?

This paper presents the first systematic evaluation of the robustness of four state-of-the-art deep speech denoising models against adversarial noise. By generating perceptually imperceptible adversarial perturbations via PGD attacks constrained by psychoacoustic masking, the authors show that Demucs, FullSubNet+, FRCRN, and MP-SENet can all be driven to produce completely unintelligible output. The evaluation spans diverse acoustic conditions and includes human listening studies, and it also reveals the limitations of targeted attacks, universal perturbations, and cross-model transferability.
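
A plain L-infinity PGD sketch of the attack loop; the psychoacoustic masking constraint from the paper is replaced here by a simple clamp, so this is only the skeleton of the method.

```python
import torch
import torch.nn.functional as F

def pgd_attack(denoiser, noisy, clean, eps=0.01, alpha=0.002, steps=50):
    """Find a small perturbation delta that maximizes the denoiser's output error
    (crude bound; the paper constrains delta with psychoacoustic masking thresholds)."""
    delta = torch.zeros_like(noisy, requires_grad=True)
    for _ in range(steps):
        out = denoiser(noisy + delta)
        loss = F.mse_loss(out, clean)           # error of the denoised output
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # ascend to maximize the error
            delta.clamp_(-eps, eps)
            delta.grad.zero_()
    return delta.detach()
```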

Beyond Scattered Acceptance: Fast and Coherent Inference for DLMs via Longest Stable Prefixes

The LSP scheduler atomically commits the longest contiguous stable prefix at each denoising step, rather than accepting individual tokens scattered across non-adjacent positions, achieving up to a 3.4× speedup in DLM inference while maintaining or slightly improving output quality.
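
A sketch of the prefix rule, using an assumed per-position confidence as the stability signal and an assumed threshold tau:

```python
def tokens_to_commit(confidences, n_committed, tau=0.9):
    """Starting at the first uncommitted position, accept the longest contiguous run
    of positions whose prediction is stable/confident, and commit it atomically
    (tau is an assumed knob, not the paper's stability criterion)."""
    n = 0
    for c in confidences[n_committed:]:
        if c < tau:
            break
        n += 1
    return n
```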

Breaking Scale Anchoring: Frequency Representation Learning for Accurate High-Resolution Inference from Low-Resolution Training

This paper defines the novel problem of "Scale Anchoring" (SA)—wherein training on low-resolution data causes inference errors to remain anchored at training-resolution levels during high-resolution inference—and proposes an architecture-agnostic Frequency Representation Learning (FRL) method. By introducing Nyquist-normalized frequency encodings, FRL enables errors to decrease as resolution increases, with effectiveness validated across 8 mainstream architectures.
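
A minimal sketch under one possible reading of "Nyquist-normalized": spatial frequencies are expressed as fractions of the grid's Nyquist limit, so the values seen by the network stay in the same range when the inference resolution changes.

```python
import torch

def nyquist_normalized_features(freq_bins, resolution):
    """Map absolute frequency-bin indices to fractions of the Nyquist frequency
    (resolution / 2); at higher inference resolution the same physical frequency
    gets a smaller normalized value, so the encoding is not anchored to the
    training grid (assumed interpretation; the paper's exact encoding may differ)."""
    nyquist = resolution / 2.0
    return torch.as_tensor(freq_bins, dtype=torch.float32) / nyquist
```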

DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation

This paper proposes DiffusionBlocks, which interprets the layer-wise updates of residual networks as discretization steps of a continuous-time diffusion process, enabling the network to be partitioned into fully independently trainable blocks. This approach achieves competitive performance with end-to-end training while reducing training memory by a factor of \(B\) (the number of blocks).
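
An illustrative sketch of the block-wise training idea, with an assumed denoising objective and noise levels (not the paper's exact formulation): each residual block is trained in isolation to map a noisier latent to a slightly cleaner one, so only one block's activations and gradients are ever held in memory.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, h):
        return h + self.f(h)   # residual update, read as one discretized diffusion step

def train_single_block(block, latents, sigma_in, sigma_out, lr=1e-3, steps=100):
    """Train one block independently: denoise from noise level sigma_in to sigma_out
    (noise levels and targets are assumptions made for illustration)."""
    opt = torch.optim.Adam(block.parameters(), lr=lr)
    for _ in range(steps):
        h_in = latents + sigma_in * torch.randn_like(latents)
        h_tgt = latents + sigma_out * torch.randn_like(latents)
        loss = ((block(h_in) - h_tgt) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return block
```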

Generalizing Linear Autoencoder Recommenders with Decoupled Expected Quadratic Loss

This paper generalizes the EDLAE recommendation model's objective to a Decoupled Expected Quadratic Loss (DEQL), derives closed-form solutions over a broader hyperparameter range (\(b>0\)), and reduces computational complexity from \(O(n^4)\) to \(O(n^3)\) via the Miller matrix inversion lemma, surpassing EDLAE and deep learning models on multiple benchmark datasets.
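
For context, a sketch of the plain closed-form linear autoencoder (the EASE-style baseline in the same family as EDLAE); DEQL's more general closed form and the Miller-lemma speedup are not reproduced here.

```python
import numpy as np

def ease_closed_form(X, lam=500.0):
    """Closed-form item-item weight matrix for an L2-regularized linear autoencoder
    with a zero-diagonal constraint (the baseline that DEQL generalizes)."""
    G = X.T @ X + lam * np.eye(X.shape[1])   # regularized Gram matrix, O(n^3) to invert
    P = np.linalg.inv(G)
    B = -P / np.diag(P)                      # B_ij = -P_ij / P_jj
    np.fill_diagonal(B, 0.0)                 # no self-recommendation
    return B
```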

Horizon Imagination: Efficient On-Policy Rollout in Diffusion World Models

This paper proposes Horizon Imagination (HI), which samples actions at an intermediate denoising step and processes multiple future frames in parallel, reducing the per-frame computation of on-policy imagination in diffusion world models to less than one full denoising pass while maintaining control performance.
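
A rough sketch of the rollout pattern, with assumed helper names (init_noise, denoise_step) and step counts: several future latent frames are denoised jointly, and the policy is queried once they reach an intermediate denoising step instead of after a full pass per frame.

```python
def horizon_rollout(world_model, policy, state, horizon=4, total_steps=16, act_step=8):
    """Illustrative only: all `horizon` future latents are refined in parallel, and
    actions are sampled from partially denoised frames at step `act_step`."""
    frames = world_model.init_noise(state, horizon)    # assumed: H noisy future latents
    actions = None
    for t in range(total_steps):
        frames = world_model.denoise_step(frames, t)   # all H frames updated in parallel
        if t == act_step:
            actions = policy(frames)                   # act before denoising finishes
    return frames, actions
```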

InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions

InterActHuman enables audio-driven video generation in multi-person interaction scenarios via an automatic spatiotemporal layout mask predictor and an iterative mask guidance strategy, supporting independent speech-driven lip synchronization and body motion for each character.

Mechanism of Task-oriented Information Removal in In-context Learning

This paper proposes a novel "information removal" perspective to explain the internal mechanism of in-context learning (ICL). Under zero-shot settings, language models encode queries into "non-selective representations" that contain information about all possible tasks, leading to near-random outputs. The core function of few-shot ICL is to simulate a "task-oriented information removal" process: identified "Denoising Heads" selectively remove redundant task information from these entangled representations, steering the model to focus on the target task. Ablation experiments confirm that blocking the Denoising Heads significantly degrades ICL accuracy.
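
A toy head-ablation routine of the kind used in such experiments (which heads count as Denoising Heads comes from the paper's identification procedure, not from this sketch):

```python
import torch

def ablate_heads(attn_out, heads_to_block, n_heads):
    """Zero the output of selected attention heads before the output projection,
    to probe how much the blocked heads contribute to ICL accuracy."""
    bsz, seq_len, d_model = attn_out.shape
    head_dim = d_model // n_heads
    out = attn_out.view(bsz, seq_len, n_heads, head_dim).clone()
    out[:, :, list(heads_to_block), :] = 0.0
    return out.view(bsz, seq_len, d_model)
```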

ProtoTS: Learning Hierarchical Prototypes for Explainable Time Series Forecasting

ProtoTS is proposed to achieve explainable time series forecasting via hierarchical prototype learning: a small number of coarse-grained prototypes provide a global pattern overview, while progressive refinement captures local variations. Heterogeneous exogenous variables are handled through multi-channel embedding and bottleneck fusion. On the LOF dataset, MSE is reduced by 48.3% and MAE by 20.9%. The framework additionally supports expert editing of prototypes to further improve performance.
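
A minimal prototype-matching sketch, not the ProtoTS architecture itself: the input window is scored against learned prototype patterns, and their associated forecasts are mixed with softmax weights.

```python
import torch

def prototype_forecast(window, prototypes, proto_forecasts, temperature=1.0):
    """window: (window_len,), prototypes: (n_protos, window_len),
    proto_forecasts: (n_protos, horizon); all names here are assumptions."""
    dists = torch.cdist(window.unsqueeze(0), prototypes).squeeze(0)  # (n_protos,)
    weights = torch.softmax(-dists / temperature, dim=-1)            # similarity weights
    return weights @ proto_forecasts                                  # (horizon,)
```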

Sharpness-Aware Machine Unlearning

This paper systematically analyzes the theoretical properties of SAM in the machine unlearning setting through a signal-noise decomposition framework. It finds that SAM abandons its denoising capability on the forget set while retaining it on the retain set. Motivated by this finding, the paper proposes the Sharp MinMax algorithm, which partitions the model into two components, one optimized with sharpness minimization for retention and the other with sharpness maximization for forgetting, achieving state-of-the-art unlearning performance.
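
For reference, the standard SAM perturbation that the analysis builds on (a generic sketch; the Sharp MinMax update additionally splits the parameters and applies opposing sharpness objectives to the two partitions):

```python
import torch

def sam_perturbed_params(params, loss_fn, batch, rho=0.05):
    """Compute SAM's worst-case weight perturbation: move each parameter by radius
    rho along the normalized loss gradient; the training gradient is then taken at
    this perturbed point (loss_fn(params, batch) is an assumed interface)."""
    loss = loss_fn(params, batch)
    grads = torch.autograd.grad(loss, params)
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    return [p + rho * g / norm for p, g in zip(params, grads)]
```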

Skip to the Good Part: Representation Structure & Inference-Time Layer Skipping in Diffusion vs. Autoregressive LLMs

This work presents the first systematic comparison of layer-wise representation structure between diffusion large language models (dLLMs) and autoregressive (AR) LLMs. It finds that natively trained dLLMs exhibit stronger hierarchical abstraction and greater early-layer redundancy. Based on this finding, a static, task-agnostic inference-time layer skipping strategy is proposed, achieving 90%+ performance retention on LLaDA while skipping 6 layers (18.75% FLOPs reduction).
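
A sketch of the static skipping strategy, with placeholder layer indices rather than the ones identified in the paper:

```python
def forward_with_layer_skips(layers, hidden, skip_indices=frozenset(range(2, 8))):
    """Bypass the listed transformer layers entirely and let the residual stream
    carry the representation forward (indices here are hypothetical)."""
    for i, layer in enumerate(layers):
        if i in skip_indices:
            continue
        hidden = layer(hidden)
    return hidden
```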

Trust but Verify: Adaptive Conditioning for Reference-Based Diffusion Super-Resolution

This paper proposes Ada-RefSR, a single-step reference-guided diffusion super-resolution framework based on the "Trust but Verify" principle. It introduces an Adaptive Implicit Correlation Gating (AICG) mechanism that maximally exploits reliable reference information while suppressing erroneous fusion, incurring only 0.13% additional computational overhead.
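
A toy stand-in for the gating idea, assuming convolutional feature maps (this is not the paper's AICG module): a per-pixel trust map decides how much of the reference feature is fused in.

```python
import torch
import torch.nn as nn

class ReferenceGate(nn.Module):
    """Predict a per-pixel trust map from LR and reference features and blend the
    reference in only where it is trusted (illustrative sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * dim, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, lr_feat, ref_feat):
        trust = self.gate(torch.cat([lr_feat, ref_feat], dim=1))  # (B, 1, H, W)
        return lr_feat + trust * ref_feat
```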

wd1: Weighted Policy Optimization for Reasoning in Diffusion Language Models

This paper proposes wd1, a ratio-free weighted log-likelihood policy optimization method for RL fine-tuning of diffusion language models (dLLMs). By combining positive-sample weighting with negative-sample penalization, wd1 avoids the bias and high variance introduced by policy-ratio estimation in GRPO, achieving state-of-the-art performance on LLaDA-8B, including a +59% improvement on Sudoku and 84.5% accuracy on GSM8K.
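
A sketch of the ratio-free objective with placeholder weights: completions with positive group-relative reward have their log-likelihood raised and negative ones penalized, with no policy-ratio or clipping term.

```python
import torch

def weighted_loglik_loss(logp_pos, logp_neg, w_pos=1.0, w_neg=0.1):
    """logp_pos / logp_neg: sequence log-likelihoods of positively / negatively
    rewarded completions under the current policy (weights here are assumptions)."""
    return -(w_pos * logp_pos.mean() - w_neg * logp_neg.mean())
```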