🖼️ Image Restoration
🔬 ICLR2026 · 61 paper notes
📌 Same area in other venues: 📷 CVPR2026 (107) · 🧪 ICML2026 (21) · 🤖 AAAI2026 (10) · 🧠 NeurIPS2025 (26) · 📹 ICCV2025 (31)
🔥 Top topics: Diffusion Models ×20 · Image Restoration ×17 · Super-Resolution ×15 · Compression ×4 · Adversarial Robustness ×3
- A Statistical Benchmark for Diffusion-Posterior-Sampling Algorithms
-
This paper establishes a "standard ruler" for Diffusion Posterior Sampling (DPS) algorithms: by utilizing Lévy process signals—which allow for exact Gibbs sampling—as the test distribution, it obtains "gold standard" posterior samples at the distribution level. The authors systematically evaluate mainstream DPS algorithms (C-DPS / DiffPIR / DPnP) across four types of inverse problems (denoising, deconvolution, inpainting, and partial Fourier reconstruction) using MMSE optimality gap and posterior coverage metrics. The conclusion reveals that these algorithms are generally not calibrated.
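As a rough illustration of what a posterior coverage check can look like, the sketch below measures how often the ground truth falls inside a central credible interval built from posterior samples. The scalar setting and the function names `central_interval` and `empirical_coverage` are assumptions made for illustration; the benchmark's exact metric definition is not given in this note.

```python
def central_interval(samples, level):
    """Return the (lower, upper) bounds of a central credible interval.

    samples: list of scalar posterior samples for one test case.
    level:   nominal coverage, e.g. 0.9 for a 90% interval.
    """
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples trimmed from each tail (rounded to the nearest index).
    trim = round((1 - level) / 2 * (n - 1))
    return ordered[trim], ordered[n - 1 - trim]


def empirical_coverage(truths, sample_sets, level=0.9):
    """Fraction of test cases whose ground truth lies inside the interval."""
    hits = 0
    for truth, samples in zip(truths, sample_sets):
        lower, upper = central_interval(samples, level)
        if lower <= truth <= upper:
            hits += 1
    return hits / len(truths)
```

For a well-calibrated sampler, the empirical coverage should sit close to the nominal `level`; values far below it would suggest overconfident posteriors.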
- Adaptive Moments are Surprisingly Effective for Plug-and-Play Diffusion Sampling
-
The Adam adaptive moment estimation from standard optimizers is directly applied to the guidance gradients of diffusion sampling. By maintaining the exponential moving average (EMA) of the first and second moments of likelihood score estimates across sampling steps, the noisy gradients of plug-and-play methods like DPS and CG are stabilized at almost zero extra cost. This approach outperforms several more complex and slower methods in image restoration (super-resolution, deblurring, inpainting) and class-conditional generation.
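A minimal sketch of the idea, assuming the textbook Adam update with bias correction is applied to the guidance gradient at each sampling step. The function name `adam_guidance_step` and the default hyperparameters are illustrative assumptions, not values taken from the paper.

```python
import math


def adam_guidance_step(grad, m, v, step, beta1=0.9, beta2=0.999, eps=1e-8):
    """Smooth a noisy guidance gradient with Adam-style moment estimates.

    grad: guidance gradient at the current sampling step (list of floats).
    m, v: running first and second moments from earlier steps.
    step: 1-based index of the sampling step, used for bias correction.
    Returns the normalized direction and the updated moments.
    """
    # Exponential moving averages of the gradient and its square.
    new_m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    new_v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # Bias correction compensates for the zero initialization.
    m_hat = [mi / (1 - beta1 ** step) for mi in new_m]
    v_hat = [vi / (1 - beta2 ** step) for vi in new_v]
    direction = [mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]
    return direction, new_m, new_v
```

The returned `direction` would replace the raw gradient in the guidance update, with `m` and `v` carried over between sampling steps. The only extra cost is two buffers the size of the gradient.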
- Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization
-
The authors track the training process of Image Restoration (IR) Transformers and discover that standard LayerNorm causes feature magnitudes to diverge to the million-level scale and channel entropy to collapse sharply. The root cause is identified as LN's "per-token normalization" and "input-independent scaling," which conflict with IR tasks. Consequently, they propose i-LN—a plug-and-play replacement for LN that performs normalization across the entire spatial-channel dimension and adaptively adds the scaling factor back after each Attention/FFN block. This stabilizes training and consistently improves performance in SR, denoising, deraining, and JPEG artifact removal.
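The normalization described above can be sketched as follows, assuming a single mean and standard deviation are computed over all tokens and channels, and that the removed scale is multiplied back onto the block output. The helper names `holistic_norm` and `block_with_rescale` are hypothetical, and the paper's exact rescaling rule may differ.

```python
import math


def holistic_norm(x, eps=1e-6):
    """Normalize a feature map with one mean/std over all tokens and channels.

    x: list of tokens, each a list of channel values.
    Returns the normalized features and the shared standard deviation.
    """
    flat = [c for token in x for c in token]
    mean = sum(flat) / len(flat)
    var = sum((c - mean) ** 2 for c in flat) / len(flat)
    std = math.sqrt(var + eps)
    normed = [[(c - mean) / std for c in token] for token in x]
    return normed, std


def block_with_rescale(x, block):
    """Apply `block` to normalized features, then restore the input scale."""
    normed, std = holistic_norm(x)
    out = block(normed)
    return [[std * c for c in token] for token in out]
```

Because all tokens share one statistic, relative magnitudes between spatial positions survive normalization, which per-token LayerNorm would erase.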
- Are Deep Speech Denoising Models Robust to Adversarial Noise?
-
This paper presents the first systematic evaluation of the robustness of 4 SOTA Deep Speech Denoising (DNS) models against adversarial noise. By generating imperceptible adversarial perturbations through psychoacoustic-constrained PGD attacks, the authors demonstrate that Demucs, Full-SubNet+, FRCRN, and MP-SENet can be forced to output completely unintelligible gibberish. The experiments cover various acoustic conditions and human evaluations, while revealing limitations of targeted attacks, universal perturbations, and cross-model transfer.
- Beyond Scattered Acceptance: Fast and Coherent Inference for DLMs via Longest Stable Prefixes
-
The LSP scheduler accelerates diffusion language model (DLM) inference by 3.4\(\times\) by atomically committing the longest stable contiguous prefix in each denoising step (rather than scattered discrete tokens), while maintaining or slightly improving output quality.
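A toy sketch of prefix-based commitment, under the assumption that a position counts as "stable" when its predicted token did not change since the previous denoising step and its confidence clears a threshold. Both the stability criterion and the name `longest_stable_prefix` are illustrative; the note does not spell out the paper's definition.

```python
def longest_stable_prefix(prev_tokens, curr_tokens, confidences, start, threshold=0.9):
    """Return the end index (exclusive) of the stable run beginning at `start`.

    prev_tokens / curr_tokens: predictions from the previous and current step.
    confidences: per-position confidence of the current predictions.
    start: first position that has not been committed yet.
    """
    end = start
    while (
        end < len(curr_tokens)
        and curr_tokens[end] == prev_tokens[end]
        and confidences[end] >= threshold
    ):
        end += 1
    return end
```

Positions in `[start, end)` would be committed together, and the next denoising step would resume from `end`, so the committed text always forms one contiguous block.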
- Breaking Scale Anchoring: Frequency Representation Learning for Accurate High-Resolution Inference from Low-Resolution Training
-
The paper defines "Scale Anchoring" (where low-resolution training anchors error during high-resolution inference) and proposes an architecture-agnostic Frequency Representation Learning (FRL). By using Nyquist-normalized frequency encoding, it ensures that errors decrease as resolution increases, which is validated across 8 mainstream architectures.
- CL-DPS: A Contrastive Learning Approach to Blind Nonlinear Inverse Problem Solving via Diffusion Posterior Sampling
-
CL-DPS utilizes an offline-trained contrastive learning encoder to approximate the intractable likelihood term \(p(y\mid x_t)\) in diffusion posterior sampling (DPS). This enables diffusion models to solve blind nonlinear inverse problems (e.g., rotation blur, radial blur) for the first time without knowing or estimating operator parameters. It achieves clean restorations where existing methods fail, while remaining competitive on linear blind deblurring tasks.
- Content-Aware Mamba for Learned Image Compression
-
Addressing the two major flaws of Mamba in learned image compression—"fixed raster scanning" and "strict causality"—this paper proposes Content-Aware Mamba (CAM). It uses token rearrangement based on codebook clustering to group similar tokens for scanning and injects global priors into SSM output projections via a redundancy-aware prompt dictionary to break causality. Consequently, the CMIC model outperforms VTM-21.0 across Kodak/Tecnick/CLIC with BD-rates of −15.91%/−21.34%/−17.58%, while maintaining nearly 80% lower GPU memory usage than similar Mamba-based methods.
- Continuous Space-Time Video Super-Resolution with 3D Fourier Fields
-
This paper proposes V3, which utilizes a unified 3D Video Fourier Field (VFF) to represent video directly as a sum of sinusoids in \((x,y,t)\) space. By discarding the fragmented and fragile "Spatial INR + Optical Flow Warp" paradigm, it transforms super-resolution at arbitrary spatial and temporal scales into a single continuous sampling process. Furthermore, it enables the closed-form incorporation of a Gaussian Point Spread Function (PSF) for anti-aliasing, achieving a PSNR improvement of approximately 1.5–2 dB across multiple benchmarks while being faster and more memory-efficient.
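The closed-form anti-aliasing follows from a standard property of sinusoids under Gaussian blur. The derivation below is a sketch with assumed notation (amplitude \(a\), frequency vector \(\omega\), phase \(\varphi\), PSF covariance \(\Sigma\)); the paper's own parameterization may differ.

```latex
% One sinusoidal component of the field at position p:
\[
  g(p) = a \,\sin\!\left(\omega^{\top} p + \varphi\right)
\]
% Convolving with a zero-mean Gaussian PSF of covariance \Sigma only
% attenuates the amplitude; frequency and phase are unchanged:
\[
  (G_{\Sigma} * g)(p)
  = a \,\exp\!\left(-\tfrac{1}{2}\,\omega^{\top} \Sigma\, \omega\right)
    \sin\!\left(\omega^{\top} p + \varphi\right)
\]
```

Since the field is a sum of such terms, blurring it amounts to rescaling each amplitude, with high-frequency components damped most strongly.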
- DeAltHDR: Learning HDR Video Reconstruction from Degraded Alternating Exposure Sequences
-
DeAltHDR is the first to directly address the neglected reality that "alternating exposure LDR frames inherently contain noise and motion blur." By employing a Flow-Guided Masked Attention (FGMA) module, it performs cross-frame alignment only in occlusion areas where optical flow is unreliable, while utilizing cheap optical flow warping elsewhere. This achieves a tunable trade-off between efficiency and quality. Coupled with a self-supervised adaptation method improved for large video motions, it surpasses existing SOTA on both synthetic and real-world datasets.
- DeLiVR: Differential Spatiotemporal Lie Bias for Efficient Video Deraining
-
DeLiVR integrates two types of geometric priors from the SO(2) Lie group—"per-frame rotation" and "inter-frame angular velocity differentiation"—directly into the Transformer attention scores as biases. It achieves geometrically consistent cross-frame alignment and temporal deraining without relying on optical flow, reaching SOTA performance on the real-world WeatherBench with only 2.64M parameters.
- Denoising Neural Reranker for Recommender Systems
-
This paper points out that retrieval scores in industrial two-stage "retrieval → reranking" pipelines are useful but noisy signals that are often ignored. It reformulates reranking as a denoising task for retrieval scores, utilizing an adversarial noise generator. By jointly training with denoising, adversarial, and distribution regularization objectives, it consistently outperforms existing SOTA reranking methods on three public datasets and an industrial system.
- DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
-
DiffusionBlocks interprets the layer-wise updates of residual networks as discretization steps of a continuous-time diffusion process. This allows partitioning the network into blocks that can be trained completely independently, reducing training memory by a factor of \(B\) (the number of blocks) while maintaining performance competitive with end-to-end training.
- DISK: Differentiable Sparse Kernel Complex for Efficient Spatially-Variant Convolution
-
A large and complex dense kernel is re-represented as a "cascade of sparse kernels." End-to-end differentiable optimization (instead of heuristic search) is used to learn the offsets and weights of sampling points in each layer. Combined with shape-aware initialization and filter space interpolation, this achieves up to approximately 20× acceleration for spatially-variant filtering on mobile devices with image quality close to ground-truth.
- Divergence-Free Neural Networks with Application to Image Denoising
-
This paper proposes CENSURE, a neural network parameterization that is "divergence-free by design." By utilizing a representer theorem to structure divergence-free vector fields as a combination of "anti-symmetric matrices × gradients of conservative fields" and adopting a sparse approximation for high-dimensional images, the method achieves higher stability and accuracy than constrained methods like Noise2Self and UNSURE in self-supervised denoising scenarios where the noise level \(\sigma\) is unknown and varies per sample.
- Efficient Degradation-agnostic Image Restoration via Channel-Wise Functional Decomposition and Manifold Regularization
-
MIRAGE achieves higher accuracy and lower computational overhead in all-in-one image restoration by "splitting attention features by channel into three branches (CNN/Attention/MLP) for specialized tasks + aligning shallow and deep features via contrastive learning in the SPD covariance space."
- Energy-oriented Diffusion Bridge for Image Restoration with Foundational Diffusion Models
-
E-Bridge achieves the best performance on multi-task image restoration with single-step inference by constructing low-energy manifold geodesic trajectories and a closed-form one-step consistency solver.
- Exploring Real-Time Super-Resolution: Benchmarking and Fine-Tuning for Streaming Content
-
Addressing the compressed streaming video super-resolution scenario ignored by existing datasets, this paper constructs StreamSR, a dataset of 5200 compressed video segments collected from YouTube. It systematically evaluates 11 real-time SR models and proposes EfRLFN, a lightweight model based on RLFN featuring tanh activation, ECA attention, and composite loss. EfRLFN achieves a new quality-complexity SOTA while maintaining real-time frame rates (271 FPS).
- FAST-DIPS: Adjoint-Free Analytic Steps and Hard-Constrained Likelihood Correction for Diffusion-Prior Inverse Problems
-
FAST-DIPS replaces expensive inner MCMC or multi-step gradient loops in training-free diffusion inverse problem solvers with a set of "adjoint-free hard-constrained likelihood corrections." For each noise level, it performs a few-step ADMM correction near the denoiser prediction using closed-form projections and analytically optimal step sizes. This minimizes the per-layer computational budget while achieving comparable or better quality across eight linear/nonlinear restoration tasks, with speedups up to 19.5×.
- FideDiff: Efficient Diffusion Model for High-Fidelity Image Motion Deblurring
-
This work reformulates motion deblurring as a diffusion-like process where the "blur level serves as the timestep." By employing consistency training, all timesteps are aligned to predict the same sharp image, achieving single-step, high-fidelity deblurring with a pre-trained diffusion model, complemented by a Kernel ControlNet for blur kernel prior injection and adaptive timestep prediction.
- Flower: A Flow-Matching Solver for Inverse Problems
-
Flower transforms a pre-trained flow-matching generative model into a linear inverse problem solver. At each time step, it predicts the clean destination, applies a proximal projection for data consistency using the observation operator, and then advances along the flow trajectory. It achieves superior results compared to existing flow-based solvers on image restoration tasks such as denoising, deblurring, super-resolution, and inpainting.
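The data-consistency step can be illustrated for the simplest observation operator, a binary inpainting mask. The sketch assumes the proximal step minimizes \(\|m \odot x - y\|^2 / (2\sigma^2) + \|x - \hat{x}\|^2 / (2\tau^2)\), which has a per-pixel closed form. The function name `proximal_inpainting` and the weights `sigma` and `tau` are assumptions, not the paper's notation.

```python
def proximal_inpainting(x_pred, y, mask, sigma, tau):
    """Blend the predicted clean image with the observation where it is available.

    x_pred: predicted clean destination (list of floats).
    y:      observed values; entries where mask == 0 are ignored.
    mask:   1 where a pixel is observed, 0 where it is missing.
    sigma:  assumed measurement noise level.
    tau:    assumed uncertainty of the prediction.
    """
    w_data = 1.0 / sigma ** 2
    w_prior = 1.0 / tau ** 2
    result = []
    for xp, yi, mi in zip(x_pred, y, mask):
        if mi:
            # Observed pixel: precision-weighted average of data and prediction.
            result.append((w_data * yi + w_prior * xp) / (w_data + w_prior))
        else:
            # Missing pixel: the data term vanishes, so keep the prediction.
            result.append(xp)
    return result
```

A small `sigma` relative to `tau` pulls observed pixels almost entirely onto the measurement, while unobserved pixels are left to the generative prior.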
- FreeAdapt: Unleashing Diffusion Priors for Ultra-High-Definition Image Restoration
-
This paper proposes a training-free "Frequency-Feature Collaborative Guidance" (FFSG) mechanism. It utilizes the phase spectrum of a low-resolution reference image and global attention to constrain local generation during each denoising step of patch-based inference. Combined with an optional VAE decoder fine-tuning module, it achieves plug-and-play adaptation of pretrained LDMs for Ultra-High-Definition (4K/8K) image restoration, providing an average PSNR gain of over 2 dB without modifying the U-Net.
- Generalizing Linear Autoencoder Recommenders with Decoupled Expected Quadratic Loss
-
The objective function of the EDLAE recommendation model is generalized into the Decoupled Expected Quadratic Loss (DEQL). A closed-form solution is derived for a broader range of the hyperparameter \(b>0\), and computational complexity is reduced from \(O(n^4)\) to \(O(n^3)\) using Miller's matrix inversion theorem, outperforming both EDLAE and deep learning models on multiple benchmark datasets.
- Horizon Imagination: Efficient On-Policy Rollout in Diffusion World Models
-
The authors propose Horizon Imagination (HI), which enables diffusion world models to denoise multiple frames of future observations in parallel within a single forward pass. Combined with stable action sampling to suppress unnecessary action flips on noisy frames and a Horizon schedule that decouples the denoising tempo from the total budget, HI maintains on-policy imagination performance even with a sub-frame budget (less than one denoising step per frame) and at half the computational cost.
- Improved Adversarial Diffusion Compression for Real-World Video Super-Resolution
-
The 11B 3D DiT video super-resolution teacher DOVE is compressed into a 0.57B "2D+1D" student network, AdcVSR. By utilizing dual-head dual-discriminator adversarial distillation, the conflicting objectives of "rich details" and "temporal consistency" are decoupled and optimized, achieving a 95% parameter reduction and an 8× speedup with almost no loss in image quality.
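As a quick sanity check, the quoted parameter reduction follows directly from the two model sizes stated above:

```latex
\[
  1 - \frac{0.57\,\text{B}}{11\,\text{B}} \approx 1 - 0.0518 = 0.9482 \approx 95\%
\]
```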
- InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions
-
InterActHuman achieves audio-driven video generation in multi-person and human-interaction scenes through an automated spatio-temporal layout-inferring mask predictor and an iterative mask-guiding strategy, supporting independent lip-sync and body movements for each character.
- KernelFusion: Zero-Shot Blind Super-Resolution via Patch Diffusion
-
KernelFusion trains a patch-based diffusion model on a single LR image. Based on the principle that the correct kernel is one that maximizes cross-scale patch similarity, it recovers arbitrary (including non-Gaussian) downsampling kernels and corresponding HR images during the reverse diffusion process, pushing blind super-resolution into a zero-shot paradigm entirely free of training distribution assumptions.
- Learning Domain-Aware Task Prompt Representations for Multi-Domain All-in-One Image Restoration
-
This work proposes DATPRL-IR, the first multi-domain all-in-one image restoration method. It learns domain-aware task prompt representations via a dual prompt pool (Task Prompt Pool + Domain Prompt Pool), distills domain priors from MLLMs, and guides restoration through adaptive gating fusion, significantly surpassing SOTA across 9 tasks in natural, medical, and remote sensing domains.
- Learning Heterogeneous Degradation Representation for Real-World Super-Resolution
-
This paper proposes SAVL (Spatially Amortized Variational Learning), which models the degradation of each pixel as a "spatially-varying Gaussian distribution" inferred from local neighborhoods. A mutual information suppression term is employed to decouple degradation from image content, resulting in an implicit representation that is both spatially heterogeneous and highly discriminative of degradation factors. The SR network is subsequently guided by a dual-path posterior of "mean (channel modulation) + variance (spatial modulation)" for reconstruction.
- LearnIR: Learnable Posterior Sampling for Real-World Image Restoration
-
LearnIR utilizes a lightweight network to directly learn the "gradient correction term distribution" in diffusion posterior sampling, bypassing the limitation of traditional DPS that requires a known forward degradation operator \(A\). Combined with a VAE-free Dynamic Resolution Module (DRM), it achieves end-to-end, high-fidelity image restoration for real-world degradation tasks like dehazing and deshadowing.
- LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution
-
LinearSR successfully applies \(O(N)\) linear attention to photo-realistic diffusion super-resolution for the first time. By integrating "Early Stopping at the Knee-point Fine-tuning (ESGF), SNR-based Mixture-of-Experts (MoE), and Tag-based Guidance (TAG)", it simultaneously addresses training collapse, perception-distortion trade-offs, and guidance signal selection. The framework achieves SOTA 1-NFE efficiency (0.036s core diffusion forward pass for 1024×1024) while maintaining SOTA perceptual quality.
- LiveMoments: Reselected Key Photo Restoration in Live Photos via Reference-guided Diffusion
-
Addressing the real-world pain point of significant quality degradation when "reselecting a key frame" in Live Photos, LiveMoments utilizes an SD3-based dual-branch diffusion network. It treats the original high-quality key frame as a same-sequence reference and employs a two-layer motion alignment strategy—"latent-space motion-guided attention + image-level patch correspondence retrieval"—to restore blurry and misaligned reselected frames to a quality comparable to the original.
- LucidFlux: Caption-Free Photo-Realistic Image Restoration via a Large-Scale Diffusion Transformer
-
LucidFlux utilizes a frozen 12B Flux.1 large-scale Diffusion Transformer for real-world image restoration. By employing a dual-branch conditioner, timestep-layer adaptive modulation, SigLIP-based caption-free semantic alignment, and large-scale high-quality data filtering, it achieves superior perceptual quality and semantic consistency across multiple real-world and synthetic degradation benchmarks.
- Mechanism of Task-oriented Information Removal in In-context Learning
-
This work explains the internal mechanism of In-context Learning (ICL) from a new perspective of "Information Removal." It finds that, in the zero-shot setting, Language Models (LMs) encode queries into "non-selective representations" containing information about all possible tasks, which leads to random outputs. The core role of few-shot ICL is to simulate a "task-oriented information removal" process. The authors identify "Denoising Heads" that selectively remove redundant task information from entangled representations to guide the model toward the target task. Ablation studies confirm that blocking these Denoising Heads significantly decreases ICL accuracy.
- Noise-Adaptive Diffusion Sampling for Inverse Problems Without Task-Specific Tuning
-
This paper transforms the solution of diffusion model inverse problems from "adding data consistency gradients to intermediate image states" to "performing HMC posterior sampling in the initial DDIM noise space." By marginalizing unknown measurement noise to derive NA-NHMC, the method achieves robust reconstruction quality across super-resolution, inpainting, deblurring, phase retrieval, and HDR without task-specific parameter tuning.
- One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs
-
OFTSR distills a noise-augmented conditional rectified flow teacher into a one-step student model, requiring the student's predictions at various time points \(t\) to fall on the same PF-ODE trajectory of the teacher. This allows the model to continuously slide between fidelity and realism by adjusting a single parameter \(t\) in a single forward pass, achieving SOTA one-step SR performance on FFHQ, DIV2K, ImageNet, and real-world datasets.
- Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement
-
EditedID is a training-free, plug-and-play diffusion inversion framework that restores facial identity lost during editing by multimodal large models through a three-step "Alignment-Disentanglement-Entanglement" process without any fine-tuning. It preserves edited accessories/clothing (Element IP) while achieving SOTA ID consistency in both single-person and multi-person open scenarios.
- Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling
-
This paper proposes ContinuousSR, which reconstructs a low-resolution image into a continuous 2D Gaussian field at once via the "pixel-to-Gaussian" paradigm. Any subsequent magnification is achieved through a single fast rendering (approx. 1ms), surpassing SOTA in quality across seven benchmarks (+0.18 dB on Manga109) while achieving a 19.5× speedup when continuously scaling across 40 levels.
- PlantRSR: A New Plant Dataset and Method for Reference-based Super-Resolution
-
This paper constructs PlantRSR, the first reference-based super-resolution (RefSR) dataset for plant scenes (containing 16,585 pairs of manually aligned HR–Ref training patches). It proposes a method specifically designed for irregular plant textures: Selective Key Region Matching (SKRM) performs matching only in texture-rich areas to significantly reduce computational costs, and the Texture-Guided Diffusion Module (TGDM) progressively refines LR features conditioned on matched reference textures. The method achieves state-of-the-art performance across PlantRSR and multiple public benchmarks with only 11.1M parameters.
- ProtoTS: Learning Hierarchical Prototypes for Explainable Time Series Forecasting
-
This paper proposes ProtoTS, which achieves explainable time series forecasting through hierarchical prototype learning: a few coarse-grained prototypes provide a global pattern overview, while successive levels of sub-prototypes capture local variations. It combines multi-channel embedding with bottleneck fusion to handle heterogeneous exogenous variables. On the LOF dataset, it reduces MSE by 48.3% and MAE by 20.9%, while supporting expert editing of prototypes to further enhance performance.
- Reconstruct Anything Model: A Lightweight General Model for Computational Imaging
-
This paper proposes the Reconstruct Anything Model (RAM), which utilizes a lightweight 36M-parameter non-iterative DRUNet-based reconstruction network to directly inject imaging operators, measurements, and noise parameters into feature layers. It achieves strong zero-shot reconstruction across tasks like deblurring, MRI, CT, super-resolution, inpainting, and low-photon imaging, while supporting self-supervised fine-tuning using only few measurements without ground truth.
- Recover Cell Tensor: Diffusion-Equivalent Tensor Completion for Fluorescence Microscopy Imaging
-
This paper reframes the restoration of 3D fluorescence microscopy (FM) live-cell imaging from an "inverse problem deblurring" perspective to a "tensor completion" perspective. By treating equidistant sparse sampling along the Z-axis as uniform random sampling for low-rank tensor completion, the authors derive a lower bound for the number of observations required for exact recovery. They further prove that the iterative process of solving this completion problem using Tucker decomposition and ADMM is mathematically equivalent to a reverse trajectory of conditional diffusion. This allows for denoised, geometrically coherent 3D cellular reconstruction without training a score network, achieving state-of-the-art performance in PSNR, SSIM, and LPIPS across SR-CACO-2 and three real-world live C. elegans datasets.
- RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration
-
RestoreVAR adapts the visual autoregressive model VAR from pure image generation into an all-in-one image restoration model. It utilizes continuous latents of degraded images as cross-attention conditions, further supplemented by a latent refiner and a continuous latent decoder to recover details. It achieves superior restoration quality among generative AiOR methods while reducing the multi-second inference time of LDM-based methods to approximately 0.201 seconds.
- Rethinking Expressivity and Degradation-Awareness in Attention for All-in-One Blind Image Restoration
-
Addressing two overlooked bottlenecks of Restormer-style channel attention in All-in-One blind image restoration—the purely linear value path and the lack of explicit global slots—this paper proposes two minimalist, backbone-agnostic primitives (non-linear value transformation + global spatial tokens). These upgrade attention from a "feature selector" to a "selector-transformer" while providing degradation-awareness at nearly zero extra cost, consistently outperforming larger SOTA models across six All-in-One benchmarks.
- SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training
-
SeedVR2 compresses a multi-step diffusion-based video restoration model into a one-step generator via diffusion adversarial post-training. It utilizes adaptive window attention, progressive distillation, and discriminator feature matching loss to support high-resolution video restoration, achieving perceptual quality comparable to or better than multi-step models in a single inference step.
- Seeing Through the PRISM: Compound & Controllable Restoration of Scientific Images
-
PRISM combines compound degradation samples, weighted contrastive disentangled CLIP representations, and text-conditional diffusion. This allows scientific images to undergo joint restoration of multiple mixed degradations in a single pass or selective correction based on expert prompts, outperforming existing all-in-one, diffusion, and composite restoration baselines in fidelity metrics, zero-shot compound restoration, and downstream scientific tasks.
- SFBD-OMNI: Bridge Models for Lossy Measurement Restoration with Limited Clean Samples
-
When only massive corrupted measurements and almost no clean samples are available, this work reformulates the task of "restoring the true distribution from a corrupted one" as a one-sided entropic Optimal Transport (OT) problem. By using bridge models for alternating minimization, the proposed SFBD-OMNI framework handles arbitrary black-box corruptions (masking, grayscale, blurring, noise). It proves that restoration is possible with purely noisy samples when the corruption is identifiable; otherwise, as few as 50 clean images can steer the distribution back to the ground truth, significantly outperforming baselines like AmbientGAN, EMDiffusion, and SFBD in FID.
- Sharpness-Aware Machine Unlearning
-
This work systematically analyzes the theoretical properties of SAM in machine unlearning scenarios from the perspective of signal-noise decomposition. It finds that SAM "relinquishes" denoising capabilities on the forget set while maintaining advantages on the retain set. Consequently, the authors propose Sharp MinMax, which splits the model into two parts to perform sharpness minimization (for retention) and sharpness maximization (for forgetting) respectively, achieving SOTA unlearning performance.
- SoFlow: Solution Flow Models for One-Step Generative Modeling
-
The paper proposes Solution Flow Models (SoFlow), which directly learn the solution function \(f(x_t, t, s)\) of a velocity ODE (mapping \(x_t\) at time \(t\) to the solution at time \(s\)). Trained from scratch using a combination of Flow Matching loss and a JVP-free solution consistency loss, it achieves superior 1-NFE FID compared to MeanFlow on ImageNet 256 (XL/2: 2.96 vs 3.43).
- SuperF: Neural Implicit Fields for Multi-Image Super-Resolution
-
SuperF treats multi-frame low-resolution (LR) images as "reconstruction targets" rather than network inputs. It uses a cross-frame shared coordinate MLP (Implicit Neural Representation) to fit the scene on a high-resolution (HR) continuous grid while simultaneously optimizing affine alignment parameters for each frame. This enables multi-image super-resolution (MISR) for satellite and handheld camera bursts under a Test-Time Optimization (TTO) framework without requiring any high-resolution training data, achieving magnification factors up to ×8.
- Taming Hierarchical Image Coding Optimization: A Spectral Regularization Perspective
-
Addressing the paradox that hierarchical learned image compression is "theoretically superior but practically outperformed by single-scale models," this paper analyzes spectral training dynamics. The root causes are identified as cross-scale energy dispersion and spectral aliasing. Two spectral regularizations are proposed: intra-scale frequency truncation (gradual specialization from low to high frequencies) and an inter-scale latent similarity penalty (suppressing spectral overlap). These are active only during training with zero inference overhead, accelerating training by 2.3× and achieving a 20.65% average bitrate saving relative to VTM-22.0, setting a new SOTA in learned image compression.
- Taming Score-Based Denoisers in ADMM: A Convergent Plug-and-Play Framework
-
This paper proposes the ADMM-PnP + AC-DC triple-stage denoiser. It employs "Auto-Correction (adding noise) + Directional Correction (Langevin-based)" to pull ADMM iterates back to the noise manifold where the score function was trained, before performing score-based denoising. This stabilizes the embedding of diffusion priors into ADMM with dual variables and provides the first fixed-point convergence proof for this combination, consistently outperforming baselines such as DPS, DiffPIR, and DDRM across various image inverse problems.
- Test-Time Domain Generalization for Image Super-Resolution
-
For pixel-level tasks like Super-Resolution (SR), this paper proposes MC-TTDG: training a set of "domain-invariant codebooks + multiple domain-specific codebooks" on the source domain. During testing, target domain features are migrated via pixel-wise nearest neighbor codeword replacement to achieve fine-grained transfer, and a voting strategy selects the most suitable domain-specific codebook. This significantly improves cross-domain performance without requiring fine-tuning on the target domain.
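Pixel-wise nearest-neighbor codeword replacement can be sketched as below, assuming squared Euclidean distance between a feature vector and each codeword. The function names are hypothetical, and the voting step that picks among domain-specific codebooks is not shown because the note does not describe its rule.

```python
def nearest_codeword(feature, codebook):
    """Return the codeword closest to `feature` in squared Euclidean distance."""
    return min(
        codebook,
        key=lambda code: sum((f - c) ** 2 for f, c in zip(feature, code)),
    )


def replace_with_codewords(features, codebook):
    """Swap every per-pixel feature vector for its nearest codeword.

    features: list of per-pixel feature vectors from the target domain.
    codebook: list of codewords learned on the source domain.
    """
    return [list(nearest_codeword(feature, codebook)) for feature in features]
```

Each target-domain feature is snapped onto a source-domain codeword independently, which is what makes the transfer fine-grained rather than a single global statistic shift.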
- Text-Aware Image Restoration with Diffusion Models
-
This paper proposes "Text-Aware Image Restoration (TAIR)," a new task aimed at simultaneously restoring visual appearance and textual content. The authors introduce TeReDiff, a model that embeds a text-spotting module into a diffusion restoration network and jointly trains them using shared diffusion features. Accompanied by the SA-Text dataset containing 100,000 high-quality images with dense text annotations, the method significantly alleviates "text-image hallucination"—the tendency of diffusion restoration models to fabricate plausible but incorrect characters—and achieves a new SOTA on the STISR benchmark TextZoom.
- Texture Vector-Quantization and Reconstruction Aware Prediction for Generative Super-Resolution
-
Addressing two major issues in VQ-based generative SR—high quantization error in codebooks and "code-level" supervision for predictors—this paper proposes Texture Vector Quantization (TVQ), which assigns only missing textures to the codebook while stripping away structures, and Reconstruction-Aware Prediction (RAP), which leverages a Straight-Through Estimator to feed image-level reconstruction losses directly back to the index predictor. This achieves SOTA perceptual quality with minimal computational cost (38ms/image).
- Trajectory-aware Shifted State Space Models for Online Video Super-Resolution
-
This paper proposes TS-Mamba, which combines "video trajectory modeling" with "low-complexity Mamba" for online video super-resolution: it first selects the most similar tokens to the current token from historical frames along trajectories, then aggregates them spatio-temporally using a set of "shifted" State Space Model blocks. While maintaining long-range temporal modeling capabilities, it reduces computational complexity (MACs) by over 22.7% compared to existing online VSR methods and achieves SOTA on most test sets.
- Trust but Verify: Adaptive Conditioning for Reference-Based Diffusion Super-Resolution
-
Ada-RefSR is proposed as a single-step reference-guided diffusion super-resolution framework based on the "Trust but Verify" principle. It utilizes an Adaptive Implicit Correlation Gating (AICG) mechanism to leverage reliable reference information while suppressing erroneous fusion, with only a 0.13% increase in computational overhead.
- Turbo-DDCM: Fast and Flexible Zero-Shot Diffusion-Based Image Compression
-
This paper replaces the stepwise "Greedy Matching Pursuit" of the zero-shot diffusion compression method DDCM with a closed-form sparse least squares selection rule. By combining hundreds of noise atoms simultaneously in each step, the diffusion steps are reduced by 92%, cutting the round-trip compression-decompression time per image from 65 seconds to 1.8 seconds while maintaining SOTA-level quality and supporting flexible variants such as region-priority and target-PSNR compression.
- UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity
-
UniRestorer hierarchically organizes the image degradation space into multi-granularity degradation groups and trains corresponding MoE restoration experts. By jointly routing via degradation estimation and granularity estimation, the universal restoration model leverages fine-grained degradation priors while remaining robust against incorrect degradation estimation.
- VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution
-
A pre-trained text-to-image Visual Autoregressive (VAR) model is distilled into a one-step real-world super-resolution model via token-level distribution matching. Combined with a cross-scale pyramid condition to fully utilize low-quality input information, it achieves 72.32 MUSIQ / 0.7669 CLIPIQA on DIV2K-Val by fine-tuning only 1.2% of the parameters, while accelerating inference by approximately 10×.
- Vivid-VR: Distilling Concepts from Text-to-Video Diffusion Transformer for Photorealistic Video Restoration
-
Vivid-VR performs generative video restoration by attaching a ControlNet to a pre-trained T2V diffusion Transformer (CogVideoX1.5-5B). It utilizes a "concept distillation" training strategy, where the T2V model synthesizes its own text-aligned training data to suppress distribution drift during fine-tuning. Combined with a lightweight control feature projector and a dual-branch connector, it achieves more realistic textures and robust temporal consistency across real, synthetic, and AIGC videos.