🖼️ Image Restoration

🎞️ ECCV2024 · 32 paper notes

📌 Same area in other venues: 📷 CVPR2026 (135) · 🔬 ICLR2026 (61) · 🧪 ICML2026 (21) · 🤖 AAAI2026 (10) · 🧠 NeurIPS2025 (26) · 📹 ICCV2025 (31)

🔥 Top topics: Image Restoration ×12 · Super-Resolution ×8 · Diffusion Models ×3 · Adversarial Robustness ×2

A New Dataset and Framework for Real-World Blurred Images Super-Resolution: Existing blind super-resolution methods over-texturize blurred regions (defocus or motion blur) and destroy their perceptual quality. To address this, the work constructs the ReBlurSR dataset, which contains nearly 3,000 blurred images, and proposes the PBaSR framework. PBaSR combines Cross-Disentanglement training (CDM) with weight-interpolation-based Cross-Fusion (CFM) to improve super-resolution quality on both blurred and general images without adding any inference overhead, improving LPIPS by 0.02 to 0.10.
Accelerating Image Super-Resolution Networks with Pixel-Level Classification: This paper introduces PCSR, the first super-resolution method with pixel-level computational resource allocation. A lightweight MLP classifier estimates the restoration difficulty of each pixel and assigns it to one of several upsamplers of varying capacity. PCSR reduces FLOPs to \(18\% \sim 57\%\) of the original models with almost no drop in PSNR, significantly outperforming existing patch-level methods like ClassSR and ARM.
Asymmetric Mask Scheme for Self-supervised Real Image Denoising: Proposes AMSNet, an asymmetric mask scheme that uses a single mask during training and multiple complementary masks during inference. This removes the structural requirements and receptive field constraints of blind spot networks and achieves SOTA performance in self-supervised real image denoising.
BAMM: Bidirectional Autoregressive Motion Model: BAMM (Bidirectional Autoregressive Motion Model) is proposed. By unifying generative masked modeling and autoregressive modeling through a hybrid attention masking strategy, it simultaneously achieves high-quality motion generation, adaptive length prediction, and zero-shot motion editing within a single framework, comprehensively outperforming SOTA on HumanML3D and KIT-ML.
Blind Image Deblurring with Noise-Robust Kernel Estimation: This paper proposes a blind deblurring method based on a noise-robust kernel estimation function and deep image prior (DIP). By designing a kernel estimation function capable of accurately estimating blur kernels even under strong noise, combined with a multiple-kernel estimation scheme to handle unknown noise levels, it achieves superior deblurring performance on both simulated and real images.
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion: Proposes BrushNet, a plug-and-play dual-branch diffusion model architecture for image inpainting. By decoupling masked image feature extraction and image generation into separate branches, it achieves layer-wise pixel-level feature injection, thoroughly outperforming existing methods in image quality, masked area preservation, and text alignment.
Contourlet Residual for Prompt Learning Enhanced Infrared Image Super-Resolution: To address the unique challenges of infrared image super-resolution, this paper proposes the CoRPLE framework. It utilizes the Contourlet transform for multi-scale and multi-directional infrared spectral residual enhancement, and introduces a prompt learning paradigm based on vision-language models to capture the inherent features of infrared images, achieving SOTA performance on infrared SR tasks.
DenoiSplit: A Method for Joint Microscopy Image Splitting and Unsupervised Denoising: This paper proposes DenoiSplit, the first method to jointly address semantic image splitting and unsupervised denoising. By integrating pixel noise models and an improved KL divergence loss weighting strategy into a hierarchical VAE, the method achieves end-to-end denoising and splitting on fluorescence microscopy images, significantly outperforming serial pipelines that perform denoising prior to splitting.
Domain-Adaptive Video Deblurring via Test-Time Blurring: A test-time domain adaptation method based on a diffusion blur model is proposed. By detecting relatively sharp regions from blurry videos as pseudo-sharp images and generating domain-adaptive blur conditions to synthesize training pairs, the method enables fine-tuning of deblurring models on unseen domains, achieving a maximum gain of 7.54 dB across 5 real-world datasets.
EDformer: Transformer-Based Event Denoising Across Varied Noise Levels: EDformer is a Transformer-based event-by-event denoising model that handles event camera noise across varied noise levels by learning spatiotemporal correlations among events. The paper also establishes ED24, the first real-world event denoising dataset containing 21 noise levels.
Efficient Cascaded Multiscale Adaptive Network for Image Restoration: ECMA proposes an efficient cascaded multiscale adaptive network that dynamically adjusts convolutional kernels via the Local Adaptive Module (LAM) to handle spatially-varying degradations, and captures features at different scales in a cascaded multiscale manner. It achieves comparable or even superior performance to SOTA on various image restoration tasks (including deblurring, denoising, and super-resolution) with a 1.2× to 9.7× reduction in computational cost.
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators: Discovering that query-key interactions in Diffusion Transformers exhibit significant redundancy (especially in the early stages of denoising), this work proposes the Attention Mediator mechanism to reduce attention complexity to linear. It further designs a step-wise dynamic adjustment strategy, achieving a state-of-the-art FID of 2.01 on SiT-XL/2 while reducing computational overhead.
Exploiting Dual-Correlation for Multi-frame Time-of-Flight Denoising: Proposes the first learning-based multi-frame ToF depth denoising framework. It uses the correlation between multi-frame ToF data to guide noise removal through a Dual-Correlation Estimation Module (exploiting intra-frame and inter-frame correlation) and a Confidence-guided Residual Regression Module, significantly outperforming existing single-frame methods in high-noise regions.
Image Demoiréing in RAW and sRGB Domains: This paper proposes the RRID framework to jointly utilize RAW and sRGB dual-domain data for image demoiréing. It designs the SCDM demoiréing module equipped with GFM (Gated Feedback Module) and FSM (Frequency Selection Module), along with RGISP to implement device-specific ISP learning for color restoration assistance, outperforming the state of the art by 0.62 dB in PSNR.
Intrinsic Single-Image HDR Reconstruction: Proposes an HDR reconstruction method based on intrinsic image decomposition, which reformulates the problem into two sub-tasks: dynamic range expansion in the shading domain and color recovery in the albedo domain, training separate networks to improve reconstruction quality.
Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography: This paper proposes JDM-HDRNet, which extracts shading, reflectance, and material semantic priors from low-resolution multispectral images (Lr-MSI) using a joint RGB-spectral decomposition model. These priors are integrated into HDRNet to enhance dynamic range, color mapping, and semantic bilateral grid expert learning, respectively. Additionally, the first paired RGB-hyperspectral Mobile-Spec dataset is constructed.
Learning Exhaustive Correlation for Spectral Super-Resolution: Where Spatial-Spectral Attention Meets Linear Dependence: This paper proposes the Exhaustive Correlation Transformer (ECT), which models unified spatial-spectral correlation via a spectral-direction discontinuous 3D splitting strategy (SD3D) and captures linear dependencies among multiple tokens using a Dynamic Low-Rank Mapping (DLRM) module. It achieves SOTA performance on spectral super-resolution tasks with minimal parameter overhead and the lowest inference latency.
Learning to Robustly Reconstruct Dynamic Scenes from Low-Light Spike Streams: To address the reconstruction difficulties caused by sparse information of spike cameras in low-light environments, this paper proposes a bidirectional recurrent reconstruction framework. Its core is a light-robust representation (LR-Rep) that aggregates temporal information through the global inter-spike interval (GISI), combined with a feature fusion module to extract temporal features. The paper also constructs a dedicated low-light high-speed dataset, and the method substantially outperforms existing methods on both synthetic and real-world data.
MambaIR: A Simple Baseline for Image Restoration with State-Space Model: This paper introduces Mamba (Selective State-Space Model) to low-level image restoration tasks for the first time. By designing local convolution enhancement and channel attention mechanisms within the Residual State-Space Block (RSSB), the proposed method addresses the issues of local pixel forgetting and channel redundancy in vanilla Mamba on 2D images. It achieves comparable or even superior performance to Transformer-based methods with linear complexity on image super-resolution and denoising tasks (outperforming SwinIR by 0.45 dB on SR).
MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration: This paper proposes MoE-DiffIR, the first diffusion-based universal compressed image restoration (CIR) framework. It extracts task-customized diffusion priors from Stable Diffusion via a Mixture-of-Experts (MoE) Prompt module, leverages a Visual-to-Text adapter to activate SD's cross-modal generative priors, and constructs the first universal CIR benchmark dataset covering 21 degradation types (7 codecs × 3 compression levels).
OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal: To address the double JPEG compressed image restoration problem, OAPT is proposed. By predicting the pixel offset between the two compressions, four different patterns in each 8×8 block are clustered and grouped for separate self-attention processing, outperforming the state-of-the-art methods by 0.16 dB on the double JPEG restoration task.
Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks: This paper proposes the ODM framework. By employing two simple strategies—cooperative mismatch regularization and layer-wise weight clipping correction—it resolves the distribution mismatch problem in SR network quantization without introducing dynamic modules during inference, achieving state-of-the-art (SOTA) performance with minimal extra overhead.
Pairwise Distance Distillation for Unsupervised Real-World Image Super-Resolution: This paper proposes a pairwise distance distillation framework that achieves degradation adaptation for unsupervised real-world image super-resolution by distilling the intra- and inter-model distance relationships between a specialized model and a generalized model.
Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal: The authors propose Raindrop Clarity, a large-scale real-world raindrop removal dataset containing 15,186 high-quality image pairs/triplets. For the first time, it covers raindrop-focused (clear raindrops with blurred background) and nighttime raindrop scenarios, both of which are missing from existing datasets.
Restoring Images in Adverse Weather Conditions via Histogram Transformer: Proposes Histoformer, an efficient Transformer based on histogram self-attention. By sorting and binning spatial features according to pixel intensity, it performs self-attention within and across bins to establish dynamic-range spatial attention for efficiently processing weather-degraded pixels. Combined with dynamic-range convolution and a Pearson correlation loss, it achieves unified modeling and reaches SOTA performance on three major tasks: desnowing, deraining/dehazing, and deraindropping.
Rethinking Image Super-Resolution from Training Data Perspectives: This paper rethinks image super-resolution (SR) from the perspective of training data. It proposes an automated data evaluation pipeline to construct the DiverSeg dataset, which features low-resolution but high-quality and object-diverse images. The authors demonstrate that SR models trained on this dataset can outperform those trained on traditional high-resolution datasets (such as DF2K and LSDIR).
Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration: This paper proposes FPro, which guides image restoration through prompting from a frequency-domain perspective. A Gated Dynamic Decoupler separates features into low-frequency and high-frequency components, and a Dual Prompt Block (HPM + LPM) injects learnable prompts into both bands to interact with decoder features. It outperforms state-of-the-art methods comprehensively across five tasks: deraining, raindrop removal, demoiréing, deblurring, and dehazing.
Spatially-Variant Degradation Model for Dataset-free Super-resolution: Proposes SVDSR, the first dataset-free spatially-variant degradation model. The degradation kernel of each pixel is represented as a linear combination of a learnable atomic kernel dictionary. The coefficient matrix is derived from image texture information via membership functions in fuzzy set theory, and inferred under the MAP framework using the Monte Carlo EM algorithm. It achieves an average improvement of around 1 dB in \(2\times\) super-resolution.
Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-Anything Constraint: This paper proposes T3-DiffWeather, a diffusion-based all-in-one adverse weather restoration framework. It utilizes a prompt pool to allow the network to autonomously combine sub-prompts to construct instance-level weather-prompts for modeling diverse weather degradations. Concurrently, it leverages Depth-Anything features to constrain general prompts to model scene information. The method achieves state-of-the-art (SOTA) performance in only 2 sampling steps, with a computational cost of only 1/52 of WeatherDiffusion.
Towards Real-world Event-guided Low-light Video Enhancement and Deblurring: This paper introduces the joint task of event-guided low-light video enhancement and deblurring for the first time. It constructs a beam-splitter-based real-world dataset, RELED, and designs an end-to-end framework consisting of two core modules: Event-Guided Deformable Temporal Feature Alignment (ED-TFA) and Spectrum Frequency-based Cross-Modal Feature Enhancement (SFCM-FE), outperforming previous state-of-the-art methods by over 1.2 dB in PSNR.
TTT-MIM: Test-Time Training with Masked Image Modeling for Denoising Distribution Shifts: This paper proposes TTT-MIM, which jointly optimizes a supervised denoising loss and a self-supervised masked image modeling (MIM) loss during the training phase. At test time, adaptive fine-tuning on a single noisy image is performed by minimizing the self-supervised MIM loss. This significantly improves denoising performance against out-of-distribution noise (such as real camera noise and microscope noise) while being far faster than zero-shot methods.
Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement: This paper proposes UDU-Net, which models low-light video enhancement as a MAP optimization problem and unrolls it into a deep network. It processes spatial (illumination) and temporal (consistency) degradations through Intra/Inter sub-networks respectively, supporting unpaired training and controllable enhancement guided by human perceptual feedback.
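
The weight-interpolation fusion mentioned in the PBaSR entry above can be illustrated with a small sketch. This is only a generic illustration of interpolating two checkpoints that share one architecture, not the paper's CFM implementation: the function name `interpolate_weights`, the coefficient `alpha`, and the toy parameter dictionaries are all hypothetical. Because the merged model has the same parameter shapes as either input, running it costs the same as running a single model, which is why this kind of fusion adds no inference overhead.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def interpolate_weights(a: Weights, b: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * a + alpha * b, parameter by parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if a.keys() != b.keys():
        raise ValueError("both checkpoints must have the same parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            (1.0 - alpha) * x + alpha * y for x, y in zip(a[name], b[name])
        ]
    return merged


# Toy checkpoints (made-up values): one branch tuned on general images,
# one tuned on blurred images.
general_branch = {"conv.weight": [0.0, 2.0], "conv.bias": [1.0]}
blur_branch = {"conv.weight": [4.0, 2.0], "conv.bias": [3.0]}

merged = interpolate_weights(general_branch, blur_branch, alpha=0.25)
print(merged)
```

Setting `alpha` to 0 or 1 recovers one of the two input checkpoints unchanged, so the coefficient acts as a single knob trading off the two behaviours.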