🖼️ Image Restoration

📹 ICCV2025 · 30 paper notes

Benchmarking Burst Super-Resolution for Polarization Images: Noise Dataset and Analysis

This paper addresses the hardware bottlenecks of polarization cameras (low light efficiency, low spatial resolution, and high noise) and the lack of datasets and noise models for polarization image burst super-resolution (SR). It constructs two dedicated datasets, PolarNS for noise statistics analysis and PolarBurstSR for burst SR training and evaluation, proposes a polarimetric noise propagation analysis model, and adapts five SOTA burst SR methods to the polarization domain, thereby establishing a standardized evaluation framework for polarization image reconstruction. Results demonstrate that polarization-specific training significantly outperforms generic RGB training in reconstructing both intensity maps (s0) and the angle of linear polarization (AoLP).
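The polarimetric quantities evaluated here (s0, AoLP) follow from the linear Stokes parameters. A minimal sketch using the standard four-angle polarizer formulas; the dataset construction and noise propagation model are the paper's contribution and are not reproduced here:

```python
import numpy as np

def stokes_from_polarizer_images(i0, i45, i90, i135):
    """Linear Stokes parameters from intensities behind polarizers
    at 0, 45, 90, and 135 degrees (standard definitions)."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical
    s2 = i45 - i135                      # diagonal components
    return s0, s1, s2

def aolp(s1, s2):
    """Angle of linear polarization in radians."""
    return 0.5 * np.arctan2(s2, s1)

def dolp(s0, s1, s2, eps=1e-8):
    """Degree of linear polarization in [0, 1]."""
    return np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)
```

Because s1 and s2 are differences of noisy measurements, per-pixel noise propagates nonlinearly into AoLP and DoLP, which is why a dedicated noise analysis is needed.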

Blind2Sound: Self-Supervised Image Denoising without Residual Noise

This paper proposes the Blind2Sound framework, which perceives noise levels and achieves personalized denoising via an adaptive re-visible loss, complemented by a Cramer Gaussian loss that improves noise parameter estimation accuracy. The framework eliminates residual noise in self-supervised blind denoising and outperforms all contemporary self-supervised methods and even some supervised baselines.

Blind Noisy Image Deblurring Using Residual Guidance Strategy

This paper proposes a Residual Guidance Strategy (RGS) for coarse-to-fine blind image deblurring within an image pyramid framework. At each scale transition, the convolution residual from the adjacent coarser scale is denoised via a guided filter and used to correct the blurred input at the current scale. This approach significantly improves kernel estimation accuracy and restoration quality under high noise levels (σ=0.1), surpassing multiple deep learning methods without requiring any training.

Closed-Loop Transfer for Weakly-supervised Affordance Grounding

This paper proposes LoopTrans, a closed-loop knowledge transfer framework that unifies exocentric and egocentric image activation via a shared CAM module, refines coarse activations into precise localizations using pixel-level pseudo-masks, and feeds egocentric localization results back to enhance exocentric knowledge extraction through denoising distillation, achieving state-of-the-art performance across all metrics on AGD20K.

Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric Attention

GIGA-ToF proposes a ToF depth denoising network that fuses motion-invariant graph structures across frames. Through cross-frame graph attention and algorithm unrolling of a MAP problem, the method simultaneously improves temporal stability and spatial sharpness, demonstrating strong generalization on both synthetic and real data.

CWNet: Causal Wavelet Network for Low-Light Image Enhancement

This paper proposes CWNet, a Causal Wavelet Network that models low-light image enhancement through a structural causal model (SCM), treating semantic information as causal factors and brightness/color degradation as non-causal factors, and employs a wavelet-based backbone for fine-grained frequency-domain feature restoration.

Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion

This paper proposes D²R-UHDNet, a framework that employs a Controlled Differential Disentangled VAE (CD²-VAE) to actively decompose degraded images into a degradation-dominant latent space and background-dominant features, and processes the background features via a complex-domain invertible multi-scale fusion network. The method achieves state-of-the-art performance across six UHD restoration tasks with only 1M parameters.

Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration

Targeting the redundancy caused by uniform subspace allocation across heads in standard Multi-Head Attention (MHA), this paper proposes HINT, which introduces Hierarchical Multi-Head Attention (HMHA) and Query-Key Cache Updating (QKCU) to enhance inter-head diversity and interaction, achieving state-of-the-art results on 12 benchmarks across 5 image restoration tasks.

EAMamba: Efficient All-Around Vision State Space Model for Image Restoration

This paper proposes EAMamba, a framework that introduces a Multi-Head Selective Scan Module (MHSSM) and an all-around scanning strategy to achieve multi-directional scanning without increasing computational complexity or parameter count. EAMamba addresses the computational overhead and local pixel forgetting issues of Vision Mamba in image restoration, achieving 31–89% FLOPs reduction while maintaining competitive performance across super-resolution, denoising, deblurring, and dehazing tasks.
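The all-around scanning idea amounts to flattening the 2-D feature map into 1-D sequences along several directions before a sequence model processes them. A minimal sketch with four illustrative scan orders (not the paper's exact scheme):

```python
import numpy as np

def scan_orders(h, w):
    """Four flattening orders (index permutations) over an h*w grid:
    row-major, reversed row-major, column-major, reversed column-major."""
    idx = np.arange(h * w).reshape(h, w)
    return {
        "row":     idx.reshape(-1),
        "row_rev": idx.reshape(-1)[::-1],
        "col":     idx.T.reshape(-1),
        "col_rev": idx.T.reshape(-1)[::-1],
    }

def multi_scan(x, orders):
    """Flatten feature map x of shape (h, w) along each order; a
    state-space block would then process each 1-D sequence."""
    flat = x.reshape(-1)
    return {name: flat[order] for name, order in orders.items()}
```

Scanning in multiple directions keeps spatially adjacent pixels close in at least one sequence, which is what mitigates the "local pixel forgetting" of a single raster scan.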

Efficient Concertormer for Image Deblurring and Beyond

This paper proposes Concertormer, which decomposes self-attention into a global Concertino component and a local Ripieno component, and further introduces a Cross-Dimensional Communication module and a Gated Depthwise Convolution MLP. The method achieves global-local feature modeling at linear complexity, attaining state-of-the-art performance on image deblurring and other restoration tasks.

Emulating Self-Attention with Convolution for Efficient Image Super-Resolution

Motivated by the observation that features and attention maps across adjacent self-attention layers exhibit high inter-layer similarity (89%/87%), this paper proposes ConvAttn — a module composed of a shared large-kernel convolution and a dynamic convolution kernel — to replace the majority of self-attention layers. Flash Attention is introduced into lightweight SR for the first time, extending the window size to 32×32 and achieving state-of-the-art performance at minimal latency and memory cost.
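The redundancy observation can be quantified by measuring cosine similarity between feature maps of adjacent layers. A minimal numpy sketch of such a probe; the 89%/87% figures come from the paper's measurements, not from this code:

```python
import numpy as np

def feature_similarity(a, b, eps=1e-8):
    """Mean cosine similarity between two feature tensors of shape
    (c, h, w), computed per spatial location across channels."""
    a2 = a.reshape(a.shape[0], -1)   # (c, h*w)
    b2 = b.reshape(b.shape[0], -1)
    num = (a2 * b2).sum(axis=0)
    den = np.linalg.norm(a2, axis=0) * np.linalg.norm(b2, axis=0) + eps
    return float((num / den).mean())
```

A similarity near 1 between consecutive layers suggests the second layer is largely recomputing the first, motivating the cheaper ConvAttn replacement.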

Enhancing Image Restoration Transformer via Adaptive Translation Equivariance

This paper systematically investigates the impact of Translation Equivariance (TE) on the convergence speed and generalization ability of image restoration networks. It proposes Sliding Key-Value Self-Attention (SkvSA), its adaptive variant (ASkvSA), and Downsampled Self-Attention (DSA), and constructs TEAFormer, which achieves state-of-the-art performance on super-resolution, deblurring, denoising, and other tasks while maintaining linear complexity.
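Translation equivariance means restoring a shifted image equals shifting the restored image: f(shift(x)) = shift(f(x)). A minimal numerical check using a circular convolution, which satisfies TE exactly (the filter here is a hypothetical 3x3 average, not one of the paper's modules):

```python
import numpy as np

def circular_conv2d(x, k):
    """2-D filtering with circular (wrap-around) padding; exactly
    translation-equivariant, unlike zero-padded convolution."""
    kh, kw = k.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * np.roll(x, (i - kh // 2, j - kw // 2), axis=(0, 1))
    return out
```

Window-partitioned attention breaks this property at window boundaries, which is the failure mode the sliding key-value attention is designed to avoid.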

Exploiting Diffusion Prior for Task-driven Image Restoration

This paper proposes EDTR, a method that leverages diffusion model priors via a pre-restoration + partial diffusion strategy combined with short-step denoising to effectively recover task-relevant details, achieving significant gains in classification, segmentation, and detection under complex degradation scenarios.

FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration

This work constructs the first million-scale real-world paired image restoration dataset covering 20 degradation types, and proposes the FoundIR framework, which combines a degradation-agnostic generalist model with degradation-aware expert models to surpass existing performance ceilings across 24 benchmarks.

Generic Event Boundary Detection via Denoising Diffusion (DiffGEBD)

DiffGEBD is the first work to introduce diffusion models into Generic Event Boundary Detection (GEBD). It frames boundary prediction as an iterative denoising process from random noise to a plausible boundary distribution, leverages Classifier-Free Guidance to control prediction diversity, and proposes two new evaluation metrics—Symmetric F1 and Diversity Score—to measure quality and diversity in multi-prediction scenarios.

IM-LUT: Interpolation Mixing Look-Up Tables for Image Super-Resolution

This paper proposes IM-LUT, which achieves arbitrary-scale image super-resolution by learning to mix multiple interpolation functions, and converts the prediction network into a look-up table form to enable lightweight, fast CPU inference while maintaining reconstruction quality.
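The core mechanism, weighting several interpolation functions per output sample, can be sketched in 1-D with nearest and linear interpolation. Here the mixing weight `w` is a fixed placeholder; in IM-LUT it would be predicted by a network baked into a look-up table:

```python
import numpy as np

def upsample_mix(x, scale, w):
    """Arbitrary-scale 1-D upsampling as a weighted mix of nearest and
    linear interpolation; w in [0, 1] blends the two."""
    n = len(x)
    pos = np.arange(int(round(n * scale))) / scale   # source coordinates
    lo = np.clip(np.floor(pos).astype(int), 0, n - 1)
    hi = np.clip(lo + 1, 0, n - 1)
    frac = pos - np.floor(pos)
    nearest = x[np.where(frac < 0.5, lo, hi)]
    linear = (1 - frac) * x[lo] + frac * x[hi]
    return w * linear + (1 - w) * nearest
```

Replacing the weight predictor with a table lookup is what makes CPU inference cheap: no network forward pass is needed at test time.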

Learning Pixel-adaptive Multi-layer Perceptrons for Real-time Image Enhancement

This paper proposes the BPAM framework, which combines the spatial modeling capability of bilateral grids with the nonlinear mapping power of MLPs by dynamically generating unique micro-MLP parameters for each pixel, enabling high-quality, real-time image enhancement.

Lightweight and Fast Real-time Image Enhancement via Decomposition of the Spatial-aware Lookup Tables

By decomposing 3D LUTs into linear combinations of 2D LUTs followed by SVD, and adopting a cache-efficient spatial feature fusion structure, the proposed method achieves spatially-aware image enhancement while reducing model parameters by 84% and accelerating 4K inference by 2.8×.
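The decomposition idea can be illustrated with a truncated SVD of the unfolded LUT tensor: each retained component pairs a 1-D factor with a 2-D LUT. This is a sketch of the low-rank principle, not the paper's exact factorization:

```python
import numpy as np

def decompose_lut(lut3d, rank):
    """Unfold an (n, n, n) LUT into an (n, n*n) matrix and keep the top
    `rank` SVD components."""
    n = lut3d.shape[0]
    mat = lut3d.reshape(n, n * n)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    lut1d = u[:, :rank] * s[:rank]            # (n, rank) 1-D factors
    lut2d = vt[:rank].reshape(rank, n, n)     # rank 2-D LUTs
    return lut1d, lut2d

def reconstruct_lut(lut1d, lut2d):
    n = lut2d.shape[1]
    return (lut1d @ lut2d.reshape(lut2d.shape[0], -1)).reshape(n, n, n)
```

Storing a few 2-D tables instead of a dense 3-D one is where the parameter reduction comes from: rank * (n + n^2) entries versus n^3.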

Low-Light Image Enhancement using Event-Based Illumination Estimation (RetinEV)

RetinEV proposes exploiting temporal-mapping events (triggered by transmittance modulation) rather than conventional motion events for illumination estimation. Combined with Retinex theory, it decomposes low-light images into illumination and reflectance components, and employs an Illumination-guided Reflectance Enhancement (IRE) module to achieve high-quality low-light image enhancement, reaching real-time inference at 35.6 FPS on 640×480 images.
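Retinex theory models an image as illumination times reflectance. A minimal sketch that estimates illumination with a local mean and enhances by gamma-correcting only the illumination; RetinEV instead estimates illumination from temporal-mapping events:

```python
import numpy as np

def retinex_decompose(img, radius=2, eps=1e-6):
    """Split an image into illumination (local mean) and reflectance
    (img / illumination)."""
    h, w = img.shape
    illum = np.empty_like(img, dtype=float)
    for i in range(h):
        for j in range(w):
            y0, y1 = max(0, i - radius), min(h, i + radius + 1)
            x0, x1 = max(0, j - radius), min(w, j + radius + 1)
            illum[i, j] = img[y0:y1, x0:x1].mean()
    reflect = img / (illum + eps)
    return illum, reflect

def enhance(img, gamma=0.5, radius=2):
    """Brighten by gamma-correcting the illumination component only,
    leaving reflectance (scene detail) untouched."""
    illum, reflect = retinex_decompose(img, radius)
    return np.clip(illum ** gamma * reflect, 0.0, 1.0)
```

Keeping reflectance fixed is what preserves texture while lifting brightness, which is where the IRE module then refines the reflectance estimate.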

Metric Convolutions: A Unifying Theory to Adaptive Image Convolutions

This paper proposes a metric-geometric perspective that unifies existing adaptive convolution variants (standard, dilated, shifted, and deformable), and introduces Metric Convolution based on unit-ball sampling of an explicit Randers metric, achieving superior geometric regularization and generalization with substantially fewer parameters.

MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices

This paper proposes MobileIE, an extremely lightweight CNN framework with approximately 4K parameters, which achieves real-time image enhancement at over 1100 FPS on mobile devices for the first time. This is accomplished through multi-branch re-parameterizable convolution (MBRConv), a feature self-transformation (FST) module, hierarchical dual-path attention (HDPA), and an incremental weight optimization (IWO) strategy. MobileIE achieves state-of-the-art speed–performance trade-offs across three tasks: low-light enhancement, underwater enhancement, and ISP.
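Re-parameterizable convolution is the standard trick behind MBRConv-style blocks: parallel branches (e.g. 3x3, 1x1, identity) are trained separately, then folded into a single 3x3 kernel for inference. A single-channel numpy sketch of the folding, assuming zero same-padding:

```python
import numpy as np

def conv2d_same(x, k):
    """Single-channel 2-D correlation with zero 'same' padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    h, w = x.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + h, j:j + w]
    return out

def merge_branches(k3, k1, with_identity=True):
    """Fold a 1x1 branch and an identity branch into the 3x3 kernel:
    both act only on the center tap."""
    merged = k3.copy()
    merged[1, 1] += k1[0, 0]
    if with_identity:
        merged[1, 1] += 1.0
    return merged
```

The merged kernel produces exactly the same output as the three-branch sum, so the extra branches cost nothing at inference time, which is how the ~4K-parameter, 1100+ FPS budget is reached.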

MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration

This paper proposes MP-HSIR, a unified hyperspectral image restoration framework that integrates three modalities of guidance—spectral prompts (universal low-rank spectral patterns), text prompts, and visual prompts—to comprehensively outperform existing all-in-one methods and numerous task-specific methods across 9 HSI restoration tasks, including denoising, deblurring, super-resolution, inpainting, dehazing, and band completion.

Outlier-Aware Post-Training Quantization for Image Super-Resolution

This paper proposes an outlier-aware post-training quantization method for image super-resolution. It introduces a dual-region piecewise linear quantizer to balance outlier preservation with normal activation fidelity, and incorporates a sensitivity-aware finetuning strategy that directs attention to quantization-sensitive layers. Under the W4A4 setting, the method substantially outperforms existing PTQ approaches and approaches QAT-level performance.
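The dual-region idea can be sketched as a piecewise-uniform quantizer: most levels cover the dense activation range, a few coarse levels cover outliers. Thresholds and level allocation below are illustrative placeholders, not the paper's calibrated values:

```python
import numpy as np

def piecewise_quantize(x, threshold, bits=4, outlier_levels=3):
    """Quantize non-negative activations with a `bits`-level budget split
    into a dense region [0, threshold] and a sparse region above it."""
    total = 2 ** bits
    inner = total - outlier_levels          # levels for the dense region
    x = np.asarray(x, dtype=float)
    hi = x.max()
    step_in = threshold / (inner - 1)
    step_out = max(hi - threshold, 1e-12) / outlier_levels
    return np.where(
        x <= threshold,
        np.round(x / step_in) * step_in,
        threshold + np.round((x - threshold) / step_out) * step_out,
    )
```

A single uniform quantizer would have to stretch its step size to the outlier maximum, destroying precision for the bulk of activations; the split keeps the dense-region step small while still representing outliers.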

PRE-Mamba: A 4D State Space Model for Ultra-High-Frequent Event Camera Deraining

PRE-Mamba is the first point-based event camera deraining framework; it leverages a 4D event-cloud representation and a Multi-Scale State Space Model (MS3M) to achieve efficient deraining while preserving microsecond-level temporal precision, reaching state-of-the-art performance with only 0.26M parameters.

Robust Adverse Weather Removal via Spectral-based Spatial Grouping (SSGformer)

SSGformer proposes an All-in-One adverse weather image restoration method based on spectral decomposition and grouping attention: it extracts high-frequency edge information via the Sobel operator and analyzes low-frequency degradation textures via SVD, fuses both to generate spatial grouping masks, and performs channel and spatial attention within groups to achieve robust removal of multiple weather degradations (rain, snow, haze, raindrops).
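The two spectral cues can be sketched directly: Sobel gradient magnitude for high-frequency edge structure, and a truncated SVD for a low-frequency (low-rank) approximation. This illustrates the decomposition only; the grouping-mask generation is the paper's contribution:

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from horizontal/vertical Sobel filters
    (interior pixels only, valid filtering)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            patch = img[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)

def low_rank_component(img, rank=2):
    """Low-frequency approximation keeping the top singular values."""
    u, s, vt = np.linalg.svd(img, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]
```

Rain streaks and snow show up strongly in the Sobel response, while haze-like degradations dominate the low-rank component, which is why fusing both cues helps separate degradation types.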

Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising

This paper proposes Noise2VST, a framework that learns a model-free variance-stabilizing transformation (VST) via self-supervised learning, enabling off-the-shelf Gaussian denoisers to handle real-world noisy images without any additional training.
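Noise2VST learns its VST from data; the classical closed-form analogue is the Anscombe transform, which maps Poisson noise to approximately unit-variance Gaussian so that a Gaussian denoiser applies:

```python
import numpy as np

def anscombe(x):
    """Classical variance-stabilizing transform for Poisson noise:
    after the transform the noise is approximately N(0, 1)."""
    return 2.0 * np.sqrt(np.asarray(x, dtype=float) + 3.0 / 8.0)

def inverse_anscombe(y):
    """Algebraic inverse (a more accurate unbiased inverse exists)."""
    return (np.asarray(y) / 2.0) ** 2 - 3.0 / 8.0
```

Real sensor noise rarely matches a single closed-form model, which is the gap a self-calibrated, model-free VST is meant to close.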

Towards a Universal Image Degradation Model via Content-Degradation Disentanglement

This paper proposes the first universal image degradation model. Through a disentangle-by-compression approach, it separates degradation information from image content, introduces IDEN and IDA layers to handle inhomogeneous degradation, and enables cross-degradation encoding, synthesis, and transfer. The model can serve as a plug-in module to convert non-blind image restoration methods into blind ones.

UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control

This paper proposes UniPhys, a behavior cloning framework based on diffusion models that unifies motion planning and physics-based control within a single model. By adopting the Diffusion Forcing training paradigm to address compounding prediction errors, UniPhys enables flexible multi-task physics-based character motion generation, including text-driven control, velocity control, goal reaching, and dynamic obstacle avoidance.

UniRes: Universal Image Restoration for Complex Degradations

This paper proposes UniRes — a diffusion-based universal image restoration framework that acquires expert knowledge across four tasks (super-resolution, motion deblurring, defocus deblurring, and denoising) through multi-task training. At inference time, it handles arbitrary combinations of real-world complex degradations end-to-end by flexibly composing latent-space prediction weights from different tasks.