🖼️ Image Restoration

📷 CVPR2026 · 47 paper notes

Beyond Ground-Truth: Leveraging Image Quality Priors for Real-World Image Restoration: This paper proposes IQPIR, a framework that introduces image quality priors (IQP) derived from pretrained NR-IQA models as conditioning signals. Through three mechanisms—quality-conditioned Transformer, dual Codebook architecture, and quality optimization in discrete representation space—IQPIR guides the restoration process toward maximal perceptual quality, achieving state-of-the-art performance on blind face restoration and related tasks.
Beyond the Ground Truth: Enhanced Supervision for Image Restoration: This paper proposes to enhance the perceptual quality of suboptimal ground-truth images in existing datasets via super-resolution combined with frequency-domain adaptive mixing, and trains a lightweight Output Refinement Network (ORNet) that improves the perceptual quality of restoration outputs without modifying any pretrained restoration model.
BHCast: Unlocking Black Hole Plasma Dynamics from a Single Blurry Image with Long-Term Forecasting: BHCast takes a single blurry EHT black hole image as input, employs a U-Net dynamics surrogate model for super-resolution combined with long-term autoregressive forecasting (stable over 100 steps), extracts physical features (pattern speed, pitch angle, etc.) from the predicted plasma dynamics, and infers black hole spin and inclination via XGBoost. Effectiveness is also demonstrated on real M87* observational images.
Blink: Dynamic Visual Token Resolution for Enhanced Multimodal Understanding: This paper proposes Blink, a framework that dynamically expands and discards visual tokens across different Transformer layers of an MLLM — simulating the human "rapid blinking" scanning process — to adaptively enhance visual perception within a single forward pass, improving LLaVA-1.5 performance across multiple multimodal benchmarks.
BluRef: Unsupervised Image Deblurring with Dense-Matching References: BluRef is presented as the first unsupervised deblurring framework that uses unpaired sharp reference images. Dense matching against these references generates pseudo ground truth for training a deblurring network, which achieves performance comparable to or even surpassing supervised methods.
Bridging the Perception Gap in Image Super-Resolution Evaluation: Through a large-scale user study, this paper reveals a severe misalignment between existing SR evaluation metrics (PSNR, SSIM, LPIPS, etc.) and human perception. After analyzing their inherent deficiencies, the paper proposes a minimalist yet effective Relative Quality Index (RQI) framework that learns relative quality differences between image pairs to enable more reliable SR evaluation, and can also serve as a loss function to guide SR model training.
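Learning relative quality differences between image pairs is commonly cast as a pairwise ranking problem. The sketch below is not RQI's actual formulation, which the summary does not specify; it assumes a Bradley-Terry style logistic loss over two hypothetical scalar quality scores, `score_a` and `score_b`, purely to illustrate the idea.

```python
import math


def pairwise_preference_prob(score_a: float, score_b: float) -> float:
    """Probability that image A is preferred over image B (Bradley-Terry model)."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def pairwise_ranking_loss(score_a: float, score_b: float, a_is_better: bool) -> float:
    """Negative log-likelihood of the observed preference (hypothetical loss)."""
    # Signed margin in favour of the image that annotators preferred.
    margin = score_a - score_b if a_is_better else score_b - score_a
    # log(1 + exp(-margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

The loss is small when the model scores the preferred image higher and grows roughly linearly when the ordering is wrong, so only the score difference matters, not the absolute scale.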
PNG: Diffusion-Based sRGB Real Noise Generation via Prompt-Driven Noise Representation Learning: PNG introduces learnable Global/Local Prompt components to automatically extract noise characteristics from real noise (replacing metadata such as ISO and camera model). A Prompt AutoEncoder encodes noise into a latent space, and a Prompt DiT (based on a consistency model) generates latent codes in a single step, enabling realistic sRGB noise synthesis without any metadata. The downstream DnCNN denoiser trained on PNG-synthesized data trails real-data training by only 0.08 dB on SIDD.
Disentangled Textual Priors for Diffusion-based Image Super-Resolution: This paper proposes DTPSR, which disentangles textual priors along two orthogonal dimensions — spatial hierarchy (global/local) and frequency semantics (low-frequency/high-frequency) — and constructs a disentangled cross-attention injection pipeline along with a multi-branch CFG strategy, achieving superior perceptual quality in diffusion-based image super-resolution.
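For context, standard single-condition classifier-free guidance (CFG) combines a conditional and an unconditional noise prediction with a guidance weight \(w\):

\[
\hat{\epsilon} = \epsilon_\theta(x_t, \varnothing) + w\,\bigl(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\bigr)
\]

One plausible way to extend this to several prompt branches \(c_k\) with separate weights \(w_k\) is shown below. This form is an assumption for illustration only; the summary does not state DTPSR's exact multi-branch rule.

\[
\hat{\epsilon} = \epsilon_\theta(x_t, \varnothing) + \sum_{k} w_k\,\bigl(\epsilon_\theta(x_t, c_k) - \epsilon_\theta(x_t, \varnothing)\bigr)
\]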
DRFusion: Degradation-Robust Fusion via Degradation-Aware Diffusion Framework: This paper proposes DRFusion, a degradation-aware diffusion framework that achieves multimodal image fusion under arbitrary degradation scenarios within a small number of diffusion steps, via direct regression of the fused image (rather than explicit noise prediction) and a joint observation model correction mechanism.
EVLF: Early Vision-Language Fusion for Generative Dataset Distillation: This paper proposes EVLF, a plug-and-play early vision-language fusion method operating at the encoder-backbone interface, addressing the problem of text dominance and degraded visual fidelity caused by late-stage semantic injection in diffusion-based dataset distillation.
FiDeSR: High-Fidelity and Detail-Preserving One-Step Diffusion Super-Resolution: This paper proposes FiDeSR, a high-fidelity and detail-preserving one-step diffusion super-resolution framework that simultaneously addresses structural fidelity degradation and insufficient high-frequency detail recovery in one-step diffusion SR through three complementary components: Detail-Aware Weighting (DAW), Latent Residual Refinement Block (LRRB), and Latent Frequency Injection Module (LFIM).
FinPercep-RM: A Fine-grained Reward Model and Co-evolutionary Curriculum for RL-based Real-world Super-Resolution: This paper proposes FinPercep-RM, a fine-grained perceptual reward model, and a Co-evolutionary Curriculum Learning (CCL) strategy to address reward hacking and training instability when applying RLHF to real-world image super-resolution. The model simultaneously outputs a global quality score and a spatial degradation heatmap, enabling localized artifact awareness.
FinPercep-RM: A Fine-grained Reward Model and Co-evolutionary Curriculum for RL-based Real-world Super-Resolution: This paper proposes FinPercep-RM, a fine-grained perceptual reward model that predicts both a global quality score and a perceptual degradation map to spatially localize artifacts. Combined with a co-evolutionary curriculum learning (CCL) strategy that balances training stability and reward robustness, the method effectively mitigates reward hacking in RL-based real-world super-resolution.
GSNR: Graph Smooth Null-Space Representation for Inverse Problems: This paper proposes Graph Smooth Null-Space Representation (GSNR), which employs spectral graph theory to construct a null-space-constrained Laplacian matrix and selects the \(p\) smoothest spectral modes as the null-space projection basis. GSNR provides structured null-space constraints for inverse problem solvers including PnP, DIP, and diffusion models, achieving up to 4.3 dB PSNR gains on deblurring, compressed sensing, demosaicing, and super-resolution.
IA-CLAHE: Image-Adaptive Clip Limit Estimation for CLAHE: IA-CLAHE demonstrates that the histogram redistribution process in CLAHE is differentiable almost everywhere, enabling the first end-to-end learning framework for tile-adaptive clip limit estimation. Without requiring pre-searched ground-truth clip limits, it achieves zero-shot improvements in recognition performance and visual quality under adverse weather conditions.
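The differentiability claim is easier to see with the clipping step written out. The following is a minimal single-pass sketch of histogram clipping and uniform redistribution, not IA-CLAHE's implementation; the function name and the one-pass redistribution are assumptions (practical CLAHE implementations may redistribute the excess differently or iterate).

```python
from typing import List


def clip_and_redistribute(hist: List[float], clip_limit: float) -> List[float]:
    """Clip every bin at clip_limit and spread the excess evenly over all bins."""
    # Total mass that sits above the clip limit.
    excess = sum(max(h - clip_limit, 0.0) for h in hist)
    # Each bin receives the same share of the excess (single pass, so a bin
    # may end up slightly above clip_limit after redistribution).
    bonus = excess / len(hist)
    return [min(h, clip_limit) + bonus for h in hist]
```

Both `min(h, clip_limit)` and `max(h - clip_limit, 0.0)` are piecewise linear in `clip_limit`, so the output has a well-defined gradient with respect to the clip limit everywhere except at the kinks where a bin count equals the limit.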
Flickerformer: A Duet of Periodicity and Directionality for Burst Flicker Removal: This paper identifies two intrinsic physical properties of flicker artifacts—periodicity and directionality—and proposes Flickerformer, comprising three dedicated modules (PFM/AFFN/WDAM) for inter-frame/intra-frame periodicity and directionality modeling respectively. With only 3.92M parameters, the method achieves 31.226 dB PSNR on the BurstDeflicker benchmark, surpassing the second-best method AST by +0.580 dB while using only 19.70% of its parameters.
Learning to Translate Noise for Robust Image Denoising: This paper proposes a noise translation framework that converts unknown real-world noise into Gaussian noise via a lightweight noise translation network (NTN), which is then processed by a pre-trained Gaussian denoising network. The approach achieves an average PSNR gain of over 1.5 dB on OOD real-noise benchmarks, while the translation network contains only 0.29M parameters and is transferable across different denoisers.
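Many entries in this list report gains in dB of PSNR. As a reading aid, the sketch below uses the standard PSNR definition to convert a dB gain into the corresponding reduction in mean squared error; the function names are illustrative.

```python
import math


def psnr(mse: float, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which the MSE shrinks for a PSNR gain of gain_db."""
    return 10.0 ** (gain_db / 10.0)
```

Under this definition, a 1.5 dB gain corresponds to an MSE that is about 1.41 times smaller, that is, roughly 29% lower.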
MAD-Avatar: Motion-Aware Animatable Gaussian Avatars Deblurring: MAD-Avatar is presented as the first method to directly reconstruct sharp, drivable 3D Gaussian human avatars from blurry video. It proposes a 3D-aware physical blur formation model (decomposing blur into sub-frame SMPL motion and canonical 3DGS), models sub-frame motion via B-spline interpolation and a pose deformation network, and resolves motion direction ambiguity with inter-frame regularization. The method substantially outperforms two-stage "2D deblurring + 3DGS" pipelines on both synthetic and real datasets (~2.5 dB PSNR gain).
NEC-Diff: Noise-Robust Event–RAW Complementary Diffusion for Seeing Motion in Extreme Darkness: This paper proposes NEC-Diff, a diffusion-based event–RAW hybrid imaging framework that uses the illumination prior from RAW images to guide event denoising, and leverages the high-dynamic-range edges from denoised events to assist image denoising. Combined with dual-modality SNR-guided reliable information extraction and cross-modal attention diffusion, the method achieves high-quality dynamic scene reconstruction in extreme darkness (0.001–0.8 lux), reaching 24.51 dB PSNR on the REAL dataset.
NTIRE 2026 The 3rd RAIM Challenge: AI Flash Portrait (Track 3): This report covers the AI Flash Portrait track of the NTIRE 2026 3rd RAIM Challenge, whose task is to map weak-flash low-light portraits to strong-flash professional-grade portraits. The track provides 800 real paired samples (with ground truth from professional retouchers) and adopts a dual evaluation system combining region-aware objective metrics and expert blind assessment. 118 teams registered, with 3,187 valid submissions.
NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: This is the summary report of the NTIRE 2026 Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images. Based on the Raindrop Clarity real-world dataset (14,139 training / 407 validation / 593 test images), 168 teams participated and 17 submitted valid solutions. The winning team AIIA-Lab achieved the best score of 35.24 using an MSDT backbone combined with a pseudo-GT refinement pipeline.
PhaSR: Generalized Image Shadow Removal with Physically Aligned Priors: PhaSR introduces a dual-level physically aligned prior framework: at the global level, PAN performs parameter-free Retinex decomposition to suppress color bias; at the local level, GSRA employs differential attention to align DepthAnything depth priors with DINO-v2 semantic embeddings. This enables generalized shadow removal spanning from single-source direct illumination to multi-source ambient lighting scenes, achieving state-of-the-art performance on WSRD+ and Ambient6K with the lowest FLOPs.
POLISH'ing the Sky: Wide-Field and High-Dynamic Range Interferometric Image Reconstruction: Building upon the POLISH framework, this work proposes POLISH+ and POLISH++, which employ a patch-based training-and-stitching strategy and an arcsinh-based nonlinear transform to achieve radio interferometric image reconstruction and super-resolution under wide-field (12,960×12,960 pixels) and high-dynamic-range (\(\sim 10^6\)) conditions. The paper also presents the first demonstration that deep learning methods can super-resolve strong gravitational lens systems.
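An arcsinh transform is a common way to compress a very large dynamic range while staying close to linear near zero. The sketch below shows the generic forward and inverse mapping; the `scale` parameter and function names are assumptions, and the exact parameterization used in POLISH++ is not given in the summary.

```python
import math


def compress(x: float, scale: float = 1.0) -> float:
    """Map intensity x to asinh(x / scale): linear near 0, logarithmic for large |x|."""
    return math.asinh(x / scale)


def expand(y: float, scale: float = 1.0) -> float:
    """Exact inverse of compress."""
    return scale * math.sinh(y)
```

With `scale = 1.0`, an intensity of \(10^6\) maps to roughly 14.5, so a dynamic range of about six orders of magnitude fits into a numerically comfortable interval, and unlike a plain logarithm the transform is defined for zero and negative values.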
RAR: Restore, Assess, Repeat - A Unified Framework for Iterative Image Restoration: RAR deeply integrates image quality assessment (IQA) with image restoration (IR) into a unified end-to-end model, iteratively executing an "assess–restore–verify" loop in the latent space. It achieves a +2.71 dB PSNR gain under composite degradation scenarios while running 11.27× faster than AgenticIR.
RAW-Domain Degradation Models for Realistic Smartphone Super-Resolution: This paper proposes a calibration-based RAW-domain degradation modeling framework that accurately calibrates SR blur kernels and sensor noise models for multiple smartphone cameras, enabling the "unprocessing" of public sRGB images into realistic LR RAW data for training. The approach significantly outperforms baselines based on generic degradation pools in both camera-specific and cross-camera blind super-resolution settings.
RAW-Domain Degradation Models for Realistic Smartphone Super-Resolution: This paper demonstrates that principled, device-specific degradation modeling — obtained via physical calibration of real blur and noise parameters — significantly improves real-world smartphone super-resolution performance. By unprocessing publicly available rendered images into the RAW domain of target devices to generate HR-LR training pairs, the resulting SR models substantially outperform baselines trained with large pools of arbitrary degradation combinations on held-out real device data.
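To make "unprocessing" more concrete, the sketch below shows two generic ingredients: inverting the standard sRGB transfer curve and adding signal-dependent sensor noise. It is a simplified illustration under stated assumptions, not the paper's calibrated pipeline; a full unprocessing chain would also invert tone mapping, color correction, and white balance, apply the calibrated blur kernel, and re-mosaic. The parameters `shot_gain` and `read_var` are hypothetical stand-ins for calibrated per-device values.

```python
import random


def srgb_to_linear(c: float) -> float:
    """Invert the standard sRGB transfer curve for a value in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def add_sensor_noise(x: float, shot_gain: float, read_var: float,
                     rng: random.Random) -> float:
    """Heteroscedastic Gaussian approximation: variance = shot_gain * x + read_var."""
    sigma = (shot_gain * x + read_var) ** 0.5
    return x + rng.gauss(0.0, sigma)


# Usage: one mid-gray pixel, with made-up noise parameters.
rng = random.Random(0)
linear = srgb_to_linear(0.5)
noisy = add_sensor_noise(linear, shot_gain=1e-3, read_var=1e-5, rng=rng)
```

The signal-dependent variance is what distinguishes this from plain additive Gaussian noise: brighter pixels receive more shot noise, which is why calibrating the two parameters per device matters.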
Toward Real-world Infrared Image Super-Resolution: A Unified Autoregressive Framework and Benchmark Dataset: This paper proposes Real-IISR, a unified autoregressive framework that addresses the unique challenges of real-world infrared image super-resolution via a Thermal-Structure Guidance (TSG) module, a Conditional Adaptive Codebook (CAC), and a Thermal Order Consistency loss. It also introduces the FLIR-IISR dataset comprising 1,457 real LR-HR infrared image pairs.
SAT: Selective Aggregation Transformer for Image Super-Resolution: This paper proposes the Selective Aggregation Transformer (SAT), which reduces Key-Value matrix token count by 97% through density-driven token aggregation while preserving full-resolution Queries, enabling efficient global attention modeling. SAT surpasses the state-of-the-art PFT by 0.22 dB while reducing FLOPs by 27%.
SelfHVD: Self-Supervised Handheld Video Deblurring: SelfHVD exploits naturally occurring sharp frames in handheld videos as supervisory signals. Through Self-Enhanced Video Deblurring (SEVD), it constructs high-quality training pairs that surpass the quality ceiling of sharp frames, while Self-Constrained Spatial Consistency Maintenance (SCSCM) prevents spatial displacement drift, enabling handheld video deblurring without paired training data.
Winner of CVPR2026 NTIRE Challenge on Image Shadow Removal: Semantic and Geometric Guidance for Shadow Removal via Cascaded Refinement: This paper presents a three-stage cascaded refinement pipeline built upon OmniSR. It combines frozen DINOv2 semantic features with monocular depth/normal geometric guidance and uses a contraction constraint loss to stabilize multi-stage training, achieving first place in the NTIRE 2026 Image Shadow Removal Challenge.
ShiftLUT: Spatial Shift Enhanced Look-Up Tables for Efficient Image Restoration: ShiftLUT achieves the largest receptive field among LUT-based methods (65×65) via a Learnable Spatial Shift (LSS) module, combined with an asymmetric dual-branch architecture and Error-bounded Adaptive Sampling (EAS). Under a storage budget of 104 KB and an inference latency of 84 ms, ShiftLUT surpasses all existing LUT-based methods.
Spectral Super-Resolution via Adversarial Unfolding and Data-Driven Spectrum Regularization: This paper proposes UALNet, which integrates a data-driven spectral prior (PriorNet) and an adversarial learning term into a deep unfolding framework to perform spectral super-resolution from Sentinel-2 multispectral data (12 bands) to NASA AVIRIS hyperspectral imagery (186 bands), surpassing Transformer-based methods while requiring only 15% of their computation and 1/20 of their parameters.
Statistical Characteristic-Guided Denoising for Rapid High-Resolution Transmission Electron Microscopy Imaging: This paper proposes SCGN (Statistical Characteristic-Guided denoising Network), which adaptively enhances signal and suppresses noise in both spatial and frequency domains via window standard deviation weighting and frequency band-guided channel attention, respectively. Combined with an HRTEM-specific noise calibration method that generates realistic noisy datasets containing disordered structures, SCGN achieves high-quality denoising of high-resolution transmission electron microscopy images at millisecond-level acquisition speeds.
The Surprising Effectiveness of Noise Pretraining for Implicit Neural Representations: Through systematic experimental analysis, this paper demonstrates that pretraining INRs on unstructured noise (uniform/Gaussian distributions) achieves a surprising ~80 dB PSNR in image fitting, far surpassing all data-driven initialization methods. Noise with the natural image \(1/|f^\alpha|\) spectral structure achieves the best balance between signal fitting and denoising, matching state-of-the-art data-driven initialization performance without requiring any real data.
TM-BSN: Triangular-Masked Blind-Spot Network for Real-World Self-Supervised Image Denoising: This paper proposes TM-BSN, a triangular-masked blind-spot network that designs the blind-spot region to precisely align with the diamond-shaped spatial correlation pattern of real-world sRGB noise, enabling self-supervised image denoising at full resolution without downsampling. Combined with knowledge distillation, TM-BSN achieves state-of-the-art self-supervised denoising performance on the SIDD and DND benchmarks.
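The diamond-shaped footprint mentioned above can be visualized with a small mask. This sketch only draws a diamond (L1-ball) region of a chosen radius; it does not reproduce how TM-BSN builds that region from triangular masks, and the function name and `radius` parameter are illustrative assumptions.

```python
from typing import List


def diamond_mask(radius: int) -> List[List[int]]:
    """Return a (2r+1) x (2r+1) grid with 1 inside the diamond |dy| + |dx| <= r."""
    size = 2 * radius + 1
    return [
        [1 if abs(row - radius) + abs(col - radius) <= radius else 0
         for col in range(size)]
        for row in range(size)
    ]


# Usage: print a radius-2 diamond; 1 marks pixels treated as the blind region.
for row in diamond_mask(2):
    print("".join(str(v) for v in row))
```

In a blind-spot network, pixels marked 1 would be excluded from the receptive field of the center pixel, so spatially correlated noise inside the diamond cannot leak into the prediction.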
Toward Real-world Infrared Image Super-Resolution: A Unified Autoregressive Framework and Benchmark Dataset: This paper proposes Real-IISR, a visual autoregressive framework guided by thermal-structural cues, which achieves real-world infrared image super-resolution via a conditionally adaptive codebook and a thermal order consistency loss. The first real-world infrared SR dataset, FLIR-IISR, is also introduced.
Towards Universal Computational Aberration Correction in Photographic Cameras: A Comprehensive Benchmark Analysis: This paper constructs UniCAC, the first universal computational aberration correction benchmark for consumer-grade cameras, proposes an Optical Degradation Evaluator (ODE) to quantify aberration difficulty, systematically evaluates 24 image restoration/CAC methods, and reveals three key factors influencing CAC performance.
UCAN: Unified Convolutional Attention Network for Expansive Receptive Fields in Lightweight Super-Resolution: UCAN is a lightweight super-resolution network that unifies convolutional and attention mechanisms to efficiently expand the effective receptive field. It addresses the rank collapse issue of linear attention via Hedgehog attention, introduces a large-kernel distillation module and a semi-shared parameter strategy, and achieves 31.63 dB PSNR on Manga109 (×4) with only 48.4G MACs.
UCAN: Unified Convolutional Attention Network for Expansive Receptive Fields in Lightweight Super-Resolution: This paper proposes UCAN, a lightweight super-resolution network that unifies convolution and attention. By introducing Hedgehog Attention to overcome the low-rank bottleneck of linear attention, and combining Flash Attention for large-window modeling, a large-kernel distillation module, and cross-layer parameter sharing, UCAN achieves super-resolution performance comparable to much larger models under extremely low computational budgets.
UDAPose: Unsupervised Domain Adaptation for Low-Light Human Pose Estimation: UDAPose achieves a 56.4% AP improvement on the low-light hard set by combining stable diffusion-based low-light image synthesis (with preserved high-frequency low-light characteristics) and a dynamic attention control module (adaptively balancing visual cues and pose priors).
UniBlendNet: Unified Global, Multi-Scale, and Region-Adaptive Modeling for Ambient Lighting Normalization: This paper proposes UniBlendNet, which builds upon IFBlend to unify three complementary modules—global context modeling, multi-scale feature aggregation, and region-adaptive residual refinement—for ambient lighting normalization under complex spatially varying illumination conditions.
UniCAC: Towards Universal Computational Aberration Correction in Photographic Cameras: This work constructs UniCAC, the first large-scale universal benchmark for computational aberration correction (CAC) in photographic lenses covering both spherical and aspherical designs. It proposes an Optical Degradation Evaluator (ODE) to replace the traditional RMS radius metric, and derives three key factors governing CAC performance—prior utilization, network architecture, and training strategy—through a comprehensive evaluation of 24 models.
Towards Universal Computational Aberration Correction in Photographic Cameras: A Comprehensive Benchmark Analysis: This paper presents UniCAC, the first large-scale universal benchmark for Computational Aberration Correction (CAC). It introduces an Optical Degradation Evaluator (ODE) to quantify aberration difficulty and comprehensively evaluates 24 image restoration/CAC algorithms, revealing the impact of three key factors—prior utilization, network architecture, and training strategy—on CAC performance.
UniRain: Unified Image Deraining with RAG-based Dataset Distillation and Multi-objective Reweighted Optimization: This paper proposes UniRain, a unified image deraining framework that employs RAG-driven dataset distillation to select high-quality samples from million-scale public datasets, combined with an asymmetric MoE architecture and a multi-objective reweighted optimization strategy, achieving consistently superior performance across four degradation types: rain streaks and raindrops under both daytime and nighttime conditions.
UniRain: Unified Image Deraining with RAG-based Dataset Distillation and Multi-objective Reweighted Optimization: This paper proposes UniRain, a unified deraining framework that distills high-quality training samples from over 2 million public image pairs via RAG-driven dataset distillation, combines an asymmetric Mixture-of-Experts (MoE) architecture with a multi-objective adaptive reweighting optimization strategy, and for the first time handles all four degradation types — daytime rain streaks, daytime raindrops, nighttime rain streaks, and nighttime raindrops — within a single model.
UniRain: Unified Image Deraining with RAG-based Dataset Distillation and Multi-objective Reweighted Optimization: UniRain is a unified deraining framework that employs RAG-driven dataset distillation to select high-quality samples from public datasets, and introduces a multi-objective reweighted optimization strategy within an asymmetric MoE architecture to balance learning across different rain degradation types, achieving state-of-the-art performance across four scenarios: daytime/nighttime rain streaks and raindrops.
Variational Garrote for Sparse Inverse Problems: Under a unified sparse inverse problem framework, this paper systematically compares \(\ell_1\) regularization (LASSO) with Variational Garrote (VG, a method that approximates \(\ell_0\) sparsity via variational binary gating) across three tasks—signal resampling, denoising, and sparse-view CT reconstruction—demonstrating that VG significantly reduces the minimum generalization error in severely underdetermined settings, with the greatest advantage observed at sampling rates below 20% or with very few projection angles.
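The difference between \(\ell_1\) shrinkage and \(\ell_0\)-style gating is easiest to see on a single coefficient. In the sketch below, `soft_threshold` is the exact minimizer of \(\tfrac{1}{2}(z - w)^2 + \lambda |w|\), which is the LASSO solution in the scalar (or orthonormal-design) case. `hard_gate` is only a stand-in for an ideal binary gate; it is an assumption for illustration and does not implement the variational inference that VG actually uses.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Scalar LASSO solution: shrink z toward zero by lam, clamping at zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def hard_gate(z: float, threshold: float) -> float:
    """Idealized binary gate: keep z unchanged if it is large enough, else drop it."""
    return z if abs(z) > threshold else 0.0
```

For `z = 3.0` and a threshold of `1.0`, soft thresholding returns `2.0` while the gate returns `3.0`: both select the coefficient, but only the \(\ell_1\) solution biases the surviving value toward zero.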