
🖼️ Image Restoration

🤖 AAAI2026 · 13 paper notes

Blur-Robust Detection via Feature Restoration: An End-to-End Framework for Prior-Guided Infrared UAV Target Detection

This paper proposes JFD3, an end-to-end dual-branch framework that performs deblurring in the feature domain rather than the image domain, and leverages frequency structure priors to guide the detection network, achieving high-accuracy real-time infrared UAV target detection under motion blur conditions.
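The "frequency structure prior" idea can be illustrated with a plain FFT high-pass filter: keep the high-frequency part of the spectrum, where edges and fine structure live even under mild blur. This is a minimal numpy sketch of that general idea, not JFD3's actual prior extraction; the function name and cutoff are assumptions.

```python
import numpy as np

def frequency_structure_prior(frame: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Extract a normalised high-frequency structure map from one channel.

    High-pass filtering in the Fourier domain keeps edges and fine detail,
    the kind of cue a blur-robust detector could be guided by.
    """
    h, w = frame.shape
    spectrum = np.fft.fftshift(np.fft.fft2(frame))
    # Zero out the low-frequency centre of the shifted spectrum.
    cy, cx = h // 2, w // 2
    ry, rx = int(h * cutoff), int(w * cutoff)
    spectrum[cy - ry:cy + ry, cx - rx:cx + rx] = 0
    structure = np.abs(np.fft.ifft2(np.fft.ifftshift(spectrum)))
    return structure / (structure.max() + 1e-8)  # normalise to [0, 1]

frame = np.random.rand(64, 64)
prior = frequency_structure_prior(frame)
print(prior.shape)
```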

Clear Nights Ahead: Towards Multi-Weather Nighttime Image Restoration

This paper is the first to define and explore the multi-weather nighttime image restoration task. It constructs the AllWeatherNight dataset (8K training + 1K synthetic test + 1K real-world test) and proposes the ClearNight unified framework, which simultaneously removes compound degradations—haze, rain streaks, raindrops, snow, and flare—in a single stage via Retinex dual-prior guidance and weather-aware dynamic specificity–commonality collaboration. With only 2.84M parameters, ClearNight comprehensively surpasses state-of-the-art methods.
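Retinex dual-prior guidance builds on the classic Retinex model, which factors an image into illumination and reflectance (I = L · R). The sketch below shows that classic decomposition in numpy with a cheap box-blur illumination estimate; the names and the blur-based estimator are illustrative assumptions, not ClearNight's actual priors.

```python
import numpy as np

def box_blur(img: np.ndarray, k: int = 15) -> np.ndarray:
    """Separable box blur used here as a cheap illumination estimator."""
    kernel = np.ones(k) / k
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, rows)

def retinex_decompose(img: np.ndarray):
    """Split an image into illumination L and reflectance R so that I = L * R."""
    illumination = box_blur(img) + 1e-6   # smooth, strictly positive
    reflectance = img / illumination
    return illumination, reflectance

img = np.random.rand(32, 32) * 0.5 + 0.25
L, R = retinex_decompose(img)
print(np.allclose(L * R, img))  # → True
```

In a nighttime restoration setting, the illumination component carries the low-light/flare information while the reflectance carries scene content, which is why a Retinex split makes a natural pair of priors.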

ClearAIR: A Human-Visual-Perception-Inspired All-in-One Image Restoration

Inspired by human visual perception (HVP), this paper proposes ClearAIR, a coarse-to-fine unified image restoration framework that progressively recovers image quality through four stages — MLLM-based quality assessment → semantic region perception → degradation type identification → internal clue reuse — achieving state-of-the-art performance across multiple degradation tasks.

Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models

This paper proposes the LLMHNI framework, which leverages two types of auxiliary signals generated by LLMs—semantic relevance and logical relevance—to resolve the confusion between hard samples and noisy samples in recommender systems, significantly improving denoising recommendation performance.

HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios

This paper proposes HQ-SVC, a framework that leverages a disentangled audio codec (FACodec) to jointly extract content and speaker features, integrates an Enhanced Voice Adaptor (EVA) to fuse acoustic features such as pitch and energy, and employs a progressive synthesis pipeline combining DDSP and a diffusion model. Trained on a single RTX 3090 with fewer than 80 hours of singing data, HQ-SVC achieves zero-shot singing voice conversion quality surpassing large-scale training baselines, and additionally supports speech super-resolution.

ICLR: Inter-Chrominance and Luminance Interaction for Natural Color Restoration in Low-Light Image Enhancement

Targeting two overlooked statistical distribution issues in the HVI color space — large distribution discrepancy between chrominance and luminance branches leading to insufficient complementary feature extraction, and weak inter-chrominance correlation causing gradient conflicts — this paper proposes the ICLR framework. It introduces a Dual-stream Interaction Enhancement Module (DIEM) and a Covariance Correction Loss (CCL) to address these issues from the perspectives of fusion enhancement and statistical distribution optimization, respectively, achieving state-of-the-art performance on the LOL benchmark series.
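The paper's exact CCL formulation is not reproduced here, but the general shape of a covariance-based penalty between two feature branches can be sketched as follows; all names, shapes, and the squared-covariance objective are assumptions for illustration.

```python
import numpy as np

def covariance_correction_loss(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Illustrative covariance penalty between two feature branches.

    feat_a, feat_b: (N, D) arrays of per-sample features. Penalising the
    entries of the cross-covariance matrix pushes the two branches toward
    a controlled statistical relationship, the spirit of a covariance loss.
    """
    a = feat_a - feat_a.mean(axis=0)
    b = feat_b - feat_b.mean(axis=0)
    cross_cov = a.T @ b / (len(a) - 1)    # (D, D) cross-covariance matrix
    return float(np.mean(cross_cov ** 2))  # mean squared covariance entry

rng = np.random.default_rng(0)
x = rng.normal(size=(128, 8))
y = rng.normal(size=(128, 8))
# A branch is maximally covariant with itself, far less with independent noise.
print(covariance_correction_loss(x, x) > covariance_correction_loss(x, y))  # → True
```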

Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework

This paper investigates the effective utilization of decoder-based LLMs for Extreme Multi-label Classification (XMC), proposing a dual-decoder learning strategy and the ViXML multimodal framework. By employing structured prompt templates to adapt LLM embeddings and efficiently integrating visual metadata, the method substantially outperforms state-of-the-art approaches on four public benchmarks (up to +8.21% P@1 on the largest dataset), demonstrating that "one image outweighs billions of parameters."

MFmamba: A Multi-function Network for Panchromatic Image Resolution Restoration Based on State-Space Model

This paper proposes MFmamba, a multi-function network built upon a UNet++ backbone that integrates a Mamba Upsampling Block (MUB), Dual Pooling Attention (DPA), and a Multi-scale Hybrid Cross Block (MHCB). Using only panchromatic (PAN) images as input, the unified framework simultaneously supports three tasks: super-resolution, spectral restoration, and joint SR with colorization.

RefiDiff: Progressive Refinement Diffusion for Efficient Missing Data Imputation

RefiDiff proposes a four-stage framework (pre-processing → warm-up → diffusion → polishing) that, for the first time, unifies the predictive and generative imputation paradigms in a single progressive pipeline. Combined with a Mamba-based denoising network, it achieves state-of-the-art performance across 9 datasets while running 4× faster than DIFFPUTER.
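The warm-up-then-refine pattern can be caricatured in a few lines: fill missing entries with a cheap predictor, then polish only those entries while leaving observed values fixed. This is a toy sketch of the control flow, not RefiDiff's diffusion stages; every function name here is hypothetical.

```python
import numpy as np

def warm_up_impute(x: np.ndarray) -> np.ndarray:
    """Predictive warm-up: fill NaNs with column means before refinement."""
    filled = x.copy()
    means = np.nanmean(x, axis=0)
    idx = np.where(np.isnan(x))
    filled[idx] = np.take(means, idx[1])
    return filled

def refine(x_filled: np.ndarray, mask: np.ndarray, steps: int = 5) -> np.ndarray:
    """Toy 'polish' stage: nudge imputed entries toward their column mean,
    never touching observed entries (mask marks originally missing cells)."""
    x = x_filled.copy()
    for _ in range(steps):
        col_mean = x.mean(axis=0)
        x[mask] = 0.5 * x[mask] + 0.5 * np.take(col_mean, np.where(mask)[1])
    return x

data = np.array([[1.0, 2.0], [np.nan, 4.0], [3.0, np.nan]])
mask = np.isnan(data)
imputed = refine(warm_up_impute(data), mask)
print(np.isnan(imputed).sum())  # → 0
```

The key structural point this mirrors is that later stages operate only on the missing-entry mask, so observed data passes through untouched.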

SD-PSFNet: Sequential and Dynamic Point Spread Function Network for Image Deraining

SD-PSFNet is a cascaded CNN-based deraining network driven by a dynamic PSF mechanism. It models the optical effects of raindrops via a multi-scale learnable PSF dictionary, combined with a sequential restoration architecture featuring adaptive gated fusion. The method achieves state-of-the-art PSNR of 33.12 dB on Rain100H and 42.28 dB on RealRain-1k-L, a cumulative gain of 5.04 dB (13.5%) over the baseline MPRNet.
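A learnable PSF dictionary can be pictured as a softmax-weighted mixture of candidate kernels that is then convolved with the image. The single-scale sketch below illustrates that mechanism; the kernels, weights, and function name are assumptions, not the paper's actual multi-scale dictionary.

```python
import numpy as np

def apply_psf_mixture(img: np.ndarray, kernels: np.ndarray, logits: np.ndarray) -> np.ndarray:
    """Blend a dictionary of PSF kernels and convolve the image with the mix.

    kernels: (K, k, k) candidate point-spread functions
    logits:  (K,) learnable selection scores, softmax-normalised here
    """
    w = np.exp(logits - logits.max())
    w /= w.sum()
    psf = np.tensordot(w, kernels, axes=1)  # weighted kernel mixture, (k, k)
    psf /= psf.sum()                        # keep overall brightness constant
    k = psf.shape[0]
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for i in range(img.shape[0]):           # naive sliding-window convolution
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * psf)
    return out

# Two toy candidates: a diagonal streak (rain-like) and an isotropic box blur.
kernels = np.stack([np.eye(3) / 3, np.ones((3, 3)) / 9])
img = np.random.rand(16, 16)
blurred = apply_psf_mixture(img, kernels, np.array([0.0, 2.0]))
print(blurred.shape)
```

In the learned setting the logits would come from a network conditioned on the input, so each region can pick the PSF shape that best explains its rain streaks.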

SpatioTemporal Difference Network for Video Depth Super-Resolution

Motivated by the statistical observation that spatially non-smooth regions and temporally varying regions in video depth super-resolution (VDSR) follow long-tail distributions, this paper proposes STDNet. The method incorporates a spatial difference branch (learning spatial difference representations for intra-frame RGB-D adaptive aggregation) and a temporal difference branch (exploiting temporal difference representations for motion compensation in changing regions). On the TartanAir dataset at ×16 super-resolution, RMSE is reduced from 112.04 cm to 96.80 cm, outperforming state-of-the-art methods by an average of 27.6%–32.6%.
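The core observation behind the temporal difference branch (few pixels actually change between frames, so static regions can safely borrow from the previous frame) can be sketched as a thresholded frame difference. This is a minimal illustration of the idea with hypothetical names, not STDNet's learned representation.

```python
import numpy as np

def temporal_change_mask(prev_depth: np.ndarray, curr_depth: np.ndarray,
                         thresh: float = 0.05) -> np.ndarray:
    """Boolean mask of temporally varying pixels between two depth frames.

    Under the long-tail observation, this mask is sparse: most of the
    frame is static and only a small set of pixels needs motion handling.
    """
    return np.abs(curr_depth - prev_depth) > thresh

def fuse_with_previous(prev_depth: np.ndarray, curr_depth: np.ndarray,
                       thresh: float = 0.05) -> np.ndarray:
    """Average with the previous frame in static regions; in changing
    regions keep the current frame (where motion compensation would act)."""
    changed = temporal_change_mask(prev_depth, curr_depth, thresh)
    return np.where(changed, curr_depth, 0.5 * (curr_depth + prev_depth))

prev = np.ones((8, 8))
curr = np.ones((8, 8))
curr[0, 0] = 2.0  # a single moving pixel
mask = temporal_change_mask(prev, curr)
print(int(mask.sum()))  # → 1
```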

Temporal Inconsistency Guidance for Super-resolution Video Quality Assessment

This paper proposes TIG-SVQA, a framework that, for the first time, incorporates temporal inconsistency as an explicit guidance signal for super-resolution video quality assessment. The framework introduces an Inconsistency-Highlighted Spatial Module (IHSM) and an Inconsistency-Guided Temporal Module (IGTM), achieving SRCC scores of 0.950, 0.942, and 0.939 on the SFD, MFD, and Combined-VSR datasets, respectively, surpassing all existing IQA/VQA methods.

TMDC: A Two-Stage Modality Denoising and Complementation Framework for Multimodal Sentiment Analysis

This paper proposes TMDC, a two-stage framework in which the first stage learns denoised modality-specific and modality-common representations on complete data, and the second stage leverages denoised representations from available modalities to reconstruct missing ones — marking the first joint treatment of noise and missing modalities in MSA.