
SD-PSFNet: Sequential and Dynamic Point Spread Function Network for Image Deraining

Conference: AAAI 2026
arXiv: 2511.17993
Code: https://github.com/Aster-1024/SD-PSFNet
Area: Image Deraining / Physics-Aware Image Restoration
Keywords: image deraining, point spread function, physics-aware, multi-stage restoration, dynamic filtering

TL;DR

SD-PSFNet is a cascaded CNN-based deraining network driven by a dynamic point spread function (PSF) mechanism. It models the optical effects of raindrops via a multi-scale learnable PSF dictionary, combined with a sequential restoration architecture featuring adaptive gated fusion. The method reaches state-of-the-art results of 33.12 dB on Rain100H and 42.28 dB on RealRain-1k-L, with a cumulative ablation gain of 5.04 dB (13.5%) over the MPRNet baseline on RealRain-1k-L.

Background & Motivation

Image deraining is a fundamental low-level vision task critical to downstream applications such as object detection, autonomous driving, and surveillance systems. Existing methods face three core challenges:

Lack of Physical Modeling: Most deep learning methods learn input-output mappings purely from data and ignore the optical properties of raindrops (orientation, density, refraction, etc.). As a result, they cannot adapt dynamically to varying rain types, often leave rain incompletely removed, and offer little interpretability.

Limitations of Conventional PSF: Explicit physical modeling relies on fixed prior assumptions (e.g., static PSF templates), which cannot capture the multi-scale distribution and optical variability of rain degradation.

Efficiency Bottleneck: Although Transformer/GAN/Diffusion-based methods deliver strong performance, they impose large parameter counts (Restormer 26.10M, HINet 88.67M), making real-time deployment prohibitive.

Core insight: Transform PSF from a static preset into a data-driven learnable form, approximating spatially varying degradation functions with a finite-dimensional dictionary of degradation modes.

Method

Overall Architecture

SD-PSFNet adopts a sequential restoration architecture with three stage types (StageIn → \(\tau \times\) StageMid → ORStage), inspired by the sequential modeling of LSTMs and the multi-stage design of MPRNet. Unlike MPRNet's spatial-splitting strategy, every stage in SD-PSFNet processes the full image and dynamically predicts PSF features within a UNet-style multi-scale downsampling structure. Across stages, physical priors and feature information are propagated via adaptive gated fusion.
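
A rough sketch of this data flow follows; stage names, return signatures, and the `carry` hand-off are illustrative placeholders of mine, not the authors' interfaces:

```python
import torch.nn as nn


class SDPSFNetSketch(nn.Module):
    """Illustrative cascade: StageIn -> tau x StageMid -> ORStage.
    Every stage processes the full rainy image; features and PSF priors
    are carried forward and merged by gated fusion inside each stage."""

    def __init__(self, stage_in, stage_mids, or_stage):
        super().__init__()
        self.stage_in = stage_in
        self.stage_mids = nn.ModuleList(stage_mids)  # tau middle stages
        self.or_stage = or_stage

    def forward(self, rainy):
        outputs = []
        restored, carry = self.stage_in(rainy)       # carry: features + PSF prior
        outputs.append(restored)
        for stage in self.stage_mids:
            restored, carry = stage(rainy, carry)    # gated cross-stage hand-off
            outputs.append(restored)
        outputs.append(self.or_stage(rainy, carry))
        return outputs                               # one prediction per stage, all supervised
```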

Key Designs

  1. Dynamic PSF Mechanism: The spatially varying degradation function is modeled as a linear combination of a learnable degradation-mode dictionary: \(K(x,y) \approx \sum_{j=1}^{K_c} w_j(x,y) \cdot k_j\), where \(k_j\) are learnable 2D convolutional kernels representing basic degradation modes and the weights \(w_j\) are adaptively mapped from image features by a neural network (see the first sketch after this list).
  2. Multi-Scale PSF Head: Three prediction heads with kernels of size 3×3, 5×5, and 7×7 capture high-frequency detail degradation, balanced local-global information, and macro low-frequency patterns, respectively. After adaptive fusion via a Channel Attention Block (CAB), a 1×1 convolution projects the output into a \(K_c=40\)-channel PSF representation, with spatial normalization applied per channel to ensure energy conservation.
  3. PSF-Aware Attention: A dual-path mechanism — (a) Channel Modulation: a PSF encoder extracts compact features to generate \(\gamma, \beta\) parameters, with \(x_{mod} = x \odot \gamma + \beta\); (b) Spatial Attention: a single-channel PSF is upsampled to feature resolution and combined with the modulated features to produce spatial weights that highlight regions of varying degradation severity.
  4. Gated Cross-Stage Fusion: \(F^{(t)} = G_\theta \cdot F_{current} + (1-G_\theta) \cdot F_{prev}\), applied at three locations: adaptive fusion of shallow features at the stage input, hierarchical encoder feature updates, and an enhanced CSFF dual-gate unit for precise control of cross-stage information flow (this gate and the PSF-aware attention of item 3 are sketched in the second code block after this list).
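
A minimal PyTorch-style sketch of the dynamic PSF prediction in items 1–2. Everything here is illustrative: the module and parameter names (`DynamicPSFHead`, `num_modes`, the 5×5 basis size), the plain concatenate-and-project fusion standing in for the paper's CAB, and the softmax-style normalization are my assumptions, not the released implementation.

```python
import torch
import torch.nn as nn


class DynamicPSFHead(nn.Module):
    """Sketch of a dynamic PSF head: predict per-pixel mixing weights
    w_j(x, y) over a learnable dictionary of K_c basis degradation kernels,
    so that K(x, y) ~= sum_j w_j(x, y) * k_j."""

    def __init__(self, in_ch=64, num_modes=40, head_ksizes=(3, 5, 7), basis_size=5):
        super().__init__()
        # Multi-scale prediction heads: 3x3 (fine detail), 5x5 (local/global
        # balance), 7x7 (macro low-frequency patterns).
        self.heads = nn.ModuleList(
            [nn.Conv2d(in_ch, in_ch, k, padding=k // 2) for k in head_ksizes]
        )
        # Stand-in for the paper's CAB fusion: concatenate and project with 1x1.
        self.project = nn.Conv2d(in_ch * len(head_ksizes), num_modes, kernel_size=1)
        # Learnable dictionary of 2D basis kernels (one per degradation mode).
        self.basis = nn.Parameter(torch.randn(num_modes, basis_size, basis_size) * 0.01)

    def forward(self, feat):
        # Predict a K_c-channel PSF representation from image features.
        multi = torch.cat([head(feat) for head in self.heads], dim=1)
        weights = self.project(multi)                  # (B, K_c, H, W)
        # Per-channel spatial normalization so each channel's response sums
        # to one over the image (one possible reading of "energy conservation").
        b, c, h, w = weights.shape
        weights = torch.softmax(weights.view(b, c, -1), dim=-1).view(b, c, h, w)
        return weights, self.basis
```

The predicted weights and basis kernels together form the spatially varying PSF representation that later components consume, for instance through the PSF-aware attention sketched next.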

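In the same illustrative spirit, a sketch of the PSF-aware attention (item 3) and the gated cross-stage fusion (item 4). The pooling-based PSF encoder, the layer sizes, and the single-gate form of the fusion are assumptions of mine; the enhanced CSFF described above uses a dual gate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PSFAwareAttention(nn.Module):
    """Dual-path conditioning on the PSF representation:
    (a) channel modulation x_mod = x * gamma + beta, with (gamma, beta)
        produced by a small PSF encoder;
    (b) spatial attention driven by an upsampled single-channel PSF map."""

    def __init__(self, feat_ch=64, psf_ch=40):
        super().__init__()
        self.psf_encoder = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # compact PSF descriptor
            nn.Conv2d(psf_ch, 2 * feat_ch, kernel_size=1),
        )
        self.to_single = nn.Conv2d(psf_ch, 1, kernel_size=1)
        self.spatial = nn.Conv2d(feat_ch + 1, 1, kernel_size=3, padding=1)

    def forward(self, x, psf):
        # (a) Channel modulation from the compact PSF descriptor.
        gamma, beta = self.psf_encoder(psf).chunk(2, dim=1)
        x_mod = x * gamma + beta
        # (b) Spatial attention: upsample a 1-channel PSF map to feature
        # resolution and combine it with the modulated features.
        psf_map = F.interpolate(self.to_single(psf), size=x.shape[-2:],
                                mode="bilinear", align_corners=False)
        attn = torch.sigmoid(self.spatial(torch.cat([x_mod, psf_map], dim=1)))
        return x_mod * attn


class GatedCrossStageFusion(nn.Module):
    """F_t = G * F_current + (1 - G) * F_previous, with the gate G predicted
    from both feature maps (single-gate form; the enhanced CSFF uses two)."""

    def __init__(self, ch=64):
        super().__init__()
        self.gate = nn.Conv2d(2 * ch, ch, kernel_size=1)

    def forward(self, f_current, f_previous):
        g = torch.sigmoid(self.gate(torch.cat([f_current, f_previous], dim=1)))
        return g * f_current + (1 - g) * f_previous
```
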
Loss & Training

\[\mathcal{L}_{total} = \sum_{s=1}^{\tau+1} (\mathcal{L}_{char}(I_s, T) + 0.05 \cdot \mathcal{L}_{edge}(I_s, T) + 0.01 \cdot \mathcal{L}_{freq}(I_s, T))\]

The Charbonnier loss provides pixel-level supervision, the edge-aware loss preserves high-frequency details, and the frequency-domain loss aligns PSF degradation characteristics. Training uses the AdamW optimizer with lr=1e-4, a 3-epoch linear warmup followed by cosine annealing to 1e-6, FP16 mixed precision, and gradient clipping at 2.0. The patch size is 128×128, and training runs for 2000 epochs on a single RTX 4090.
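
A compact sketch of this objective. The Charbonnier term is standard; the concrete edge and frequency terms used here (a Laplacian-filtered Charbonnier loss and an L1 distance between FFT amplitudes) are common choices assumed for illustration, not necessarily the paper's exact definitions:

```python
import torch
import torch.nn.functional as F

LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)


def charbonnier(x, y, eps=1e-3):
    # Smooth L1-like pixel loss: sqrt((x - y)^2 + eps^2), averaged.
    return torch.sqrt((x - y) ** 2 + eps ** 2).mean()


def edge_loss(x, y):
    # Charbonnier loss on Laplacian-filtered (edge) maps, applied per channel.
    k = LAPLACIAN.to(x.device).repeat(x.shape[1], 1, 1, 1)
    ex = F.conv2d(x, k, padding=1, groups=x.shape[1])
    ey = F.conv2d(y, k, padding=1, groups=y.shape[1])
    return charbonnier(ex, ey)


def freq_loss(x, y):
    # L1 distance between 2D FFT amplitudes (frequency-domain alignment).
    return (torch.fft.fft2(x).abs() - torch.fft.fft2(y).abs()).abs().mean()


def total_loss(stage_outputs, target, w_edge=0.05, w_freq=0.01):
    # Sum the weighted terms over every stage prediction I_s against target T.
    return sum(charbonnier(i, target)
               + w_edge * edge_loss(i, target)
               + w_freq * freq_loss(i, target)
               for i in stage_outputs)
```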

Key Experimental Results

Main Results: Comparison with SOTA Methods

| Method | Type | Params (M) | Rain100L PSNR (dB) | Rain100H PSNR (dB) | RealRain-1k-L PSNR (dB) | RealRain-1k-H PSNR (dB) |
| --- | --- | --- | --- | --- | --- | --- |
| MPRNet | CNN | 3.64 | 36.40 | 30.41 | 36.29 | 34.74 |
| HINet | CNN | 88.67 | 37.28 | 30.65 | 41.98 | 40.82 |
| M3SNet | CNN | 16.70 | 40.04 | 30.64 | 41.55 | 40.01 |
| Restormer | Transformer | 26.10 | 38.99 | 31.46 | 40.90 | 39.57 |
| DRSFormer | Transformer | 33.66 | 41.32 | 32.07 | 41.52 | 40.21 |
| NeRD-Rain-S | Transformer | 10.53 | 42.00 | 32.86 | 38.64 | 36.69 |
| SD-PSFNet | CNN | 9.63 | 41.47 | 33.12 | 42.28 | 41.08 |

Ablation Study: Incremental Component Contributions (RealRain-1k-L)

| Model Configuration | PSNR (dB) | SSIM | Gain (dB) |
| --- | --- | --- | --- |
| MPRNet (Baseline) | 37.24 | 0.9754 | |
| + Gate Mechanism | 38.65 | 0.9773 | +1.41 |
| + Hierarchical Update | 40.41 | 0.9792 | +1.76 |
| + Enhanced CSFF | 41.41 | 0.9850 | +1.00 |
| + 1-channel PSF | 41.63 | 0.9855 | +0.22 |
| + 40-channel PSF (Ours) | 42.28 | 0.9872 | +0.65 |
| Cumulative Gain | | | +5.04 |

Key Findings

  • Effect of Stage Count \(\tau\): Performance increases monotonically from \(\tau=0\) (40.81 dB, 3.64M) to \(\tau=3\) (41.54 dB, 9.63M); the CNN architecture manages parameter growth efficiently.
  • Cross-Domain Generalization: Training on Rain100H and testing on RealRain-1k-L yields 26.98 dB, surpassing Restormer (26.59 dB) and NeRD-Rain-S (26.67 dB) trained on Rain13K.
  • Synthetic-to-Real Domain Gap: Models trained on synthetic data exhibit severe performance degradation on real-world test sets (e.g., training on Rain100L → only 27.54 dB on RealRain-1k-L), reflecting that data-driven methods tend to learn dataset-specific features rather than generalizable deraining principles.

Highlights & Insights

  • CNN + Physical Priors Can Match Transformers: A 9.63M-parameter CNN surpasses HINet (88.67M) and Restormer (26.10M) on real-world datasets, demonstrating the value of physics-informed modeling.
  • Elegant Design of the Dynamic PSF Dictionary: The combination of a finite-dimensional dictionary with data-driven weight assignment achieves both physical interpretability (each basis kernel corresponds to a specific degradation mode) and data adaptability.
  • Information Flow Control in Sequential Restoration: The dual-gate CSFF mechanism precisely governs the flow of shallow-level details and deep-level semantics during cross-stage feature propagation.

Limitations & Future Work

  • The synthetic-to-real generalization gap remains notable; the linear degradation assumption of PSF may be insufficient for non-linear degradations (e.g., combined rain and haze).
  • Temporal consistency in video deraining scenarios has not been validated.
  • At \(\tau=3\), MACs reach 244.83G, which may not offer an efficiency advantage over lightweight Transformers such as DRSFormer.

Comparison of Deraining Method Categories

| Method Category | Representative Methods | Characteristics | Limitations |
| --- | --- | --- | --- |
| Early CNN | DerainNet, DDN | Local feature extraction | Limited receptive field |
| Multi-Scale CNN | RESCAN, DID-MDN | Multi-resolution feature fusion | Lacks physical modeling |
| Transformer | Restormer, DRSFormer | Long-range dependency modeling | Large parameter count, difficult to deploy |
| GAN/Diffusion | SSCGAN, DCDGAN | Generates perceptually realistic results | Mode collapse, unstable training |
| SD-PSFNet | Ours | Physical PSF + Sequential Restoration | The only CNN achieving SOTA on both synthetic and real datasets simultaneously |

Rating

  • Novelty: ⭐⭐⭐⭐ The dynamic PSF mechanism organically integrates physical modeling with data-driven learning.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Comprehensive ablations, cross-domain evaluation, and feature visualization.
  • Writing Quality: ⭐⭐⭐⭐ The correspondence between the physical model and network design is clearly articulated.
  • Value: ⭐⭐⭐⭐ Demonstrates that CNN + physical priors can be competitive with Transformers.