UniRain: Unified Image Deraining with RAG-based Dataset Distillation and Multi-objective Reweighted Optimization

Conference: CVPR 2026 arXiv: 2603.03967 Code: N/A Area: Image Restoration Keywords: image_restoration, deraining, mixture_of_experts, multi_objective_optimization, RAG

TL;DR

This paper proposes UniRain, a unified deraining framework that distills high-quality training samples from over 2 million public image pairs via RAG-driven dataset distillation, combines an asymmetric Mixture-of-Experts (MoE) architecture with a multi-objective adaptive reweighting optimization strategy, and for the first time handles all four degradation types — daytime rain streaks, daytime raindrops, nighttime rain streaks, and nighttime raindrops — within a single model.

Background & Motivation

Existing deraining methods face two core challenges:

  1. Uneven data quality: Directly mixing all synthetic and real datasets (>2 million pairs) introduces inaccurate supervision signals, which degrades model convergence and generalization. Experiments confirm that naive mixing can even underperform a carefully curated subset.
  2. Training imbalance: Different rain degradation types — daytime rain streaks (DRS), daytime raindrops (DRD), nighttime rain streaks (NRS), and nighttime raindrops (NRD) — vary substantially in difficulty and convergence rate, causing unified optimization to favor simpler types while neglecting harder ones.

Method

1. RAG-based Dataset Distillation

Retrieval Stage

A database is constructed from a large-scale corpus, storing a triplet \((T_r, f_r, I_r)\) for each real rain image (BLIP text description, CLIP visual feature, and image).

A three-level hierarchical similarity matching is applied to each query image:

  • Semantic similarity: \(s_{txt}(q,r) = \|\phi_T(T_q) - \phi_T(T_r)\|_2\), the L2 distance between BLIP text embeddings; the Top-\(K_1\) closest candidates are kept
  • Visual feature similarity: \(s_{vis}(q,r') = \frac{f_q^\top f_{r'}}{\|f_q\|_2 \|f_{r'}\|_2}\), cosine similarity between CLIP features; the Top-\(K_2\) candidates are kept
  • Structural similarity: \(s_{perc}(q,r'') = \mathrm{SSIM}(I_q, I_{r''})\), from which the final Top-\(K_3\) references are selected
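A minimal sketch of this coarse-to-fine retrieval, assuming precomputed text embeddings and CLIP features stored as NumPy arrays and a caller-supplied SSIM function; all names, shapes, and the `hierarchical_retrieve` helper are hypothetical, not the paper's implementation:

```python
import numpy as np

def topk(scores, k, largest=True):
    """Return indices of the k best scores."""
    order = np.argsort(scores)
    return order[-k:] if largest else order[:k]

def hierarchical_retrieve(q_txt, q_vis, q_img, db_txt, db_vis, db_imgs,
                          ssim_fn, k1=50, k2=10, k3=3):
    """Three-level coarse-to-fine retrieval over a reference database.
    db_txt: (N, d_t) text embeddings, db_vis: (N, d_v) visual features."""
    # Level 1: semantic similarity = L2 distance between text embeddings
    # (smaller distance is better, hence largest=False).
    d_txt = np.linalg.norm(db_txt - q_txt, axis=1)
    idx1 = topk(d_txt, k1, largest=False)

    # Level 2: cosine similarity between CLIP visual features on survivors.
    sub = db_vis[idx1]
    cos = sub @ q_vis / (np.linalg.norm(sub, axis=1)
                         * np.linalg.norm(q_vis) + 1e-8)
    idx2 = idx1[topk(cos, k2)]

    # Level 3: structural similarity (SSIM) on the images themselves.
    ssim = np.array([ssim_fn(q_img, db_imgs[i]) for i in idx2])
    return idx2[topk(ssim, k3)]
```

Each level only re-ranks the survivors of the previous one, so the expensive image-level SSIM comparison runs on just \(K_2\) candidates rather than the full corpus.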

Generation Stage

Retrieved reference images are combined with the query image, and three VLMs (InternVL2.5-8B, LLaVA-NeXT-7B, MobileVLM-3B) vote to assess data quality:

\[\hat{R}_q = \begin{cases} 1 & \text{if } \sum_{i=1}^3 \mathbb{I}(R_q^i = 1) \geq 2 \\ 0 & \text{otherwise} \end{cases}\]

This process distills 52,869 high-quality training pairs from over 2 million image pairs, retaining approximately 2.6% of the original data.
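The 2-of-3 vote above amounts to a simple majority filter. A toy illustration with stand-in judge functions in place of the three VLMs (the `distill` helper and threshold judges are hypothetical):

```python
def distill(samples, judges):
    """Keep samples that at least 2 of the 3 judges mark as high quality.
    judges: list of callables sample -> 0/1, stand-ins for the VLM verdicts."""
    return [s for s in samples if sum(j(s) for j in judges) >= 2]

# Toy judges: each "VLM" accepts a sample above its own quality threshold.
judges = [lambda x: int(x > 1), lambda x: int(x > 2), lambda x: int(x > 3)]
kept = distill([0, 2, 3, 5], judges)  # → [3, 5]
```

The majority rule makes the filter robust to any single judge's failure mode, which is the motivation for using three VLMs of different sizes rather than one.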

2. Multi-objective Adaptive Reweighted Optimization

The convergence slope \(\alpha_i\) for each degradation type is estimated via sliding-window linear regression, from which three dynamic weight metrics are derived:

  • Type Balance Score (TBS): Shifts attention toward types with slower convergence. \(\mathrm{TBS}_i(t) = \mathrm{softmax}_i\left(K \frac{\alpha_i(t)}{\sum_j |\alpha_j(t)|}\right)\)

  • Type Stability Score (TSS): Suppresses excessive weights for diverging types. \(\mathrm{TSS}_i(t) = \mathrm{softmax}_i\left(-N \frac{\alpha_i(t)}{\sum_{k=t-N+1}^t |\alpha_i(k)|}\right)\)

  • Adaptivity Factor (AF): Dynamically adjusts the balance between TBS and TSS. \(\mathrm{AF}(t) = \min\left(t \cdot \mathrm{softmax}_t\left(-\frac{\tau t \cdot \alpha_{\max}(t)}{\sum_{i=1}^t \alpha_{\max}(i)}\right), 1\right)\)

Final weight: \(\omega_i(t) = \mathrm{AF}(t) \cdot \mathrm{TBS}_i(t) + (1 - \mathrm{AF}(t)) \cdot \mathrm{TSS}_i(t)\)
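The reweighting step can be sketched as follows. This is a simplified toy, not the paper's code: \(\mathrm{AF}(t)\) is passed in as a constant rather than computed from the time-softmax schedule, and the per-step slope history in the TSS denominator is approximated by the current window's slope:

```python
import numpy as np

def slope(losses):
    """Least-squares slope of a loss window (sliding-window linear regression)."""
    t = np.arange(len(losses))
    return np.polyfit(t, losses, 1)[0]

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def adaptive_weights(loss_hist, K=4, af=0.5):
    """Toy reweighting step. loss_hist: (num_types, window) array of recent
    per-type losses; af stands in for the paper's AF(t) schedule."""
    alpha = np.array([slope(h) for h in loss_hist])        # convergence slopes
    tbs = softmax(K * alpha / np.abs(alpha).sum())         # Type Balance Score
    n = loss_hist.shape[1]
    # TSS approximation: the sum over the last N per-step slopes is replaced
    # by N times the current slope's magnitude.
    tss = softmax(-n * alpha / (n * np.abs(alpha) + 1e-8)) # Type Stability Score
    return af * tbs + (1 - af) * tss                       # final omega_i(t)
```

With a steeply decreasing loss for one type and a nearly flat loss for another, the TBS term shifts weight toward the slowly converging type, which is the intended behavior.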

3. Asymmetric MoE Architecture

  • Encoder (Soft-MoE): Outputs from all experts are aggregated via continuous weighting to comprehensively preserve diverse degradation cues. \(y_{en} = \sum_{i=1}^N \mathcal{R}_{soft}^i \otimes y_{en}^i\)

  • Decoder (Hard-MoE): Top-k routing selectively activates the most relevant experts to focus on fine-grained texture reconstruction. \(y_{de} = \sum_{i=1}^N \mathcal{R}_{hard}^i \cdot y_{de}^i\)
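The two routing modes can be contrasted in a few lines. A sketch assuming precomputed expert outputs and gating logits (the function names and shapes are hypothetical, not the paper's layer):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def soft_route(expert_outs, logits):
    """Encoder-style Soft-MoE: blend all N experts with continuous weights.
    expert_outs: (N, d) array of per-expert outputs."""
    w = softmax(logits)
    return np.tensordot(w, expert_outs, axes=1)

def hard_route(expert_outs, logits, k=2):
    """Decoder-style Hard-MoE: keep only the top-k experts, renormalize
    their gate values, and zero out the rest."""
    w = softmax(logits)
    top = np.argsort(w)[-k:]
    mask = np.zeros_like(w)
    mask[top] = w[top]
    mask /= mask.sum()
    return np.tensordot(mask, expert_outs, axes=1)
```

Soft routing keeps gradients flowing to every expert (exploration), while hard routing concentrates capacity on the few experts most relevant to the current degradation (exploitation), matching the exploration-versus-exploitation reading noted below.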

Key Experimental Results

Table 1: Unified Evaluation on RainRAG Dataset Across Four Degradation Types

| Method    | DRS PSNR | DRD PSNR | NRS PSNR | NRD PSNR | Avg. PSNR↑ | Avg. SSIM↑ |
|-----------|----------|----------|----------|----------|------------|------------|
| Restormer | 28.45    | 23.36    | 33.92    | 25.85    | 27.89      | 0.8405     |
| MSDT      | 28.60    | 23.31    | 34.56    | 25.28    | 27.94      | 0.8410     |
| NeRD-Rain | 28.11    | 23.30    | 33.88    | 25.31    | 27.65      | 0.8340     |
| URIR      | 28.29    | 23.19    | 34.32    | 25.82    | 27.91      | 0.8425     |
| UniRain   | 29.58    | 24.71    | 35.23    | 26.21    | 28.93      | 0.8515     |

Table 2: Average Performance on Real-world Public Benchmarks

| Method    | Avg. PSNR↑ | Avg. SSIM↑ |
|-----------|------------|------------|
| NeRD-Rain | 27.81      | 0.8132     |
| URIR      | 27.69      | 0.8061     |
| UniRain   | 29.42      | 0.8222     |

UniRain achieves consistent and significant improvements across all four degradation types and all real-world benchmarks, surpassing the previous state of the art by approximately 1 dB in average PSNR.

Highlights & Insights

  • The RAG + VLM dataset distillation paradigm is novel: it transfers retrieval-augmented generation from NLP to low-level visual data curation, achieving better performance by retaining only 2.6% of the original data.
  • The multi-objective adaptive reweighting strategy effectively addresses type imbalance in mixed training; the three-level TBS/TSS/AF design is internally coherent.
  • The asymmetric MoE design — soft routing in the encoder and hard routing in the decoder — is intuitive, reflecting an exploration-versus-exploitation principle.
  • UniRain is the first unified deraining framework to simultaneously cover daytime/nighttime rain streaks and raindrops.
  • Model complexity is comparable to competing methods (126.5G FLOPs, 24.4M parameters).

Limitations & Future Work

  • The RAG data distillation pipeline requires inference across multiple VLMs, incurring high upfront computational cost.
  • The accuracy of VLM-based quality assessment depends on prompt engineering and VLM capability, which may introduce bias.
  • The window size \(N\) and sensitivity parameter \(\tau\) in the multi-objective optimization require manual tuning.
  • The framework addresses only rain-related degradations and has not been extended to other adverse weather conditions such as haze or snow.
  • Performance gains on the nighttime raindrop (NRD) subset are relatively modest (+0.39 dB), indicating that complex degradations remain challenging.

Rating

⭐⭐⭐⭐ — The problem formulation is practically motivated, and the combination of RAG-based dataset distillation and multi-objective optimization is logically coherent and effective. The unified framework offers strong practical utility; however, the scalability of the RAG pipeline and its generalizability to other degradation types remain to be validated.