UniRain: Unified Image Deraining with RAG-based Dataset Distillation and Multi-objective Reweighted Optimization
Conference: CVPR 2026 | arXiv: 2603.03967 | Code: N/A | Area: Image Restoration | Keywords: image_restoration, deraining, mixture_of_experts, multi_objective_optimization, RAG
TL;DR
This paper proposes UniRain, a unified deraining framework that distills high-quality training samples from over 2 million public image pairs via RAG-driven dataset distillation, combines an asymmetric Mixture-of-Experts (MoE) architecture with a multi-objective adaptive reweighting optimization strategy, and for the first time handles all four degradation types — daytime rain streaks, daytime raindrops, nighttime rain streaks, and nighttime raindrops — within a single model.
Background & Motivation
Existing deraining methods face two core challenges:
- Uneven data quality: Directly mixing all synthetic and real datasets (>2 million pairs) introduces inaccurate supervision signals, which degrades model convergence and generalization. Experiments confirm that naive mixing can even underperform a carefully curated subset.
- Training imbalance: Different rain degradation types — daytime rain streaks (DRS), daytime raindrops (DRD), nighttime rain streaks (NRS), and nighttime raindrops (NRD) — vary substantially in difficulty and convergence rate, causing unified optimization to favor simpler types while neglecting harder ones.
Method
1. RAG-based Dataset Distillation
Retrieval Stage
A database is constructed from a large-scale corpus, storing a triplet \((T_r, f_r, I_r)\) for each real rain image (BLIP text description, CLIP visual feature, and image).
A three-level hierarchical similarity matching is then applied to each query image, coarse to fine:
- Semantic similarity: \(s_{txt}(q,r) = \|\phi_T(T_q) - \phi_T(T_r)\|_2\) (an L2 distance, so smaller means more similar); keep the Top-\(K_1\) candidates
- Visual feature similarity: \(s_{vis}(q,r') = \frac{f_q^\top f_{r'}}{\|f_q\|_2 \|f_{r'}\|_2}\); keep the Top-\(K_2\)
- Structural similarity: \(s_{perc}(q,r'') = \mathrm{SSIM}(I_q, I_{r''})\); keep the Top-\(K_3\)
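The coarse-to-fine filtering above can be sketched as follows. This is an illustrative implementation, not the paper's code: the function name, the default \(K\) values, and the pluggable `ssim_fn` are assumptions; embeddings are passed in as plain arrays rather than computed by BLIP/CLIP.

```python
import numpy as np

def hierarchical_retrieve(q_txt, q_vis, db_txt, db_vis, ssim_fn, q_img, db_imgs,
                          k1=50, k2=10, k3=3):
    """Three-level coarse-to-fine retrieval (illustrative sketch).

    Level 1: L2 distance between text embeddings (smaller = more similar).
    Level 2: cosine similarity between visual features (larger = better).
    Level 3: SSIM between images (larger = better).
    """
    # Level 1: semantic similarity -- keep the K1 smallest L2 distances
    d_txt = np.linalg.norm(db_txt - q_txt, axis=1)
    idx1 = np.argsort(d_txt)[:k1]

    # Level 2: visual cosine similarity, computed only over Level-1 survivors
    vis = db_vis[idx1]
    cos = vis @ q_vis / (np.linalg.norm(vis, axis=1) * np.linalg.norm(q_vis) + 1e-8)
    idx2 = idx1[np.argsort(-cos)[:k2]]

    # Level 3: structural similarity over Level-2 survivors
    ssim = np.array([ssim_fn(q_img, db_imgs[i]) for i in idx2])
    return idx2[np.argsort(-ssim)[:k3]]
```

Each level only rescores the survivors of the previous one, so the expensive SSIM comparison runs on at most \(K_2\) images per query.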
Generation Stage
Retrieved reference images are combined with the query image, and three VLMs (InternVL2.5-8B, LLaVA-NeXT-7B, MobileVLM-3B) vote to assess data quality.
This process distills 52,869 high-quality training pairs from over 2 million image pairs, retaining approximately 2.6% of the original data.
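The voting step can be sketched as a simple ensemble filter. The strict-majority rule and the function names here are assumptions for illustration; the paper only states that the three VLMs vote on data quality.

```python
def majority_vote(votes):
    """True iff a strict majority of judges approve.

    `votes` is one boolean per VLM judge (e.g. InternVL2.5-8B,
    LLaVA-NeXT-7B, MobileVLM-3B). The strict-majority threshold is an
    assumption, not a detail confirmed by the paper.
    """
    return sum(votes) > len(votes) / 2

def distill(candidates, judges):
    """Keep only candidate pairs that pass the VLM vote."""
    kept = []
    for pair in candidates:
        votes = [judge(pair) for judge in judges]
        if majority_vote(votes):
            kept.append(pair)
    return kept
```

With three judges, a pair survives only if at least two of them rate it as high quality, which makes a single over-lenient or over-strict VLM unable to decide the outcome alone.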
2. Multi-objective Adaptive Reweighted Optimization
The convergence slope \(\alpha_i\) for each degradation type is estimated via sliding-window linear regression, from which three dynamic weight metrics are derived:
- Type Balance Score (TBS): shifts attention toward types with slower convergence.
  $$\mathrm{TBS}_i(t) = \mathrm{softmax}_i\left(K \frac{\alpha_i(t)}{\sum_i |\alpha_i(t)|}\right)$$
- Type Stability Score (TSS): suppresses excessive weights for diverging types.
  $$\mathrm{TSS}_i(t) = \mathrm{softmax}_i\left(-N \frac{\alpha_i(t)}{\sum_{k=t-N+1}^t |\alpha_i(k)|}\right)$$
- Adaptivity Factor (AF): dynamically adjusts the balance between TBS and TSS.
  $$AF(t) = \min\left(t \cdot \mathrm{softmax}_t\left(-\frac{\tau t \cdot \alpha_{\max}(t)}{\sum_{i=1}^t \alpha_{\max}(i)}\right),\ 1\right)$$

Final weight: \(\omega_i(t) = AF(t) \cdot \mathrm{TBS}_i(t) + (1 - AF(t)) \cdot \mathrm{TSS}_i(t)\)
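A minimal sketch of this scheme, assuming a least-squares slope estimate and the TBS/TSS formulas above. The function names are hypothetical, the AF schedule is passed in as a precomputed scalar rather than derived from its own softmax, and `K`/`N` defaults are illustrative.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def conv_slope(loss_hist, window=5):
    """Convergence slope alpha_i: least-squares slope of the loss over
    the last `window` steps (negative = still decreasing)."""
    y = np.asarray(loss_hist[-window:], dtype=float)
    return np.polyfit(np.arange(len(y)), y, 1)[0]

def reweight(alphas, alpha_hist, af, K=4.0, N=5):
    """Combine Type Balance Score and Type Stability Score.

    alphas:     current slope per degradation type, shape (4,)
    alpha_hist: slope history per type, shape (T, 4), T >= N
    af:         adaptivity factor in [0, 1] (its schedule is omitted here)
    """
    alphas = np.asarray(alphas, dtype=float)
    # TBS: larger (slower-converging) slopes get more weight
    tbs = softmax(K * alphas / np.abs(alphas).sum())
    # TSS: per-type normalisation over the last N slopes; diverging
    # types (positive slope) are pushed toward smaller weights
    denom = np.abs(np.asarray(alpha_hist)[-N:]).sum(axis=0)
    tss = softmax(-N * alphas / denom)
    return af * tbs + (1 - af) * tss
```

Because TBS and TSS are both softmax outputs, the final weights \(\omega_i(t)\) remain a valid convex combination that sums to 1 for any AF in [0, 1].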
3. Asymmetric MoE Architecture
- Encoder (Soft-MoE): outputs from all experts are aggregated via continuous weighting to comprehensively preserve diverse degradation cues.
  $$y_{en} = \sum_{i=1}^N \mathcal{R}_{soft}^i \otimes y_{en}^i$$
- Decoder (Hard-MoE): Top-k routing selectively activates the most relevant experts to focus on fine-grained texture reconstruction.
  $$y_{de} = \sum_{i=1}^N \mathcal{R}_{hard}^i \cdot y_{de}^i$$
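The two routing rules can be contrasted in a few lines. This is a toy sketch, not the paper's architecture: experts are arbitrary callables, routing logits are given rather than produced by a learned gate, and the top-k renormalisation is a common convention assumed here.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def soft_moe(x, experts, router_logits):
    """Encoder-style Soft-MoE: every expert contributes, weighted by a
    continuous routing distribution."""
    w = softmax(router_logits)
    return sum(wi * f(x) for wi, f in zip(w, experts))

def hard_moe(x, experts, router_logits, k=2):
    """Decoder-style Hard-MoE: only the top-k experts are activated;
    their weights are renormalised over the selected set."""
    w = softmax(router_logits)
    top = np.argsort(-w)[:k]
    w_top = w[top] / w[top].sum()
    return sum(wi * experts[i](x) for wi, i in zip(w_top, top))
```

The asymmetry mirrors the exploration/exploitation reading in the paper: the encoder blends all experts' views of the degradation, while the decoder spends its compute on the few experts most relevant to reconstruction.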
Key Experimental Results
Table 1: Unified Evaluation on RainRAG Dataset Across Four Degradation Types
| Method | DRS PSNR | DRD PSNR | NRS PSNR | NRD PSNR | Avg. PSNR↑ | Avg. SSIM↑ |
|---|---|---|---|---|---|---|
| Restormer | 28.45 | 23.36 | 33.92 | 25.85 | 27.89 | 0.8405 |
| MSDT | 28.60 | 23.31 | 34.56 | 25.28 | 27.94 | 0.8410 |
| NeRD-Rain | 28.11 | 23.30 | 33.88 | 25.31 | 27.65 | 0.8340 |
| URIR | 28.29 | 23.19 | 34.32 | 25.82 | 27.91 | 0.8425 |
| UniRain | 29.58 | 24.71 | 35.23 | 26.21 | 28.93 | 0.8515 |
Table 2: Average Performance on Real-world Public Benchmarks
| Method | Avg. PSNR↑ | Avg. SSIM↑ |
|---|---|---|
| NeRD-Rain | 27.81 | 0.8132 |
| URIR | 27.69 | 0.8061 |
| UniRain | 29.42 | 0.8222 |
UniRain achieves consistent improvements across all four degradation types and all real-world benchmarks, surpassing the best prior method by roughly 1 dB in average PSNR on the unified benchmark (28.93 vs. 27.94) and by about 1.6 dB on real-world benchmarks (29.42 vs. 27.81).
Highlights & Insights
- The RAG + VLM dataset distillation paradigm is novel: it transfers retrieval-augmented generation from NLP to low-level visual data curation, achieving better performance by retaining only 2.6% of the original data.
- The multi-objective adaptive reweighting strategy effectively addresses type imbalance in mixed training; the three-level TBS/TSS/AF design is internally coherent.
- The asymmetric MoE design — soft routing in the encoder and hard routing in the decoder — is intuitive, reflecting an exploration-versus-exploitation principle.
- UniRain is the first unified deraining framework to simultaneously cover daytime/nighttime rain streaks and raindrops.
- Model complexity is comparable to competing methods (126.5G FLOPs, 24.4M parameters).
Limitations & Future Work
- The RAG data distillation pipeline requires inference across multiple VLMs, incurring high upfront computational cost.
- The accuracy of VLM-based quality assessment depends on prompt engineering and VLM capability, which may introduce bias.
- The window size \(N\) and sensitivity parameter \(\tau\) in the multi-objective optimization require manual tuning.
- The framework addresses only rain-related degradations and has not been extended to other adverse weather conditions such as haze or snow.
- Performance gains on the nighttime raindrop (NRD) subset are relatively modest (+0.39 dB), indicating that complex degradations remain challenging.
Rating
⭐⭐⭐⭐ — The problem formulation is practically motivated, and the combination of RAG-based dataset distillation and multi-objective optimization is logically coherent and effective. The unified framework offers strong practical utility; however, the scalability of the RAG pipeline and its generalizability to other degradation types remain to be validated.