UniRain: Unified Image Deraining with RAG-based Dataset Distillation and Multi-objective Reweighted Optimization¶

Conference: CVPR 2026
arXiv: 2603.03967
Code: https://github.com/QianfengY/UniRain
Area: Image Restoration / Image Deraining
Keywords: Unified Deraining, RAG Dataset Distillation, Multi-objective Optimization, Mixture of Experts, Day/Night

TL;DR¶

UniRain is a unified image deraining framework. It uses RAG-driven dataset distillation to select high-quality samples from million-scale public datasets, and pairs an asymmetric MoE architecture with a multi-objective reweighted optimization strategy. The paper reports consistently better results than prior methods across four degradation types: rain streaks and raindrops, each under daytime and nighttime conditions.

Background & Motivation¶

Background: Existing deraining methods are typically designed for specific degradation types (rain streaks, raindrops, nighttime rain, etc.) and suffer significant performance degradation on other types.
Limitations of Prior Work: Naively merging all publicly available datasets (more than 2 million pairs) introduces data quality imbalance: some datasets contain poor-quality backgrounds or unrealistic synthesis artifacts that interfere with model training. In addition, training on heterogeneous degradation types under a single unified objective leads to learning imbalance.
Key Challenge: Simply scaling up data volume does not guarantee better generalization. Different degradation types vary in difficulty, causing the model to overfit easier types (e.g., nighttime rain streaks) while neglecting harder ones (e.g., daytime raindrops) during joint training.
Goal: To build a high-quality deraining model capable of handling all four rain degradation types in a unified manner.
Key Insight: On the data side, RAG-based distillation is used to select reliable samples; on the model side, asymmetric MoE and multi-objective optimization are employed to balance across degradation types.
Core Idea: Data quality matters more than data quantity; different degradation types require a dynamically balanced optimization strategy.

Method¶

Overall Architecture¶

The framework has two major components: (1) a RAG-based dataset distillation pipeline that selects 52,869 high-quality training pairs (about 2.6% of the pool) from million-scale data; (2) a unified deraining model with asymmetric MoE and multi-objective reweighted optimization.

Key Designs¶

RAG Dataset Distillation Pipeline:
- Function: Selects high-quality training samples from large-scale public datasets.
- Mechanism: Retrieval stage: A database of real rainy images is constructed, with captions generated by BLIP and visual features extracted by CLIP. Each candidate image is matched against it in three successive levels: semantic similarity (L2 distance between CLIP text-encoder embeddings) → visual similarity (cosine similarity of CLIP features) → structural similarity (SSIM). Generation stage: The retrieved real reference images and the candidate image are fed into a VLM for quality assessment, and three VLMs vote on whether to accept the candidate.
- Design Motivation: Using real rainy images as reference anchors for evaluating synthetic data quality is more reliable than no-reference assessment. Only 2.6% of the data is ultimately retained.
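The retrieve-then-vote flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the embeddings are random placeholders standing in for CLIP text and image features, `global_ssim` is a single-window simplification of SSIM, the cut-offs `k_text` and `k_vis` are hypothetical, and the three VLM judgments are represented as plain booleans.

```python
import numpy as np

def l2_distance(a, b):
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over the whole image (a simplification of windowed SSIM)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

def retrieve_reference(cand, database, k_text=3, k_vis=2):
    """Cascade: text L2 (keep k_text) -> visual cosine (keep k_vis) -> SSIM (keep best)."""
    pool = sorted(database, key=lambda r: l2_distance(cand["text"], r["text"]))[:k_text]
    pool = sorted(pool, key=lambda r: -cosine_similarity(cand["vis"], r["vis"]))[:k_vis]
    return max(pool, key=lambda r: global_ssim(cand["img"], r["img"]))

def majority_accept(votes):
    """Accept a candidate when more than half of the (assumed boolean) VLM votes agree."""
    return sum(votes) * 2 > len(votes)

# Toy data: placeholder features instead of real BLIP/CLIP outputs.
rng = np.random.default_rng(0)

def make_record(name):
    return {"id": name, "text": rng.normal(size=16),
            "vis": rng.normal(size=32), "img": rng.random((8, 8))}

database = [make_record(f"real_{i}") for i in range(5)]
candidate = dict(database[2], id="synthetic")  # shares all features with real_2
best = retrieve_reference(candidate, database)
accepted = majority_accept([True, True, False])  # stand-in for three VLM judgments
```

In this toy setup the candidate shares its features with `real_2`, so the cascade returns that record as the reference; in practice the reference and candidate would then be passed to the VLM judges.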
Multi-objective Reweighted Optimization Strategy:
- Function: Dynamically balances learning progress across degradation types by reweighting each type's contribution during training.
- Mechanism: Three metrics work in concert: (1) TBS (Type Balance Score): down-weights fast-converging types and up-weights slow-converging types based on loss slope; (2) TSS (Type Stability Score): penalizes diverging types to prevent instability; (3) AF (Adaptive Factor): lets TBS dominate in early training (promoting balance) and TSS dominate in later stages (ensuring stability), so AF is large early and small late. The final weight for type \(i\) is \(\omega_i(t) = \text{AF}(t) \cdot \text{TBS}_i(t) + (1-\text{AF}(t)) \cdot \text{TSS}_i(t)\).
- Design Motivation: Fixed weights cannot adapt to the dynamic changes in convergence speed across different degradation types throughout training.
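The weighting rule above can be illustrated with a small sketch. Only the final convex combination comes from the summary; the concrete forms of the three scores here are assumptions chosen for illustration: slopes are fitted by least squares over a recent loss window, TBS is a softmax over slopes, TSS is an exponential penalty on positive slopes, AF decays linearly, and `temperature` is a hypothetical parameter.

```python
import numpy as np

def loss_slopes(history):
    """Least-squares slope of each type's recent loss curve; history is (steps, types)."""
    steps = np.arange(history.shape[0])
    return np.polyfit(steps, history, deg=1)[0]

def type_balance_score(slopes, temperature=0.05):
    # Assumed form: fast-converging types have very negative slopes,
    # so they receive a smaller softmax weight.
    z = slopes / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def type_stability_score(slopes, temperature=0.05):
    # Assumed form: only diverging types (positive slope) are penalized.
    penalty = np.exp(-np.maximum(slopes, 0.0) / temperature)
    return penalty / penalty.sum()

def adaptive_factor(step, total_steps):
    # Assumed linear decay from 1 (TBS dominates) to 0 (TSS dominates).
    return 1.0 - step / total_steps

def type_weights(history, step, total_steps):
    s = loss_slopes(history)
    af = adaptive_factor(step, total_steps)
    return af * type_balance_score(s) + (1.0 - af) * type_stability_score(s)

# Toy loss curves for four types: fast, slow, flat, diverging.
steps = np.arange(10)[:, None]
history = 1.0 + steps * np.array([-0.1, -0.01, 0.0, 0.05])
w_early = type_weights(history, step=0, total_steps=300_000)
w_late = type_weights(history, step=300_000, total_steps=300_000)
```

Because both scores are normalized, the weights sum to one at every step. In the toy curves, the slow-converging type outweighs the fast one early on, while the diverging type is down-weighted once TSS takes over.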
Asymmetric MoE Architecture:
- Function: The encoder and decoder adopt different MoE strategies suited to their respective roles.
- Mechanism: The encoder uses Soft-MoE (continuous weighted combination of all experts) to retain diverse degradation cues; the decoder uses Hard-MoE (Top-k routing with selective activation) to enhance fine-grained texture reconstruction.
- Design Motivation: The encoder needs to broadly explore diverse degradation patterns, while the decoder needs to precisely reconstruct details; these distinct roles call for different expert selection strategies.
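The difference between the two routing styles can be sketched as below. This follows the description in this summary (soft = weighted mix of all experts, hard = Top-k selection) rather than the paper's actual layers: experts are plain linear maps, the gate is a single matrix, and all shapes and the value `k=2` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe(x, gate_w, experts):
    """Dense routing: every token is a softmax-weighted mix of all experts."""
    gates = softmax(x @ gate_w)                         # (tokens, n_experts)
    outs = np.stack([x @ w for w in experts], axis=1)   # (tokens, n_experts, dim)
    return (gates[..., None] * outs).sum(axis=1), gates

def hard_moe(x, gate_w, experts, k=2):
    """Sparse routing: keep the top-k gate logits per token, renormalize, zero the rest."""
    logits = x @ gate_w
    topk = np.argsort(logits, axis=-1)[:, -k:]
    mask = np.zeros_like(logits, dtype=bool)
    np.put_along_axis(mask, topk, True, axis=-1)
    gates = softmax(np.where(mask, logits, -np.inf))    # unselected experts get gate 0
    # For brevity all experts are evaluated; a real sparse layer would skip unselected ones.
    outs = np.stack([x @ w for w in experts], axis=1)
    return (gates[..., None] * outs).sum(axis=1), gates

rng = np.random.default_rng(0)
tokens, dim, n_experts = 4, 8, 4
x = rng.normal(size=(tokens, dim))
gate_w = rng.normal(size=(dim, n_experts))
experts = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]

enc_out, enc_gates = soft_moe(x, gate_w, experts)        # encoder-style routing
dec_out, dec_gates = hard_moe(x, gate_w, experts, k=2)   # decoder-style routing
```

With `k` equal to the number of experts, the hard variant reduces to the soft one, which makes the asymmetry a matter of how aggressively the decoder prunes experts per token.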

Loss & Training¶

Training uses 4 × RTX 4090 GPUs with the AdamW optimizer, 128×128 crops, a batch size of 8, and 300K iterations.
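As a rough sanity check on the training budget, the schedule can be converted into passes over the distilled set. This sketch assumes the batch size of 8 is the global batch (the summary does not say whether it is per GPU or total) and that training uses only the 52,869 distilled pairs.

```python
iterations = 300_000
batch_size = 8            # assumption: global batch size, not per-GPU
distilled_pairs = 52_869  # size of the distilled training set reported above

samples_seen = iterations * batch_size
approx_epochs = samples_seen / distilled_pairs
print(f"{samples_seen:,} samples seen, about {approx_epochs:.1f} passes over the distilled set")
```

Under these assumptions the model sees 2.4 million crops, or roughly 45 passes over the distilled data; if the batch size is per GPU, the figure would be four times larger.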

Key Experimental Results¶

Main Results¶

Dataset / Type	Metric	UniRain	MSDT (Prev. SOTA)	Gain
RainRAG Average	PSNR	28.93	27.94	+0.99
RealRain-1k-H	PSNR	33.74	30.91	+2.83
RainDS-real-RD	PSNR	22.07	20.72	+1.35
WeatherBench	PSNR	34.25	33.56	+0.69
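For readers interpreting the gains in the table, PSNR is a logarithmic function of mean squared error. The sketch below uses the standard definition; the data range of 1.0 and the choice of evaluating on full arrays (rather than, say, the luminance channel) are assumptions, since this summary does not state the paper's evaluation protocol.

```python
import numpy as np

def psnr(restored, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two arrays of the same shape."""
    diff = restored.astype(np.float64) - target.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

# Toy check: a constant error of 0.1 on a [0, 1] image gives MSE = 0.01.
target = np.zeros((16, 16))
restored = np.full((16, 16), 0.1)
value = psnr(restored, target)
```

Because the scale is logarithmic, a gain of about 1 dB corresponds to roughly 20% lower MSE, so the differences in the table are larger in error terms than the raw numbers may suggest.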

Ablation Study¶

Configuration	PSNR	SSIM	Note
VLM only (no RAG)	27.73	0.8358	No real reference
w/o generation stage	28.36	0.8425	No VLM quality evaluation
Full pipeline	28.93	0.8515	RAG distillation fully effective
Soft-MoE enc.+dec.	27.91	0.8465	Pure soft insufficient
Asymmetric MoE	28.93	0.8515	Optimal combination

Key Findings¶

Directly merging all data for training underperforms training on the distilled 2.6% subset.
The distilled dataset exhibits a broader and more diverse feature distribution.
Multi-objective optimization leads to more stable convergence of loss curves across all four degradation types.
The model also generalizes to all-weather restoration (rain + snow + fog), reaching a PSNR of 26.01 versus 24.70 for TransWeather.

Highlights & Insights¶

"Less is More" Data Philosophy: Using only 2.6% of the data surpasses training on the full dataset.
First Application of RAG in Low-level Vision: RAG is innovatively applied to dataset distillation rather than model inference.
Dynamic Optimization with Three Synergistic Metrics: The TBS + TSS + AF design simultaneously addresses type balance and training stability.

Limitations & Future Work¶

The RAG pipeline relies on VLM evaluation quality, and the VLMs themselves may introduce biases.
The number of experts and the Top-k value in the asymmetric MoE require manual tuning.
Model complexity (FLOPs: 126.5G), while lower than some methods, is still not lightweight.

Comparison with Related Work¶

vs. URIR: URIR is the first unified deraining network but is validated only on driving scenarios; UniRain is more general.
vs. NeRD-Rain: NeRD-Rain applies implicit neural representations to deraining but does not support unified multi-type training.

Rating¶

Novelty: ⭐⭐⭐⭐ — The combination of RAG dataset distillation and multi-objective optimization is novel.
Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Multi-dataset, multi-scenario, comprehensive ablation, and weather extension.
Writing Quality: ⭐⭐⭐⭐ — Motivation figures are clear and ablation is systematic.
Value: ⭐⭐⭐⭐ — A practical unified deraining framework; the data distillation paradigm is broadly transferable.