GenDeg: Diffusion-based Degradation Synthesis for Generalizable All-In-One Image Restoration
Conference: CVPR 2025
arXiv: 2411.17687
Code: https://sudraj2002.github.io/gendegpage/ (Project Page)
Area: Image Generation
Keywords: Image degradation synthesis, diffusion models, All-In-One restoration, out-of-domain generalization, synthetic data
TL;DR
This paper proposes GenDeg, a degradation synthesis framework built on Stable Diffusion that generates controllable degradations (haze, rain, snow, motion blur, low-light, raindrops) on arbitrary clean images. The authors synthesize over 550k images to construct the GenDS dataset, and All-In-One restoration models trained on it improve significantly on out-of-domain test sets.
Background & Motivation
- Background: All-In-One Image Restoration (AIOR) handles multiple degradations with a single model. Representative methods include PromptIR, DA-CLIP, and Diff-Plugin.
- Limitations of Prior Work: Existing AIOR models generalize poorly to out-of-distribution (OoD) degradation patterns and scenes. ① Existing datasets are small (far smaller than the 1.5M+ of SAM/Depth-Anything) and lack scene diversity; ② Synthetic datasets rely on simple physical formulations (e.g., RESIDE uses only the atmospheric scattering model), resulting in monotonous degradation patterns; ③ Real-world degraded data (e.g., haze, low-light, raindrops) is difficult to collect, so samples are scarce.
- Key Challenge: AIOR methods overfit the training distribution, primarily because of the lack of degradation diversity and scene diversity in training data, whereas real-world data collection is impractical.
- Goal: Use generative models to synthesize large-scale, diverse degradation data to improve the OoD generalization ability of AIOR.
- Key Insight: Latent Diffusion Models have powerful generative priors and conditional control capabilities, which can generate realistic degradation patterns while preserving scene semantics.
- Core Idea: Synthesize large-scale and diverse degradation data using a degradation severity-aware conditional diffusion model to resolve the generalization bottleneck of All-In-One restoration from the data perspective.
Method
Overall Architecture
GenDeg is based on the InstructPix2Pix architecture. It takes a clean image \(c_{img}\), a text prompt \(c_{text}\), and degradation severity conditions as input, and outputs a degraded image. During training, paired degraded-clean data from multiple existing datasets are used. During inference, degradations are generated on new clean images. Where applicable, the generated images are corrected by a Structure Correction Module (SCM), and the results are merged with the original datasets to form the GenDS dataset (750k+ samples), which is used to train restoration models.
Key Designs
Degradation Severity-Aware Conditional Diffusion Model:
- Function: To finely control the degradation intensity and spatial distribution during degradation generation.
- Mechanism: Define a degradation map \(c_{map} = |x_{in} - c_{img}|\), where \(x_{in}\) is the degraded image, and compute its mean \(\mu\) and standard deviation \(\sigma\) to quantify degradation severity. \(\mu\) and \(\sigma\) are each encoded as a 129-dimensional one-hot vector (128 bins + 1 null), concatenated, and projected to \(\mathbb{R}^{77 \times 2}\). The result is concatenated with the CLIP text embeddings \(e_{text} \in \mathbb{R}^{77 \times 768}\) (giving \(77 \times 770\)) and projected back to 768 dimensions to serve as the conditional input for Stable Diffusion.
- Design Motivation: Using only text prompts (such as "hazy") can cause the diffusion model to generate extreme degradations (overly dense haze or heavy rain). The \(\mu\)-\(\sigma\) conditions enable the model to perceive the target degradation intensity, producing more realistic and controllable degradations.
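The severity statistics and their one-hot encoding can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: images are flat lists of intensities in [0, 1], the bins are uniform over [0, 1], and the null slot is used when no severity is supplied. The helper names (`degradation_stats`, `to_bin`, `severity_condition`) are hypothetical and not taken from the paper's code; the learned projections to \(\mathbb{R}^{77 \times 2}\) and back to 768 dimensions are omitted.

```python
import statistics

NUM_BINS = 128          # from the text: 128 bins + 1 null slot
NULL_INDEX = NUM_BINS   # index 128 = "no severity given" (assumed use of the null slot)


def degradation_stats(degraded, clean):
    """Mean and std of the degradation map |x_in - c_img| over all pixels.

    Both inputs are flat lists of pixel intensities in [0, 1].
    """
    c_map = [abs(d - c) for d, c in zip(degraded, clean)]
    return statistics.fmean(c_map), statistics.pstdev(c_map)


def to_bin(value, max_value=1.0):
    """Uniformly quantize a value in [0, max_value] into NUM_BINS bins (assumed binning)."""
    idx = int(value / max_value * NUM_BINS)
    return min(max(idx, 0), NUM_BINS - 1)


def one_hot(index):
    """129-dimensional one-hot vector with a 1.0 at `index`."""
    vec = [0.0] * (NUM_BINS + 1)
    vec[index] = 1.0
    return vec


def severity_condition(mu, sigma):
    """Concatenate the one-hot codes of mu and sigma; None selects the null slot."""
    mu_idx = NULL_INDEX if mu is None else to_bin(mu)
    sigma_idx = NULL_INDEX if sigma is None else to_bin(sigma)
    return one_hot(mu_idx) + one_hot(sigma_idx)


# Toy example: a 4-pixel "image" with mild haze-like brightening.
clean = [0.2, 0.4, 0.6, 0.8]
hazy = [0.5, 0.6, 0.7, 0.8]
mu, sigma = degradation_stats(hazy, clean)
cond = severity_condition(mu, sigma)   # 258-dimensional vector with two active entries
```

In the described model this 258-dimensional vector would then pass through learned linear layers before being fused with the CLIP text embedding.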
Structure Correction Module (SCM):
- Function: To restore details lost during the VAE encoding-decoding process.
- Mechanism: SCM is a lightweight network \(S\) that takes the concatenation of the generated image and clean image as input and outputs a residual: \(x_S = x_{gen} + S([x_{gen}, c_{img}])\). During training, a one-step reverse diffusion is used to obtain the generated image, and the loss function is a timestep-weighted L2 loss \(L_S = \sqrt{\bar{\alpha}_{t-1}} \cdot \sqrt{1-\bar{\alpha}_t} \cdot \|x_{in} - x_S\|_2^2\), where the weight is lower at initial and final timesteps.
- Design Motivation: LDM's VAE encoding and decoding lose details. SCM is effective only for smooth degradations such as haze, raindrops, and motion blur. For rain, snow, and low-light, clean images reconstructed via VAE (\(\hat{c}_{img}\)) are used instead.
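To see why the SCM loss weight is small at both ends of the timestep range, the weight \(\sqrt{\bar{\alpha}_{t-1}} \cdot \sqrt{1-\bar{\alpha}_t}\) can be evaluated numerically. The sketch below assumes a plain linear beta schedule with 1000 steps, which is a common default and not necessarily the schedule used in the paper; `alpha_bar_schedule`, `scm_weight`, and `scm_loss` are hypothetical names, and images are flat lists for simplicity.

```python
import math


def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed).

    Returns a list of length num_steps + 1 with alpha_bar[0] = 1.0, so that
    alpha_bar[t - 1] is defined for t = 1.
    """
    alpha_bar = [1.0]
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        alpha_bar.append(alpha_bar[-1] * (1.0 - beta))
    return alpha_bar


def scm_weight(alpha_bar, t):
    """Timestep weight sqrt(alpha_bar_{t-1}) * sqrt(1 - alpha_bar_t)."""
    return math.sqrt(alpha_bar[t - 1]) * math.sqrt(1.0 - alpha_bar[t])


def scm_loss(x_in, x_gen, residual, alpha_bar, t):
    """Weighted squared L2 distance between x_in and x_S = x_gen + residual."""
    x_s = [g + r for g, r in zip(x_gen, residual)]
    sq_err = sum((a - b) ** 2 for a, b in zip(x_in, x_s))
    return scm_weight(alpha_bar, t) * sq_err


alpha_bar = alpha_bar_schedule()
weights = [scm_weight(alpha_bar, t) for t in range(1, 1001)]
t_peak = 1 + weights.index(max(weights))   # timestep where the weight is largest
print(f"w(1)={weights[0]:.4f}  w({t_peak})={max(weights):.4f}  w(1000)={weights[-1]:.4f}")
```

Under this assumed schedule the weight peaks at an intermediate timestep, which matches the description: near \(t = 0\) the factor \(\sqrt{1-\bar{\alpha}_t}\) is close to zero, and near the final step \(\sqrt{\bar{\alpha}_{t-1}}\) is close to zero.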
Data Generation and Quality Control:
- Function: Scale up the generation of high-quality paired degradation data.
- Mechanism: Starting from approximately 120k clean images in the training datasets, the 5 degradation types not present in each image's original dataset are generated for that image. \(\mu_{gen}\) is sampled from the target dataset's histogram, and \(\sigma_{gen}\) is sampled from the histogram within the corresponding \(\mu\) bin (to preserve the statistical correlation between the two). One out of twenty images uses a random \(\sigma\) to increase diversity. After generation, poorly structured images are filtered out based on the mean of the degradation map.
- Design Motivation: Sampling from the dataset's histogram keeps the distribution of degradation severity close to that of real data. Cross-degradation (generating degradation B on clean images drawn from a dataset for degradation A) increases the diversity of scene-degradation combinations.
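The two-stage severity sampling described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: severities are represented directly as integer bin indices, the "one out of twenty" rule is modeled as a probability of 1/20 per image (the source does not say whether it is random or periodic), and all function names are hypothetical.

```python
import random


def build_histograms(pairs):
    """Count mu bins and, for each mu bin, the sigma bins observed with it.

    pairs: iterable of (mu_bin, sigma_bin) integer tuples measured on a target dataset.
    """
    mu_hist = {}
    sigma_given_mu = {}
    for mu_bin, sigma_bin in pairs:
        mu_hist[mu_bin] = mu_hist.get(mu_bin, 0) + 1
        inner = sigma_given_mu.setdefault(mu_bin, {})
        inner[sigma_bin] = inner.get(sigma_bin, 0) + 1
    return mu_hist, sigma_given_mu


def sample_from_hist(hist, rng):
    """Draw one bin index with probability proportional to its count."""
    bins = list(hist)
    weights = [hist[b] for b in bins]
    return rng.choices(bins, weights=weights, k=1)[0]


def sample_severity(mu_hist, sigma_given_mu, rng, num_bins=128, random_sigma_prob=1 / 20):
    """Sample mu from its histogram, then sigma conditioned on the chosen mu bin."""
    mu_bin = sample_from_hist(mu_hist, rng)
    if rng.random() < random_sigma_prob:
        sigma_bin = rng.randrange(num_bins)   # occasional random sigma for diversity
    else:
        sigma_bin = sample_from_hist(sigma_given_mu[mu_bin], rng)
    return mu_bin, sigma_bin


# Toy target-dataset statistics: (mu_bin, sigma_bin) per training image.
observed = [(3, 1), (3, 2), (3, 2), (10, 7)]
mu_hist, sigma_given_mu = build_histograms(observed)
rng = random.Random(0)
samples = [sample_severity(mu_hist, sigma_given_mu, rng) for _ in range(5)]
```

Conditioning \(\sigma\) on the sampled \(\mu\) bin avoids implausible pairs (for example, a very low mean with a very high spread) that independent sampling could produce.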
Loss & Training
- GenDeg training utilizes the standard LDM denoising objective (Equation 1).
- A Swin Transformer restoration network is also proposed. It combines an ImageNet-pretrained Swin encoder with a lightweight convolutional decoder, aggregates hierarchical features, and uses 3x3 convolutions to avoid patch boundary artifacts.
- Five restoration models are trained and compared with and without GenDS: NAFNet, PromptIR, Swin, DA-CLIP, and Diff-Plugin.
Key Experimental Results
Main Results (OoD Performance, LPIPS/FID, lower is better)
| Method | REVIDE (Haze) | O-Haze (Haze) | GoPro (Blur) | LOLv1 (Low-light) | RainDS (Raindrop) |
|---|---|---|---|---|---|
| PromptIR | 0.262/62.0 | 0.333/150.9 | 0.186/32.9 | 0.258/111.8 | 0.208/106.8 |
| PromptIR+GenDS | 0.212/56.0 | 0.160/89.0 | 0.191/31.9 | 0.178/87.9 | 0.182/79.8 |
| NAFNet | 0.211/71.3 | 0.183/99.2 | 0.155/28.2 | 0.167/78.8 | 0.178/73.4 |
| NAFNet+GenDS | 0.151/52.5 | 0.143/76.7 | 0.149/28.7 | 0.147/63.7 | 0.170/60.5 |
Ablation Study
| Configuration | Description |
|---|---|
| Existing data only | Baseline performance, poor OoD generalization |
| +GenDS data | Significant improvements across all five models on OoD |
| NAFNet gains markedly on REVIDE haze | LPIPS improved from 0.211 to 0.151 (-28.4%) |
| In-domain performance mostly maintained | No obvious degradation in in-domain performance after adding GenDS |
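The -28.4% figure quoted for NAFNet on REVIDE is a relative LPIPS change, and it can be reproduced from the main-results table. The short script below is only a convenience check; `relative_change` is a hypothetical helper, and the number pairs are copied from the table above.

```python
def relative_change(before, after):
    """Signed percentage change from `before` to `after` (negative = metric dropped)."""
    return (after - before) / before * 100.0


# (without GenDS, with GenDS) LPIPS pairs copied from the main-results table.
lpips_pairs = {
    ("NAFNet", "REVIDE"): (0.211, 0.151),
    ("NAFNet", "O-Haze"): (0.183, 0.143),
    ("PromptIR", "O-Haze"): (0.333, 0.160),
    ("PromptIR", "GoPro"): (0.186, 0.191),
}

changes = {key: round(relative_change(b, a), 1) for key, (b, a) in lpips_pairs.items()}
for (model, dataset), pct in changes.items():
    print(f"{model:8s} {dataset:7s} LPIPS change: {pct:+.1f}%")
```

Since lower LPIPS is better, negative values indicate improvement. The same helper applies to the FID columns.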
Key Findings
- All five restoration models (both non-generative and generative) improve in OoD performance after incorporating GenDS, demonstrating the broad value of the synthetic data. In the table above the gains are large on the haze, low-light, and raindrop benchmarks, while GoPro results are mixed (for example, PromptIR LPIPS moves from 0.186 to 0.191).
- The dehazing task sees the most notable boost (O-Haze PromptIR FID drops from 150.9 to 89.0) because real haze data is scarce.
- t-SNE visualizations show that the degradation feature distribution generated by GenDS effectively bridges the domain gap between existing training datasets and OoD test data.
- GenDS is the first dataset providing multiple degraded versions for the same clean image, naturally tailored for AIOR training.
- Conditioning on \(\mu\) and \(\sigma\) is critical for realistic degradation; generation guided by the text prompt alone tends to yield extreme degradations.
Highlights & Insights
- "Solving generalization from the data side" is a valuable paradigm: with no change to model architecture and only a change in training data, all five models achieved performance gains, suggesting that training data diversity can matter as much as model complexity.
- \(\mu\)-\(\sigma\) conditioning of degradation severity is a key innovation: encoding severity as conditional signals fused with CLIP embeddings gives control over both the global intensity and the spatial variation of the degradation.
- Cross-degradation dataset training allows GenDeg to break free from relying on a single physical model, conceptually learning the capability to generate diverse degradation patterns.
- This methodology can be directly transferred to other low-level vision tasks requiring large-scale paired training data, such as super-resolution and denoising.
Limitations & Future Work
- Currently, only 6 degradation types are covered, leaving out common degradations such as JPEG compression, noise, and overexposure.
- SCM is not applicable to rain, snow, and low-light degradations, which require separate handling, so the pipeline is not uniform across degradation types.
- The quality of generated images still depends on the reconstruction quality of the VAE, meaning high-frequency detail loss cannot be completely avoided.
- Despite being larger than existing datasets, the size of GenDS (750k) is still far from the SAM scale (11 million); further scaling may yield even larger improvements.
- Composite degradation effects (such as haze + rain) are not yet considered.
Related Work & Insights
- vs InstructPix2Pix: GenDeg extends it by incorporating degradation severity conditions (\(\mu\)/\(\sigma\)), steering image editing toward degradation synthesis.
- vs DA-CLIP: DA-CLIP guides restoration using degradation features from CLIP, whereas GenDeg works on the data side. The two are complementary: DA-CLIP trained with GenDS also showed significant gains in the experiments.
- vs PromptIR: PromptIR utilizes learnable prompts to identify degradation types. The GenDS dataset directly boosts PromptIR's OoD performance.
- This paradigm of "using generative models to synthesize training data" is similar to data augmentation concepts in classification, but with stricter requirements for preserving fine details.
Rating
- Novelty: ⭐⭐⭐⭐ The first systematic work using diffusion models to synthesize degradation data to enhance the generalization of restoration models, featuring an innovative \(\mu\)-\(\sigma\) conditioning mechanism.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Extremely comprehensive, involving five different models, six degradations, multiple OoD/in-domain datasets, and t-SNE visualizations.
- Writing Quality: ⭐⭐⭐⭐ Highly logical with detailed data analysis and rich visualizations.
- Value: ⭐⭐⭐⭐⭐ Provides a ready-to-use 750k-sample dataset and degradation synthesis tools, directly pushing the community forward.