Bridging Degradation Discrimination and Generation for Universal Image Restoration

Conference: ICLR 2026
arXiv: 2602.00579
Code: N/A
Area: Image Generation
Keywords: universal image restoration, GLCM degradation representation, diffusion model, three-stage training, all-in-one restoration

TL;DR

BDG performs fine-grained degradation discrimination with multi-angle, multi-scale gray-level co-occurrence matrices (MAS-GLCM) and trains a diffusion model in three stages (generation → bridging → restoration), so that degradation discrimination is integrated with the generative prior. The paper reports significant fidelity improvements on all-in-one restoration and real-world super-resolution tasks.

Background & Motivation

Background: Universal image restoration requires a single model to handle multiple degradation types, necessitating both degradation discrimination and conditional generation capabilities.

Limitations of Prior Work:
- Degradation discrimination approaches (AirNet, PromptIR, etc.): introduce auxiliary discriminative networks to identify degradation types, but L1/L2 losses produce overly smooth outputs with poor performance in real-world scenarios.
- Generative prior approaches (StableSR, DiffBIR, etc.): leverage pre-trained diffusion models to recover rich textures, but in all-in-one settings tend to misidentify mild degradations as severe ones, generating details inconsistent with the original image.

Key Challenge: Degradation discrimination and generative priors have developed independently, lacking a unified framework to organically integrate the two.

Goal: Inject fine-grained degradation discrimination capability into diffusion models while preserving their generative priors, enabling adaptive output adjustment based on degradation severity.

Key Insight: Propose a novel degradation representation (MAS-GLCM) combined with a three-stage diffusion training strategy to progressively bridge discriminative information into the generative process.

Core Idea: Use gray-level co-occurrence matrices as content-agnostic degradation representations, and align them with diffusion model features through three-stage training to unify discrimination and generation.

Method

Overall Architecture

Three-stage training pipeline: (1) Generative pre-training: learns VE-SDE (variance-exploding SDE) denoising on high-quality images; (2) Bridging stage: introduces residual conditioning and aligns MAS-GLCM features with diffusion intermediate features via contrastive learning; (3) Restoration fine-tuning: enables all conditions (residual + LQ image) and applies L1 loss to enhance fidelity.

Key Designs

Multi-Angle Multi-Scale Gray-Level Co-occurrence Matrix (MAS-GLCM):
- Function: Extracts content-agnostic degradation features from degraded images.
- Mechanism: Standard GLCM computes co-occurrence frequencies of pixel pairs at specific distances and orientations; MAS-GLCM averages the GLCMs computed over every combination of scale \(L_i\) and angle \(\Theta_j\): \(M_{mas} = \frac{1}{n \times m} \sum_{i,j} M_{L_i \cdot \sin(\Theta_j), L_i \cdot \cos(\Theta_j)}\), where the subscript of \(M\) is the pixel offset and \(n \times m\) is the number of scale-angle combinations.
- Design Motivation: GLCM computation naturally discards image content (only aggregating gray-level co-occurrences), avoiding the content-coupling problem of Sobel or frequency-based methods. Multi-angle and multi-scale coverage mitigates locality bias.
- Experimental Validation: KNN accuracy of 97.13% on degradation type classification (vs. Fourier 65.80%); 74.17% on degradation level classification (vs. Fourier 30.83%).
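
The averaging in the mechanism above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function names `glcm` and `mas_glcm`, the 16 gray levels, the scale set (1, 2, 4), the four angles, and the mapping of the offset to (row, column) are all assumptions chosen for the example.

```python
import numpy as np

def glcm(gray, dy, dx, levels=16):
    """Normalized co-occurrence matrix of a uint8-range image for one offset (dy, dx)."""
    # Quantize intensities 0..255 into `levels` bins (bin count is an assumption).
    q = np.clip((gray.astype(np.float64) / 256.0 * levels).astype(np.int64), 0, levels - 1)
    h, w = q.shape
    m = np.zeros((levels, levels), dtype=np.float64)
    if abs(dy) >= h or abs(dx) >= w:
        return m  # offset larger than the image: no valid pixel pairs
    # Crop two aligned views so that dst[y, x] is the pixel at offset (dy, dx) from src[y, x].
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    src = q[y0:y1, x0:x1]
    dst = q[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
    # Count each (src level, dst level) pair; np.add.at accumulates repeated indices.
    np.add.at(m, (src.ravel(), dst.ravel()), 1.0)
    return m / m.sum()

def mas_glcm(gray, scales=(1, 2, 4),
             angles=(0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4), levels=16):
    """Average the GLCMs over all scale-angle combinations (hypothetical defaults)."""
    mats = []
    for scale in scales:
        for theta in angles:
            # Offset (L*sin(theta), L*cos(theta)), rounded to whole pixels.
            dy = int(round(scale * np.sin(theta)))
            dx = int(round(scale * np.cos(theta)))
            mats.append(glcm(gray, dy, dx, levels))
    return np.mean(mats, axis=0)
```

The result is a `levels × levels` matrix that depends only on how gray levels co-occur, not on where they occur in the image; flattening it gives a fixed-length feature vector regardless of image size.
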
Three-Stage Diffusion Training:
- Base formula: \(x_{t-1} = x_t - \alpha_t x_{res}^\theta - \frac{\beta_t^2}{\bar{\beta}_t} \epsilon^\theta + \delta_t x_{lq}\)
- Generation stage: \(\alpha_t \equiv 0, \delta_t \equiv 0\), reducing to VE-SDE denoising for pure generative prior learning.
- Bridging stage: \(\delta_t \equiv 0\); residual \(x_{res}\) is introduced as a condition carrying degradation information. MAS-GLCM features \(F_{mas}\) are aligned with diffusion intermediate features \(F_{diff}\) via bidirectional cross-entropy, while an MLP performs degradation classification to ensure \(F_{mas}\) retains discriminative capacity.
- Restoration stage: All parameters \(\alpha_t, \beta_t, \delta_t\) are activated; \(x_{lq}\) is directly injected to enhance fidelity. Degradation classification is replaced by all-negative-pair contrastive learning to accommodate real-world scenarios.
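
How the three stages switch terms of the base formula on and off can be sketched as a single update function. This is an illustrative sketch only: the function name `reverse_step`, the `stage` argument, and the coefficient values in the usage note are hypothetical, and the network predictions \(x_{res}^\theta\) and \(\epsilon^\theta\) are passed in as plain arrays.

```python
import numpy as np

def reverse_step(x_t, x_res_pred, eps_pred, x_lq,
                 alpha_t, beta_t, beta_bar_t, delta_t, stage):
    """One step x_t -> x_{t-1} of the base formula, with stage-dependent coefficients."""
    if stage == "generation":
        # alpha_t = 0 and delta_t = 0: only the noise term remains.
        alpha_t, delta_t = 0.0, 0.0
    elif stage == "bridging":
        # delta_t = 0: the residual condition is active, LQ injection is not.
        delta_t = 0.0
    elif stage != "restoration":
        raise ValueError(f"unknown stage: {stage}")
    return (x_t
            - alpha_t * x_res_pred
            - (beta_t ** 2 / beta_bar_t) * eps_pred
            + delta_t * x_lq)
```

For example, with all inputs set to arrays of ones and `alpha_t=0.5, beta_t=0.2, beta_bar_t=0.4, delta_t=0.1`, the noise term contributes 0.1 in every stage, the residual term contributes 0.5 only from the bridging stage on, and the LQ term contributes 0.1 only in the restoration stage.
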
Degradation-Generation Bridging Loss:
- Function: Aligns GLCM features and diffusion features during the bridging stage.
- Mechanism: \(\mathcal{L}_{bridge} = \frac{1}{2}[\text{H}(y^{m2d}, p^{m2d}) + \text{H}(y^{d2m}, p^{d2m})]\), bidirectional cross-entropy alignment.
- Total loss: \(\mathcal{L}_{bdg} = \mathcal{L}_{gen} + \lambda(\mathcal{L}_{bridge} + \mathcal{L}_{deg-cls})\), with \(\lambda = 0.1\).
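
The bidirectional cross-entropy can be illustrated with a symmetric, CLIP-style contrastive loss. This is a sketch under stated assumptions rather than the paper's exact formulation: it assumes that the targets \(y^{m2d}\) and \(y^{d2m}\) mark the matching pair within a batch, that similarity is cosine similarity, and that a temperature of 0.07 is used; the names `bridge_loss` and `total_loss` are hypothetical.

```python
import numpy as np

def _log_softmax(z):
    """Row-wise log-softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def bridge_loss(f_mas, f_diff, temperature=0.07):
    """0.5 * [H(y_m2d, p_m2d) + H(y_d2m, p_d2m)] for a batch of paired features."""
    # L2-normalize so the dot product is cosine similarity (an assumption).
    a = f_mas / np.linalg.norm(f_mas, axis=1, keepdims=True)
    b = f_diff / np.linalg.norm(f_diff, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    # MAS-GLCM -> diffusion direction: row i should pick column i.
    h_m2d = -_log_softmax(logits)[idx, idx].mean()
    # Diffusion -> MAS-GLCM direction: same targets on the transposed logits.
    h_d2m = -_log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (h_m2d + h_d2m)

def total_loss(l_gen, l_bridge, l_deg_cls, lam=0.1):
    """L_bdg = L_gen + lambda * (L_bridge + L_deg-cls), with lambda = 0.1 as stated above."""
    return l_gen + lam * (l_bridge + l_deg_cls)
```

When the two feature sets agree pair by pair, the loss approaches zero; when the pairing is scrambled, both directions are penalized, which is what pushes \(F_{mas}\) and \(F_{diff}\) toward a shared representation.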

Loss & Training

Generation stage: Standard denoising loss (noise prediction + residual prediction).
Bridging stage: Denoising + feature alignment + degradation classification.
Restoration stage: L1 fidelity loss + bridging feature alignment loss + all-negative contrastive learning.
Real-world degradation is simulated via the multi-step degradation pipeline of Real-ESRGAN, with 8 intermediate states defined as pseudo-labels for "degradation order."

Key Experimental Results

Main Results: All-in-One Restoration

| Method | Approach | Fidelity (PSNR↑) | Perceptual Quality (LPIPS↓) |
|---|---|---|---|
| PromptIR | Discriminative | High | Poor (over-smoothed) |
| DiffBIR | Generative | Low (inconsistent) | Good |
| BDG | Discrimination + Generation Unified | Significantly improved | Maintained |

Ablation Study: MAS-GLCM Degradation Classification

| Degradation Representation | Type Classification Acc (%) | Level Classification Acc (%) |
|---|---|---|
| LQ Image | 51.44 | 20.00 |
| Sobel (gradient) | 40.80 | 23.33 |
| Laplace (gradient) | 83.05 | 20.83 |
| Fourier (frequency) | 65.80 | 30.83 |
| MAS-GLCM | 97.13 | 74.17 |
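
The accuracies in this table are reported as KNN accuracy on each representation. A KNN probe of this kind can be sketched as follows; the value of k, the Euclidean distance, and the train/test split are assumptions, since they are not specified here, and `knn_accuracy` is a hypothetical helper name.

```python
import numpy as np

def knn_accuracy(train_x, train_y, test_x, test_y, k=5):
    """Classify each test feature by majority vote of its k nearest training features."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    d = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training samples for every test sample.
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = train_y[nearest]
    # Majority vote over integer class labels.
    pred = np.array([np.bincount(v).argmax() for v in votes])
    return float((pred == test_y).mean())
```

Here `train_x` and `test_x` would hold one flattened representation per image (for example a flattened co-occurrence matrix), and the integer labels would be the degradation type or level. Because KNN has no trainable parameters, the accuracy reflects how well the representation itself separates the classes.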

Key Findings

MAS-GLCM substantially outperforms existing degradation representations: on fine-grained degradation level classification it surpasses Fourier by 43.34 percentage points (74.17% vs. 30.83%), and on type classification by 31.33 points (97.13% vs. 65.80%).
All three training stages are indispensable: removing the bridging stage and transitioning directly from generation to restoration causes a significant fidelity drop.
Generative priors are successfully preserved: restoration results exhibit texture richness comparable to purely generative models while achieving substantially higher fidelity.
No architectural modifications required: BDG achieves improvements solely through training strategy and loss design, without altering the network structure.

Highlights & Insights

Content-agnostic nature of MAS-GLCM is the most prominent contribution: the GLCM computation inherently excludes content information, retaining only texture statistics — thereby decoupling degradation discrimination from image semantics.
Elegance of the three-stage transition design: generation → bridging → restoration progressively introduces parameters in the diffusion formula to govern the evolution of model capabilities — from pure generation to conditional generation to restoration.
Reformulation of degradation classification as contrastive learning: the bridging stage uses discrete class labels, while the restoration stage adopts all-negative contrastive learning — accommodating the practical reality that real-world degradations cannot always be explicitly categorized.

Limitations & Future Work

Angle and scale parameters of MAS-GLCM require manual selection.
Three-stage training increases overall training complexity.
"Order classification" for real-world degradation relies on approximate pseudo-labels that do not fully reflect true degradation processes.
Validation on temporal scenarios such as video restoration has not been conducted.

vs. PromptIR/AirNet: These methods employ auxiliary networks for degradation discrimination but produce overly smooth outputs; BDG performs discrimination internally within the diffusion model.
vs. DiffBIR/StableSR: These methods exploit generative priors but lack degradation awareness, resulting in poor fidelity in all-in-one settings; BDG addresses this through MAS-GLCM bridging.
vs. DiffUIR: DiffUIR predicts only residuals, sacrificing generative priors; BDG simultaneously predicts both noise and residuals, preserving generative priors.

Rating

Novelty: ⭐⭐⭐⭐ MAS-GLCM degradation representation is original; three-stage training design demonstrates substantial depth.
Experimental Thoroughness: ⭐⭐⭐⭐ Covers degradation classification validation, all-in-one restoration, and real-world super-resolution, though quantitative comparisons with additional baselines are limited.
Writing Quality: ⭐⭐⭐⭐ Method derivation is clear, though the dense notation requires careful tracking.
Value: ⭐⭐⭐⭐ The unified discrimination-generation framework offers practically meaningful guidance for the image restoration community.