
DIIP: Diffusion Image Prior

Conference: ICCV 2025 arXiv: 2503.21410 Code: None Area: Image Generation / Image Restoration Keywords: diffusion models, blind image restoration, Deep Image Prior, early stopping, zero-shot

TL;DR

This paper discovers that pretrained diffusion models exhibit an implicit bias analogous to Deep Image Prior (DIP) when reconstructing degraded images—the iterative optimization first produces a clean image before overfitting to the degraded input—and that this bias generalizes to a broader range of degradation types than DIP. Based on this finding, the authors propose DIIP, a fully blind (degradation-model-free) image restoration method.

Background & Motivation

Image restoration (IR) aims to recover a clean image \(x\) from a degraded observation \(y\). Existing methods can be categorized by the strength of their degradation model assumptions:

Non-blind methods (DDRM, DDNM, DPS): require a fully known degradation model (e.g., known blur kernel)

Partially blind methods (BlindDPS, BIRD): require knowledge of the parametric form of the degradation

Fully blind methods (DIP, DreamClean): require no knowledge of the degradation model

Deep Image Prior (DIP) is a classical fully blind approach that exploits the implicit prior of CNNs by optimizing network parameters to fit a degraded image, producing clean outputs at intermediate iterations. However, its core limitation is that this "clean-before-overfitting" behavior holds only for high-frequency degradations (e.g., noise) and fails for low-frequency degradations (e.g., blur).
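To make the contrast concrete, here is a minimal DIP-style sketch: the *parameters* of a randomly initialized network are optimized to fit the degraded image, relying on early stopping. The "network" is a hypothetical stand-in (a single linear layer `W @ z0` with a fixed random input) rather than DIP's actual CNN; it illustrates only the optimization pattern, not the prior itself.

```python
import numpy as np

# DIP-style fitting: optimize network parameters W (not the input) to
# reproduce the degraded observation y, and rely on early stopping.
rng = np.random.default_rng(0)
d = 32
z0 = rng.standard_normal(d)                    # fixed random input code
W = rng.standard_normal((d, d)) / np.sqrt(d)   # trainable "parameters"
y = rng.standard_normal(d)                     # degraded observation (toy)

lr = 0.01
losses = []
for k in range(300):
    r = W @ z0 - y
    losses.append(float(r @ r))
    W -= lr * 2 * np.outer(r, z0)  # gradient of ||W z0 - y||^2 w.r.t. W

# Run to convergence, W overfits y exactly; DIP instead returns an
# intermediate iterate, which is clean only for high-frequency degradations.
```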

The authors pose the key research question: do pretrained diffusion models exhibit a similar bias, and if so, does it generalize beyond the scope of DIP?

Method

Overall Architecture

DIIP employs a frozen pretrained diffusion model \(g\) (via DDIM deterministic mapping) and optimizes the input noise \(z\) rather than the model parameters:

\[z^* = \arg\min_z \|g(z) - y\|^2, \quad x^* = g(z^*)\]

Gradient descent is used to iteratively optimize \(z\), and early stopping is applied at the appropriate iteration to obtain the clean restored image.
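A toy sketch of this optimization, assuming a fixed random linear map as a stand-in for the frozen pretrained generator \(g\) (the real \(g\) is a DDIM mapping of a diffusion model; this shows only the mechanics of optimizing the input noise):

```python
import numpy as np

# DIIP-style test-time optimization: gradient descent on the *input* z
# of a frozen generator g, minimizing ||g(z) - y||^2.
rng = np.random.default_rng(0)
d = 64
G = rng.standard_normal((d, d)) / np.sqrt(d)   # frozen "generator" weights

def g(z):
    return G @ z

y = rng.standard_normal(d)    # degraded observation (toy)
z = rng.standard_normal(d)    # initial noise, the only optimized variable
lr = 0.05

losses = []
for k in range(200):
    r = g(z) - y
    losses.append(float(r @ r))
    z -= lr * 2 * G.T @ r     # gradient of ||g(z) - y||^2 w.r.t. z

# DIIP would early-stop at an intermediate k via its stopping criteria
# rather than run to convergence.
```

The key difference from the DIP pattern is that the model stays frozen: only \(z\) moves, so the reconstruction is always an image the pretrained generator can produce.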

Key Designs

  1. Discovery and Verification of the Diffusion Model's Implicit Prior:

    • Gaussian noise and Gaussian blur are each applied to FFHQ images, and the optimization above is run to convergence (1500 iterations).
    • Two key findings emerge:
      • (a) Two-phase behavior: Regardless of the degradation type (noise or blur), the optimization exhibits two phases—(I) an intermediate phase in which a clean, realistic image is produced; and (II) a late phase in which the model begins to overfit to the degraded input. In contrast, DIP fails to produce sharp images at intermediate iterations under blur degradation.
      • (b) High-frequency inertia: Similar to DIP, diffusion models exhibit strong resistance to high-frequency artifacts and begin overfitting to noise only at very late iterations.
    • Design Motivation: This finding extends the applicable scope of DIP-like behavior from "high-frequency degradations only" to "a wide range of degradations including low-frequency ones."
  2. Stopping Criterion for Low-Frequency Degradations (Laplacian Variance):

    • The Laplacian variance (LV) of the reconstructed image is monitored as a sharpness indicator.
    • During phase (I), the generated image is sharp and LV is high.
    • When \(k > k_{min}\) and the LV drops, i.e., \(\sigma^2[k+1] < \sigma^2[k]\) (where \(\sigma^2[k]\) denotes the LV at iteration \(k\)), the image has begun to blur, triggering early stopping.
    • The reconstruction corresponding to the last LV peak is returned.
    • Design Motivation: Absolute sharpness is difficult to measure, but the relative trend of sharpness change is reliable.
  3. Stopping Criterion for High-Frequency Degradations (Normalized Loss Slope):

    • The normalized slope \(\Delta_k = \frac{E(z^k;y) - E(z^{k-1};y)}{E(z^{k-1};y)}\) is computed.
    • Experiments show that the minimum of \(\Delta_k\) corresponds precisely to the maximum PSNR.
    • Optimization stops once the slope magnitude falls below a threshold, \(|\Delta_k| < \epsilon\) (default \(\epsilon = 0.001\)).
    • Design Motivation: The inflection point in the loss curve marks the transition from "learning the clean signal" to "fitting the noise."
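The two stopping rules can be sketched as follows. This is an illustrative reading, not the paper's implementation: helper names are hypothetical, the high-frequency condition is interpreted as the slope magnitude \(|\Delta_k|\) falling below \(\epsilon\), and the automatic selection is modeled as taking whichever rule triggers first.

```python
import numpy as np

# Sketches of the two self-supervised stopping rules (hypothetical helpers).
LAP = np.array([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])

def laplacian_variance(img):
    """Variance of the Laplacian response -- a standard sharpness proxy."""
    h, w = img.shape
    resp = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            resp[i, j] = np.sum(LAP * img[i:i + 3, j:j + 3])
    return resp.var()

def lv_stop(images, k_min=100):
    """Low-frequency rule: stop at the last LV peak, once sharpness drops."""
    lv = [laplacian_variance(im) for im in images]
    for k in range(k_min, len(lv) - 1):
        if lv[k + 1] < lv[k]:          # sigma^2[k+1] < sigma^2[k]
            return k
    return len(lv) - 1

def slope_stop(losses, eps=1e-3):
    """High-frequency rule: stop when |Delta_k| = |E_k - E_{k-1}| / E_{k-1} < eps."""
    for k in range(1, len(losses)):
        if abs(losses[k] - losses[k - 1]) / losses[k - 1] < eps:
            return k
    return len(losses) - 1

def diip_stop(images, losses, k_min=100, eps=1e-3):
    """Whichever criterion triggers first decides the stopping iteration."""
    return min(lv_stop(images, k_min), slope_stop(losses, eps))
```

In practice `images` would be the reconstructions \(g(z^k)\) and `losses` the values \(E(z^k; y)\) recorded during optimization.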

Loss & Training

  • Optimization uses the Adam optimizer with a learning rate of 0.0015.
  • Hyperparameters: \(k_{min} = 100\), \(\epsilon = 0.001\).
  • The fast diffusion inversion method proposed in BIRD is adopted to accelerate the computation of \(g(z)\).
  • The pretrained model is an unconditional diffusion model with a UNet backbone.
  • No training dataset is required; the method is purely test-time optimization.
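As a purely illustrative sketch of this setup, the snippet below runs standard Adam with the reported learning rate of 0.0015 on a toy quadratic objective; the linear map is a hypothetical stand-in for the frozen diffusion model, and the update equations are textbook Adam, not anything DIIP-specific.

```python
import numpy as np

# Test-time optimization with Adam (lr = 0.0015, as reported) applied to
# the input noise z of a frozen toy "generator" G.
rng = np.random.default_rng(1)
d = 32
G = rng.standard_normal((d, d)) / np.sqrt(d)   # frozen generator stand-in
y = rng.standard_normal(d)                     # degraded observation (toy)

def loss(z):
    r = G @ z - y
    return float(r @ r)

z = rng.standard_normal(d)
m = np.zeros(d)                                # Adam first-moment estimate
v = np.zeros(d)                                # Adam second-moment estimate
lr, b1, b2, eps = 0.0015, 0.9, 0.999, 1e-8

l0 = loss(z)
for t in range(1, 2001):
    grad = 2 * G.T @ (G @ z - y)               # gradient of ||G z - y||^2
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    mhat = m / (1 - b1 ** t)                   # bias-corrected moments
    vhat = v / (1 - b2 ** t)
    z -= lr * mhat / (np.sqrt(vhat) + eps)

final = loss(z)
```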

Key Experimental Results

Main Results

CelebA Structured Degradations (Denoising, Super-Resolution):

| Method | Denoising PSNR↑ | Denoising SSIM↑ | SR×4 PSNR↑ | SR×8 PSNR↑ |
|---|---|---|---|---|
| DIP | 25.81 | 0.606 | 21.33 | 20.34 |
| BIRD | 27.92 | 0.821 | 25.26 | 22.63 |
| DreamClean | 27.05 | 0.771 | 23.44 | 21.33 |
| DIIP | 28.37 | 0.842 | 25.14 | 22.86 |

Unstructured Degradations (Fully Blind Setting):

| Method | JPEG PSNR↑ | Raindrop Removal PSNR↑ | Non-uniform Deformation PSNR↑ |
|---|---|---|---|
| DIP | 20.43 | 20.37 | 18.83 |
| DreamClean | 23.92 | 22.94 | 22.16 |
| DIIP | 25.29 | 23.78 | 23.45 |

Note: Partially blind methods such as BIRD and BlindDPS cannot be applied to unstructured degradation scenarios.

Ablation Study

Effect of \(k_{min}\) (PSNR in dB):

| \(k_{min}\) | Non-uniform Deformation | Raindrop Removal |
|---|---|---|
| 50 | 22.18 | 22.48 |
| 100 | 23.45 | 23.78 |
| 150 | 23.52 | 23.82 |

Effect of \(\epsilon\) (PSNR in dB):

| \(\epsilon\) | Denoising | JPEG Artifact Removal |
|---|---|---|
| 0.005 | 27.25 | 22.38 |
| 0.001 | 28.37 | 25.29 |
| 0.0005 | 28.14 | 25.02 |

Gap from Oracle Stopping: DIIP trails the oracle by approximately 0.3 dB (denoising: 28.37 vs. 28.63), demonstrating that the self-supervised stopping criteria are near-optimal.

Key Findings

  • DIIP achieves state-of-the-art performance on all fully blind restoration tasks, and even surpasses some partially blind methods that require explicit degradation models on certain tasks.
  • Runtime is 138 seconds per image with 1.2 GB memory, comparable to DreamClean.
  • The pretrained diffusion model has never been exposed to degraded data, yet consistently reconstructs clean images first—demonstrating that the clean-first behavior is a purely inductive bias rather than learned degradation handling.
  • The two stopping criteria are complementary and apply to different degradation types; the algorithm automatically selects whichever criterion triggers first, without requiring prior knowledge of the degradation type.

Highlights & Insights

  • The core finding is highly valuable: the implicit prior of diffusion models is stronger and more broadly applicable than that of DIP, opening a new direction for fully blind image restoration.
  • The method is remarkably simple and elegant: frozen model + noise optimization + early stopping, with no additional training or degradation modeling required.
  • The two self-supervised stopping criteria are complementary and require no knowledge of the degradation type.
  • In experiments, DreamClean tends to alter image identity (e.g., facial features), whereas DIIP better preserves the original content.

Limitations, Connections & Future Work

  • Computational overhead is substantial (~138 seconds per image), making the method unsuitable for real-time applications.
  • Each image requires independent optimization, precluding batch processing.
  • The stopping criteria exhibit some sensitivity to hyperparameters (\(k_{min}\), \(\epsilon\)).
  • Validation is limited to 256×256 resolution; high-resolution scenarios would require more efficient diffusion inversion schemes.
  • The interaction between the two stopping criteria under composite degradations involving both high-frequency and low-frequency components has not been thoroughly analyzed.
  • DIP is the direct source of inspiration; DIIP can be viewed as an upgrade of DIP instantiated on diffusion models.
  • DreamClean is the closest competitor (also fully blind) but adopts a different diffusion inference strategy.
  • The fast diffusion inversion technique from BIRD is borrowed to accelerate optimization.
  • The core finding of this paper may transfer to other generative models—whether Flow Matching, Consistency Models, and related frameworks possess analogous implicit priors warrants future investigation.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ — The discovery of the diffusion model's implicit prior is highly insightful, and the method is concise and original.
  • Experimental Thoroughness: ⭐⭐⭐⭐ — A diverse range of degradation types is covered with thorough ablations; however, the evaluation datasets are relatively small in scale.
  • Writing Quality: ⭐⭐⭐⭐⭐ — Motivation is clearly articulated, and the logical chain from discovery to method is highly natural.
  • Value: ⭐⭐⭐⭐ — Offers a new perspective for fully blind image restoration, though computational cost limits practical deployment.