BlurDM: A Blur Diffusion Model for Image Deblurring

Conference: NeurIPS 2025
arXiv: 2512.03979
Code: https://jin-ting-he.github.io/BlurDM/
Area: Image Generation
Keywords: image deblurring, blur diffusion, dual diffusion, motion blur, prior generation

TL;DR

BlurDM integrates the physical formation process of motion blur (progressive blur accumulation during continuous exposure) into a diffusion model via a dual forward process (simultaneous noise addition and blurring) and a dual denoising-deblurring reverse process. It serves as a latent-space prior generator that consistently enhances four deblurring methods across four datasets, achieving an average gain of +0.31 dB on GoPro and +0.78 dB on RealBlur-J, while adding only ~4 GFLOPs of compute and ~9 ms of inference time.

Background & Motivation

Background: Deep learning-based deblurring methods (CNN/Transformer) are constrained by regression losses, producing overly smooth results. Diffusion models generate rich details, but their standard noise-based forward process is physically mismatched with the motion blur formation process.

Limitations of Prior Work: Motion blur arises from the structured, directional accumulation of continuous exposure — \(B = \frac{1}{\alpha_T}\int_0^{\alpha_T} H(\tau)d\tau\) — rather than the isotropic Gaussian noise perturbation of standard diffusion. Directly applying DDPM as a deblurring prior yields only +0.13 dB, a nearly negligible gain.
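
The exposure integral above can be made concrete with a toy discretization: the blurry observation is just the average of the instantaneous sharp frames \(H(\tau)\) seen while the shutter is open. The 1-D "scene" and per-frame shift below are illustrative, not from the paper:

```python
import numpy as np

# Motion blur as continuous-exposure accumulation, discretized:
#   B = (1/alpha_T) * integral_0^{alpha_T} H(tau) d tau
#     ~ mean of the instantaneous sharp frames during exposure.
# Hypothetical toy setup: a 1-D scene translating one pixel per frame.
T = 8                                            # discretized exposure samples
scene = np.zeros(32)
scene[10] = 1.0                                  # a single bright point
frames = [np.roll(scene, t) for t in range(T)]   # relative camera/scene motion
B = np.mean(frames, axis=0)                      # accumulated (blurred) image
# The point spreads into a directional streak of length T — exactly the
# structured, non-isotropic degradation that Gaussian noising cannot mimic.
```

Running this, the unit impulse becomes a streak of intensity 1/8 over eight pixels, while total brightness is conserved, which is what the normalization by \(\alpha_T\) encodes.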

Key Challenge: A fundamental mismatch exists between the standard diffusion process (adding Gaussian noise) and the blur formation process (adding directional blur) — noise is stochastic while blur is structured.

Goal: Design a physically grounded diffusion process whose forward pass mimics blur formation and whose reverse pass naturally performs deblurring.

Key Insight: Decompose the blur formulation into a cumulative form — \(I_t = \frac{\alpha_{t-1}}{\alpha_t}I_{t-1} + \frac{1}{\alpha_t}e_t + \beta_t\epsilon_t\) — where the first two terms represent progressive blurring and the last term represents noise, naturally yielding a dual diffusion process.
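
The decomposition can be sanity-checked numerically: iterating the blur part of the recursion (noise term omitted) must reproduce the direct exposure average. This toy sketch assumes uniform exposure increments \(\alpha_t = t + 1\) and uses random vectors as stand-in frames:

```python
import numpy as np

# Check: iterating I_t = (alpha_{t-1}/alpha_t) * I_{t-1} + (1/alpha_t) * e_t
# (blur terms only) reproduces B = (alpha_0/alpha_T) I_0 + (1/alpha_T) sum e_t,
# i.e. the plain exposure average. Frames and schedule are illustrative.
rng = np.random.default_rng(0)
T = 6
H = rng.normal(size=(T, 16))       # hypothetical sharp frames H_0..H_{T-1}
alpha = np.arange(1.0, T + 1)      # alpha_0 = 1, ..., alpha_{T-1} = T

I = H[0]                           # I_0: the first (sharp) frame
for t in range(1, T):
    e_t = H[t]                     # residual accumulated over this step
    I = (alpha[t - 1] / alpha[t]) * I + e_t / alpha[t]

B = H.mean(axis=0)                 # direct accumulation over the exposure
assert np.allclose(I, B)           # recursion == cumulative average
```

The recursion is thus an exact telescoping of the exposure integral, which is why adding a noise term \(\beta_t\epsilon_t\) on top yields a well-defined dual diffusion process.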

Core Idea: Design a diffusion model whose forward process simultaneously adds noise and blur, and whose reverse process simultaneously performs denoising and deblurring, operating in latent space as a general-purpose prior to enhance arbitrary deblurring networks.

Method

Overall Architecture

The training pipeline consists of three stages: (1) pre-train the Sharp Encoder, Prior Fusion Module (PFM), and deblurring network using GT sharp images to provide an "upper-bound" prior; (2) train the Blur Encoder and BlurDM to recover sharp priors from blurry latent representations; (3) joint optimization — inject BlurDM-generated priors into the deblurring network via PFM for end-to-end fine-tuning.
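
A compact sketch of that schedule follows; the module names are illustrative rather than the authors' code, and which components stay frozen in each stage is our reading of the pipeline, not stated verbatim:

```python
# Three-stage training schedule for BlurDM (assumption: names and exact
# frozen/trainable split are inferred from the pipeline description).
STAGES = {
    1: {"train": {"sharp_encoder", "pfm", "deblur_net"},
        "goal": "upper-bound prior from GT sharp images"},
    2: {"train": {"blur_encoder", "blurdm"},
        "goal": "recover sharp priors Z^S from blurry latents"},
    3: {"train": {"blurdm", "pfm", "deblur_net"},
        "goal": "end-to-end fine-tuning with injected priors"},
}

def trainable(stage: int) -> set:
    """Modules that receive gradient updates in the given stage."""
    return STAGES[stage]["train"]

def run_schedule():
    """Iterate the stages in order, returning each stage's trainable set."""
    return [trainable(s) for s in sorted(STAGES)]
```

The key structural point the sketch encodes: the prior target (Stage 1) and the prior generator (Stage 2) are learned before everything is optimized jointly (Stage 3).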

Key Designs

  1. Dual Forward Diffusion Process:

    • Function: Simultaneously adds Gaussian noise and blur residuals to the image.
    • Core formulation: \(I_t = \frac{\alpha_{t-1}}{\alpha_t}I_{t-1} + \frac{1}{\alpha_t}e_t + \beta_t\epsilon_t\)
    • Here \(e_t = \int_{\alpha_{t-1}}^{\alpha_t} H(\tau)d\tau\) denotes the accumulated blur residual over the time interval \([\alpha_{t-1}, \alpha_t]\).
    • Terminal state: \(q(I_T|I_0, e_{1:T}) = \mathcal{N}(I_T; \frac{\alpha_0}{\alpha_T}I_0 + \frac{1}{\alpha_T}\sum e_t, \bar{\beta}_T^2\mathbf{I})\)
    • Physical interpretation: Simulates the continuous exposure process from shutter open to shutter close.
  2. Dual Denoising-Deblurring Reverse Process:

    • Function: Learns two estimators — a blur residual estimator \(e^\theta\) and a noise estimator \(\epsilon^\theta\) — to simultaneously perform deblurring and denoising.
    • Reverse step: \(I_{t-1} = \frac{\alpha_t}{\alpha_{t-1}}I_t - \frac{1}{\alpha_{t-1}}e^\theta(I_t,t,B) - (\frac{\alpha_t\bar{\beta}_t}{\alpha_{t-1}} - \bar{\beta}_{t-1})\epsilon^\theta(I_t,t,B)\)
    • Conditioning: Conditioned on the blurry image \(B\) to guide the deblurring direction.
  3. Latent BlurDM Architecture:

    • Function: Runs BlurDM in latent space as a flexible prior generator.
    • Stage 1: Pre-trains the Sharp Encoder to extract GT sharp priors \(Z^S\); PFM modulates decoder features via affine parameters: \(F_i' = Z^{S,\alpha_i} \times F_i + Z^{S,\beta_i}\)
    • Stage 2: BlurDM recovers \(Z^S\) (sharp prior) from \(Z^B\) (blurry latent + noise) in latent space, with loss \(\mathcal{L}_{prior} = \|Z_0^B - Z^S\|_1\)
    • Stage 3: Joint optimization of BlurDM + PFM + deblurring network.
    • Design Motivation: The three-stage design ensures BlurDM learns a meaningful prior; training only in Stage 3 jointly yields substantially worse results.
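
The forward and reverse formulas above can be exercised end to end in a toy setting: sample the closed-form forward state, then run the reverse steps with oracle estimators (the true \(e_t\) and the true noise direction standing in for \(e^\theta\) and \(\epsilon^\theta\)) and confirm the sharp input is recovered exactly. Schedules and dimensions are illustrative, not the paper's:

```python
import numpy as np

# Dual forward state (cf. the terminal distribution) and dual reverse step,
# checked with oracle estimators. All schedules here are toy choices.
rng = np.random.default_rng(1)
T, D = 5, 16
I0 = rng.normal(size=D)                    # "sharp latent"
alpha = np.arange(1.0, T + 2)              # alpha_0..alpha_T, increasing
e = rng.normal(scale=0.1, size=(T, D))     # blur residuals e_1..e_T
beta_bar = np.linspace(0.0, 0.5, T + 1)    # cumulative noise, beta_bar_0 = 0
eps = rng.normal(size=D)                   # shared noise direction

def forward_state(t):
    """I_t = (alpha_0/alpha_t) I_0 + (1/alpha_t) sum_{s<=t} e_s + beta_bar_t * eps"""
    return (alpha[0] / alpha[t]) * I0 + e[:t].sum(axis=0) / alpha[t] \
        + beta_bar[t] * eps

I = forward_state(T)                       # fully blurred + noised
for t in range(T, 0, -1):                  # dual denoise-deblur reverse steps
    I = (alpha[t] / alpha[t - 1]) * I \
        - e[t - 1] / alpha[t - 1] \
        - (alpha[t] * beta_bar[t] / alpha[t - 1] - beta_bar[t - 1]) * eps

assert np.allclose(I, I0)                  # beta_bar_0 = 0 => exact recovery
```

Each reverse step peels off exactly one residual \(e_t\) and one increment of noise, mirroring how the forward process deposited them; the learned model only has to approximate the two oracle quantities.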

Computational Overhead

BlurDM adds only ~4.16 GFLOPs (<8% overhead), ~3.33M parameters, and ~9 ms of inference time; \(T=5\) sampling steps is optimal.

Key Experimental Results

Main Results

BlurDM as a plug-and-play module improves PSNR across four deblurring methods on four datasets:

| Dataset    | MIMO-UNet | Stripformer | FFTformer | LoFormer | Avg.  |
|------------|-----------|-------------|-----------|----------|-------|
| GoPro      | +0.49     | +0.44       | +0.13     | +0.16    | +0.31 |
| HIDE       | +0.73     | +0.33       | +0.14     | +0.09    | +0.32 |
| RealBlur-J | +0.54     | +1.05       | +0.30     | +1.24    | +0.78 |
| RealBlur-R | +0.60     | +1.16       | +0.44     | +0.56    | +0.69 |

Ablation Study

| Configuration             | GoPro PSNR (dB) ↑ |
|---------------------------|-------------------|
| Baseline (no prior)       | 31.78             |
| + DDPM prior              | 31.91 (+0.13)     |
| + RDDM residual diffusion | 32.03 (+0.25)     |
| + BlurDM                  | 32.28 (+0.50)     |

Necessity of three-stage training:

| Configuration     | PSNR (dB)           |
|-------------------|---------------------|
| Stage 3 only      | 31.80               |
| Stage 1+2         | 32.01               |
| Stage 1+3         | 31.95               |
| Stage 1+2+3       | 32.28               |
| Oracle (GT prior) | 32.69 (upper bound) |

Key Findings

  • Standard DDPM prior is nearly ineffective (+0.13 dB only), validating the mismatch between noise diffusion and blur physics.
  • BlurDM outperforms DDPM by +0.37 dB and RDDM (residual diffusion) by +0.25 dB — explicit blur residual modeling is more effective than implicit residual modeling.
  • Real-world blur datasets benefit more: RealBlur-J averages +0.78 dB vs. +0.31 dB on GoPro — real blur relies more on physical priors.
  • \(T=5\) is optimal; gains diminish for \(T \geq 6\).
  • All three training stages are indispensable — Stage 2 prior pre-training and Stage 3 joint fine-tuning are both necessary.

Highlights & Insights

  • Natural mapping from blur physics to the diffusion process: continuous exposure → progressive blurring ≈ diffusion forward process. Deriving the mathematical formulation directly from the physical process is a methodology worth emulating.
  • Dual estimator design: Simultaneously estimating blur residuals and noise separates structured degradation from stochastic degradation, achieving higher precision than a single estimator.
  • Architecture-agnostic prior generator: Rather than replacing existing deblurring methods, BlurDM serves as a universal prior enhancer — all four architectures (CNN/Transformer) benefit consistently.
  • Necessity of three-stage training: Direct joint training performs poorly (31.80 vs. 32.28); learning the prior separately before joint optimization is critical.

Limitations & Future Work

  • Designed specifically for motion blur; defocus blur is a depth-dependent, non-temporal accumulation process and falls outside the model's scope.
  • The blur accumulation model is an approximation and may be inaccurate for non-standard motion blur (e.g., rotational motion, non-rigid body motion).
  • The stochasticity of diffusion models may affect content fidelity.
  • No comparison with recent flow-matching-based methods.
Comparisons & Broader Implications

  • vs. standard diffusion-based deblurring (DvSR, DiffIR): these methods use standard noise diffusion without exploiting blur physics; BlurDM obtains better priors through physical modeling.
  • vs. RDDM (residual diffusion): RDDM models residuals implicitly, while BlurDM explicitly models blur accumulation, yielding better performance (+0.25 dB).
  • As a general framework: the same idea can be applied to diffusion modeling of other physical degradation processes, such as compression artifacts and rain streaks.

Rating

  • Novelty: ⭐⭐⭐⭐ Integrating blur physics into the diffusion process is an elegant innovation.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ 4 methods × 4 datasets + detailed ablations + three-stage analysis.
  • Writing Quality: ⭐⭐⭐⭐ Physical motivation is clear and derivations are complete.
  • Value: ⭐⭐⭐⭐ A plug-and-play universal deblurring enhancement tool.