Constrained Diffusion Implicit Models¶
Conference: NeurIPS 2025 arXiv: 2411.00359 Code: https://grail.cs.washington.edu/projects/cdim/ Area: 3D Vision / Image Restoration Keywords: diffusion models, inverse problems, linear constraints, DDIM, accelerated sampling
TL;DR¶
This paper proposes CDIM, a DDIM-based algorithm for solving linear inverse problems. By aligning the residual energy with the \(\chi^2\) distribution of the forward diffusion process, CDIM adaptively controls the number and step size of projection steps, achieving inference speeds 10–50× faster than DPS while exactly satisfying measurement constraints in the noiseless case.
Background & Motivation¶
Recovering a signal \(\mathbf{x}\) from noisy linear measurements \(\mathbf{y} = \mathbf{A}\mathbf{x} + \boldsymbol{\epsilon}\), with \(\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})\), is a fundamental problem encompassing tasks such as super-resolution, inpainting, and denoising. Using pretrained diffusion models as priors to solve such inverse problems has become a prominent research direction.
Limitations of Prior Work:
Excessive projection steps: Methods such as DPS alternate between unconditional denoising and constraint projection at each diffusion step, but each projection requires a network forward pass, resulting in extremely slow inference (>70 seconds).
Constraint failure under accelerated sampling: Directly combining DPS with DDIM accelerated sampling leads to insufficient projection due to large step sizes, producing blurry or divergent outputs.
Inability to exactly satisfy constraints: DPS employs fixed small projection step sizes, which cannot guarantee \(\mathbf{A}\hat{\mathbf{x}}_0 = \mathbf{y}\) even in the noiseless setting.
Difficulty in step size selection: Projection step sizes that are too large cause divergence or out-of-distribution drift, while those that are too small result in insufficient convergence.
Key Challenge: How can the number of projection steps be substantially reduced without either over-projecting (pulling samples out of distribution) or under-projecting (failing to satisfy constraints)?
Core Idea: Under the forward diffusion process, the residual energy \(\|\mathbf{A}\mathbf{x}_t - \mathbf{y}\|^2\) follows an analytic \(\chi^2\)-type distribution, which is leveraged to guide the projection strategy: projection is performed only when the residual exceeds a "plausible region," and the step size is determined via a one-dimensional search so that the residual lands precisely on the boundary of this region.
Method¶
Overall Architecture¶
The CDIM inference procedure (Algorithm 1):

1. Initialize \(\mathbf{x}_T \sim \mathcal{N}(0, \mathbf{I})\).
2. For each timestep \(t = T, T-\delta, \ldots, 1\):
   - Perform one unconditional DDIM denoising step to obtain \(\mathbf{x}_{t-\delta}\).
   - Compute the plausible-region boundary \(\rho_{t-\delta}(\mathbf{y}) = \mu_{t-\delta}(\mathbf{y}) + c \cdot \sigma_{t-\delta}(\mathbf{y})\).
   - While \(\|\mathbf{A}\mathbf{x}_{t-\delta} - \mathbf{y}\|^2 > \rho_{t-\delta}\): compute the Tweedie estimate \(\hat{\mathbf{x}}_0\), compute the gradient \(\mathbf{g}\), find the optimal step size \(\eta^*\) via a one-dimensional search, and update \(\mathbf{x}_{t-\delta} \leftarrow \mathbf{x}_{t-\delta} - \eta^* \mathbf{g}\).
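To make the control flow concrete, below is a minimal PyTorch-style sketch of the loop above. This is a reconstruction under stated assumptions, not the authors' code: `eps_model` stands in for the pretrained noise predictor, `sigma_fn` for the paper's closed-form standard deviation of the surrogate residual (not reproduced in these notes), and `line_search_eta` for the one-dimensional search sketched under Key Designs below.

```python
import torch

def cdim_sample(eps_model, A, y, alpha_bar, timesteps, sigma_fn, c=0.1, max_proj=50):
    """Sketch of CDIM inference (Algorithm 1).

    eps_model: pretrained noise predictor eps_theta(x, t) (stand-in here).
    A, y:      measurement matrix and observations.
    alpha_bar: cumulative noise schedule tensor, indexed by timestep.
    sigma_fn:  closed-form std of the surrogate residual (paper's formula).
    """
    x = torch.randn(A.shape[1])                       # x_T ~ N(0, I)
    for t, s in zip(timesteps[:-1], timesteps[1:]):   # t > s
        a_t, a_s = alpha_bar[t], alpha_bar[s]

        # One unconditional DDIM step (deterministic, eta = 0).
        eps = eps_model(x, t)
        x0_hat = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()   # Tweedie estimate
        x = a_s.sqrt() * x0_hat + (1 - a_s).sqrt() * eps

        # Plausible-region boundary rho = mu + c * sigma (closed form, 0 NFEs).
        mu = (a_s.sqrt() - 1) ** 2 * y.pow(2).sum() + (1 - a_s) * torch.trace(A @ A.T)
        rho = mu + c * sigma_fn(s)

        # Project only while the surrogate residual is outside the region.
        for _ in range(max_proj):
            if (A @ x - y).pow(2).sum() <= rho:
                break
            xg = x.detach().requires_grad_(True)
            eps = eps_model(xg, s)
            x0_hat = (xg - (1 - a_s).sqrt() * eps) / a_s.sqrt()
            loss = (A @ x0_hat - y).pow(2).sum()      # true objective L
            g, = torch.autograd.grad(loss, xg)        # one NFE per projection
            eta = line_search_eta(A, x, y, g, rho)    # zero-NFE step-size search
            x = x - eta * g
    return x
```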
Key Designs¶
- Surrogate Residual and \(\chi^2\) Distribution:
- Function: Provides an analytically computable stopping criterion for projection.
- Mechanism: The distribution of the true objective \(L_t = \|\mathbf{A}\hat{\mathbf{x}}_0(\mathbf{x}_t) - \mathbf{y}\|^2\) is intractable because it depends on the unknown data distribution. However, the surrogate residual \(R_t = \|\mathbf{A}\mathbf{x}_t - \mathbf{y}\|^2\) follows a noncentral generalized \(\chi^2\) distribution under the forward process, with analytically computable mean and variance:
- \(\mu_t(\mathbf{y}) = (\sqrt{\bar\alpha_t} - 1)^2 \|\mathbf{y}\|^2 + (1 - \bar\alpha_t) \text{tr}(\mathbf{A}\mathbf{A}^\top)\)
- \(\sigma_t^2(\mathbf{y})\) is similarly computable in closed form.
- Proposition 1: When \(|R_t - \mathbb{E}[R_t|\mathbf{y}]| \leq \gamma\), it follows that \(|L_t - \mathbb{E}[L_t|\mathbf{y}]| \leq \gamma/\bar\alpha_t + O(\sqrt{1-\bar\alpha_t}/\bar\alpha_t)\). That is, controlling the surrogate residual implies control over the true objective.
- Design Motivation: Computing \(R_t\) requires only a matrix multiplication, with no network evaluation, enabling step-size search at zero additional NFEs (neural function evaluations); a quick derivation of \(\mu_t\) follows below.
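For reference, the mean above follows in one line from the forward process \(\mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}\), assuming noiseless measurements \(\mathbf{y} = \mathbf{A}\mathbf{x}_0\):

\[
\mathbb{E}\,\|\mathbf{A}\mathbf{x}_t - \mathbf{y}\|^2
= \mathbb{E}\,\big\|(\sqrt{\bar\alpha_t} - 1)\,\mathbf{A}\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\mathbf{A}\boldsymbol{\epsilon}\big\|^2
= (\sqrt{\bar\alpha_t} - 1)^2 \|\mathbf{y}\|^2 + (1 - \bar\alpha_t)\,\text{tr}(\mathbf{A}\mathbf{A}^\top),
\]

where the cross term vanishes because \(\boldsymbol{\epsilon}\) is zero-mean and independent of \(\mathbf{x}_0\).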
- Adaptive Step Size Selection:
- Function: Identifies the optimal projection step size without additional NFEs.
- Mechanism: The gradient \(\mathbf{g} = \nabla_{\mathbf{x}_{t-\delta}} \|\mathbf{A}\hat{\mathbf{x}}_0 - \mathbf{y}\|^2\) is precomputed, and a one-dimensional search finds \(\eta^* = \arg\min_\eta |\rho_{t-\delta}(\mathbf{y}) - \|\mathbf{A}(\mathbf{x}_{t-\delta} - \eta\mathbf{g}) - \mathbf{y}\|^2|\).
- The objective is quadratic in \(\eta\), making the search efficient and stable.
- Design Motivation: Pulling the residual to the boundary of the plausible region (rather than its center) yields better empirical results; see the closed-form sketch below.
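Because the residual is quadratic in \(\eta\), the one-dimensional search reduces to solving a scalar quadratic. A minimal sketch of the idea (the helper name `line_search_eta` is mine, matching the loop under Overall Architecture; not the paper's code):

```python
import torch

def line_search_eta(A, x, y, g, rho):
    """Choose eta so that ||A(x - eta*g) - y||^2 lands on the boundary rho."""
    r = A @ x - y                       # current measurement residual
    d = A @ g                           # residual change per unit step along g
    a = d.pow(2).sum()                  # quadratic coefficient ||d||^2
    if a == 0:                          # degenerate direction: no progress possible
        return 0.0
    b = (r * d).sum()                   # inner product <r, d>
    c0 = r.pow(2).sum() - rho           # solve a*eta^2 - 2*b*eta + c0 = 0
    disc = b * b - a * c0
    if disc < 0:                        # boundary unreachable along g:
        return (b / a).item()           # fall back to the 1-D minimizer
    return ((b - disc.sqrt()) / a).item()   # smaller root = first boundary crossing
```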
- Stopping Criterion and Hyperparameter \(c\):
- Function: Determines when to stop projection.
- Mechanism: Projection halts when \(\|\mathbf{A}\mathbf{x}_{t-\delta} - \mathbf{y}\|^2 \leq \rho_{t-\delta}(\mathbf{y})\).
- Smaller \(c\) (narrower plausible region) yields better results but requires more projection steps. Experiments use \(c = 0.1\).
- At high noise levels the plausible region is large and almost no projection is needed; at low noise levels the region narrows and more projection steps are required.
- Poisson Noise Handling:
- Function: Extends the framework to non-Gaussian noise.
- Mechanism: Pearson residuals \(R(\mathbf{A}\hat{\mathbf{x}}_0, \mathbf{y}) = \frac{\lambda(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0)}{\sqrt{\lambda \mathbf{A}\hat{\mathbf{x}}_0}}\) transform Poisson noise into approximately standard normal residuals, after which the same framework is applied; a short sketch follows.
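A one-liner makes the transform concrete (a sketch under my naming; `lam` is the photon rate \(\lambda\), and the clamp guards the small-\(\hat{\mathbf{x}}_0\) regime noted under Limitations):

```python
import torch

def pearson_residual(y, Ax0_hat, lam, eps=1e-8):
    """Counts lam*y ~ Poisson(lam * A x0) have variance equal to their mean,
    so centering and dividing by the standard deviation yields residuals
    that are approximately N(0, 1) entrywise."""
    return lam * (y - Ax0_hat) / torch.sqrt(lam * Ax0_hat.clamp_min(eps))
```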
Loss & Training¶
- No training is required; a pretrained diffusion model is used directly.
- Only two hyperparameters: number of denoising steps \(T'\) and stopping criterion constant \(c\).
- Defaults: \(T' = 50\), \(c = 0.1\).
- As \(t \to 0\), \(\hat{\mathbf{x}}_0 \to \mathbf{x}_t\) and \(L_t\) becomes a simple convex quadratic, enabling exact convergence to zero.
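The last point follows directly from the Tweedie estimate: as \(t \to 0\), \(\bar\alpha_t \to 1\), so

\[
\hat{\mathbf{x}}_0(\mathbf{x}_t) = \frac{\mathbf{x}_t - \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)}{\sqrt{\bar\alpha_t}} \;\to\; \mathbf{x}_t
\quad\Longrightarrow\quad
L_t \to \|\mathbf{A}\mathbf{x}_t - \mathbf{y}\|^2,
\]

a convex quadratic that exact line search drives to zero whenever \(\mathbf{y}\) lies in the range of \(\mathbf{A}\).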
Key Experimental Results¶
Main Results¶
FFHQ 256×256 (\(\sigma=0.05\)):
| Method | Super-res FID | Inpainting (box) FID | Gaussian Deblur FID | Inpainting (random) FID | Time (s) |
|---|---|---|---|---|---|
| DPS | 39.35 | 33.12 | 44.05 | 21.19 | 70.42 |
| FPS-SMC | 26.62 | 26.51 | 29.97 | 33.10 | 116.90 |
| DDRM | 62.15 | 42.93 | 74.92 | 69.71 | 2.0 |
| CDIM (T'=25) | 33.87 | 27.51 | 34.18 | 29.67 | 2.4 |
| CDIM (T'=50) | 31.54 | 26.09 | 29.68 | 28.52 | 6.4 |
CDIM achieves comparable or superior quality while running roughly 29× faster than DPS (2.4 s vs. 70.4 s at \(T' = 25\)).
Ablation Study¶
| Denoising Steps T' | Total Projection Steps | Quality | Notes |
|---|---|---|---|
| T'=25 | 31 | Good | Default fast setting |
| T'=10 | 15 | Reasonable | Still usable |
| T'=4 | 10 | Acceptable | <1s inference, 14 NFEs |
| Stopping Criterion \(c\) | Quality | Projection Steps |
|---|---|---|
| c=4 | Poor constraint satisfaction | Fewest |
| c=1 | Good | Moderate |
| c=0.1 | Best | More but manageable |
DPS + DDIM Comparison:
- DPS + DDIM (small step size): MAE 4%; constraint unsatisfied, blurry hair.
- DPS + DDIM (large step size): MAE 0.3%; constraint improved but results diverge.
- CDIM: MAE 0.05%; constraints exactly satisfied with natural-looking outputs.
Key Findings¶
- Exact constraint satisfaction: In the noiseless setting, CDIM exactly satisfies \(\mathbf{A}\mathbf{x}_0 = \mathbf{y}\), even for out-of-distribution observations.
- Acceleration is not solely from DDIM: DPMC employs DDIM-style sampling yet still requires 200 additional NFEs, demonstrating that DDIM alone is insufficient.
- Fewer projections at high noise, more at low noise: This is intuitive — early-stage Tweedie estimates are inaccurate and should not be over-projected.
- Extreme acceleration remains viable: As few as T'=4 steps (<1 second) still produces reasonable results.
Highlights & Insights¶
- Elegant mathematical framework: The analytic distribution of the forward process is used to guide projection during reverse sampling, providing clear theoretical motivation.
- Clever surrogate residual design: \(R_t\) can be computed without network evaluation, making step size search cost-free in terms of NFEs.
- Exact constraint satisfaction: In the noiseless setting, CDIM drives the measurement residual exactly to zero, which DPS's fixed small step sizes cannot guarantee.
- Poisson noise extension: Pearson residuals provide a unified treatment of non-Gaussian noise within the same framework.
Limitations & Future Work¶
- Restricted to linear constraints: For nonlinear constraints, \(\mathbb{E}[h(\mathbf{x}_0)] \neq h(\mathbb{E}[\mathbf{x}_0])\), precluding extension of the Tweedie estimator.
- Requires an explicit form of \(\mathbf{A}\) to compute \(\|\mathbf{A}\mathbf{x}_t - \mathbf{y}\|^2\).
- For Poisson noise, the normal approximation of Pearson residuals breaks down under extreme noise levels or when \(\hat{\mathbf{x}}_0\) approaches zero.
- Computing the gradient \(\mathbf{g}\) still requires a network evaluation; the method substantially reduces but does not eliminate such calls.
Related Work & Insights¶
- DPS (2023): The primary baseline for CDIM; DPS's small per-step projection strategy fails under accelerated sampling.
- DDRM/DDNM: SVD-based methods are fast but achieve limited quality.
- DSG: Employs similar gradient-based updates but uses soft rather than exact constraints.
- Insight: The statistical properties of the forward process constitute a valuable source of guidance for the reverse process; replacing heuristic strategies with analytic distributions is a promising direction.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ Using the \(\chi^2\) distribution to guide projection is an elegant and original contribution.
- Experimental Thoroughness: ⭐⭐⭐⭐ Multi-task, multi-dataset comparisons with ablations and novel applications (temporal rephotography / point clouds), though quantitative evaluation could be further enriched.
- Writing Quality: ⭐⭐⭐⭐⭐ Mathematical derivations are rigorous and clear; figures are intuitive.
- Value: ⭐⭐⭐⭐⭐ Opens a new Pareto frontier in the speed–quality trade-off for diffusion-based inverse problem solving.