Constrained Diffusion Implicit Models¶
Conference: NeurIPS 2025 arXiv: 2411.00359 Code: https://grail.cs.washington.edu/projects/cdim/ Area: 3D Vision / Image Restoration Keywords: diffusion models, inverse problems, linear constraints, DDIM, accelerated sampling
TL;DR¶
This paper proposes CDIM, a DDIM-based algorithm for solving linear inverse problems. By aligning the residual energy with the \(\chi^2\) distribution of the forward diffusion process, CDIM adaptively controls the number and step size of projection steps, achieving inference speeds 10–50× faster than DPS while exactly satisfying measurement constraints in the noiseless case.
Background & Motivation¶
Recovering a signal \(\mathbf{x}\) from noisy linear measurements \(\mathbf{y} = \mathbf{A}\mathbf{x} + \boldsymbol{\epsilon}\), with \(\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})\), is a fundamental problem encompassing tasks such as super-resolution, inpainting, and denoising. Using pretrained diffusion models as priors to solve such inverse problems has become a prominent research direction.
Limitations of Prior Work:
Excessive projection steps: Methods such as DPS alternate between unconditional denoising and constraint projection at each diffusion step, but each projection requires a network forward pass, resulting in extremely slow inference (>70 seconds).
Constraint failure under accelerated sampling: Directly combining DPS with DDIM accelerated sampling leads to insufficient projection due to large step sizes, producing blurry or divergent outputs.
Inability to exactly satisfy constraints: DPS employs fixed small projection step sizes, which cannot guarantee \(\mathbf{A}\hat{\mathbf{x}}_0 = \mathbf{y}\) even in the noiseless setting.
Difficulty in step size selection: Projection step sizes that are too large cause divergence or out-of-distribution drift, while those that are too small result in insufficient convergence.
Key Challenge: How can the number of projection steps be substantially reduced without either over-projecting (pulling samples out of distribution) or under-projecting (failing to satisfy constraints)?
Core Idea: Under the forward diffusion process, the residual energy \(\|\mathbf{A}\mathbf{x}_t - \mathbf{y}\|^2\) follows an analytic \(\chi^2\)-type distribution, which is leveraged to guide the projection strategy: projection is performed only when the residual exceeds a "plausible region," and the step size is determined via a one-dimensional search so that the residual lands precisely on the boundary of this region.
Method¶
Overall Architecture¶
The CDIM inference procedure (Algorithm 1):

1. Initialize \(\mathbf{x}_T \sim \mathcal{N}(0, \mathbf{I})\).
2. For each timestep \(t = T, T-\delta, \ldots, 1\):
   - Perform one unconditional DDIM denoising step to obtain \(\mathbf{x}_{t-\delta}\).
   - Compute the plausible-region boundary \(\rho_{t-\delta}(\mathbf{y}) = \mu_{t-\delta}(\mathbf{y}) + c \cdot \sigma_{t-\delta}(\mathbf{y})\).
   - While \(\|\mathbf{A}\mathbf{x}_{t-\delta} - \mathbf{y}\|^2 > \rho_{t-\delta}\): compute the Tweedie estimate \(\hat{\mathbf{x}}_0\), compute the gradient \(\mathbf{g}\), find the optimal step size \(\eta^*\) via a one-dimensional search, and update \(\mathbf{x}_{t-\delta} \leftarrow \mathbf{x}_{t-\delta} - \eta^* \mathbf{g}\).
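To make the control flow concrete, below is a minimal PyTorch-style sketch of the loop above. This is a reconstruction under stated assumptions, not the authors' code: `eps_model` stands in for the pretrained noise predictor, `sigma_fn` for the paper's closed-form standard deviation of the surrogate residual (not reproduced in these notes), and `line_search_eta` for the one-dimensional search sketched under Key Designs below.

```python
import torch

def cdim_sample(eps_model, A, y, alpha_bar, timesteps, sigma_fn, c=0.1, max_proj=50):
    """Sketch of CDIM inference (Algorithm 1).

    eps_model: pretrained noise predictor eps_theta(x, t) (stand-in here).
    A, y:      measurement matrix and observations.
    alpha_bar: cumulative noise schedule tensor, indexed by timestep.
    sigma_fn:  closed-form std of the surrogate residual (paper's formula).
    """
    x = torch.randn(A.shape[1])                       # x_T ~ N(0, I)
    for t, s in zip(timesteps[:-1], timesteps[1:]):   # t > s
        a_t, a_s = alpha_bar[t], alpha_bar[s]

        # One unconditional DDIM step (deterministic, eta = 0).
        eps = eps_model(x, t)
        x0_hat = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()   # Tweedie estimate
        x = a_s.sqrt() * x0_hat + (1 - a_s).sqrt() * eps

        # Plausible-region boundary rho = mu + c * sigma (closed form, 0 NFEs).
        mu = (a_s.sqrt() - 1) ** 2 * y.pow(2).sum() + (1 - a_s) * torch.trace(A @ A.T)
        rho = mu + c * sigma_fn(s)

        # Project only while the surrogate residual is outside the region.
        for _ in range(max_proj):
            if (A @ x - y).pow(2).sum() <= rho:
                break
            xg = x.detach().requires_grad_(True)
            eps = eps_model(xg, s)
            x0_hat = (xg - (1 - a_s).sqrt() * eps) / a_s.sqrt()
            loss = (A @ x0_hat - y).pow(2).sum()      # true objective L
            g, = torch.autograd.grad(loss, xg)        # one NFE per projection
            eta = line_search_eta(A, x, y, g, rho)    # zero-NFE step-size search
            x = x - eta * g
    return x
```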
Key Designs¶
- Surrogate Residual and \(\chi^2\) Distribution:
- Function: Provides an analytically computable stopping criterion for projection.
- Mechanism: The distribution of the true objective \(L_t = \|\mathbf{A}\hat{\mathbf{x}}_0(\mathbf{x}_t) - \mathbf{y}\|^2\) is intractable because it depends on the unknown data distribution. However, the surrogate residual \(R_t = \|\mathbf{A}\mathbf{x}_t - \mathbf{y}\|^2\) follows a noncentral generalized \(\chi^2\) distribution under the forward process, with analytically computable mean and variance:
- \(\mu_t(\mathbf{y}) = (\sqrt{\bar\alpha_t} - 1)^2 \|\mathbf{y}\|^2 + (1 - \bar\alpha_t) \text{tr}(\mathbf{A}\mathbf{A}^\top)\)
- \(\sigma_t^2(\mathbf{y})\) is similarly computable in closed form.
- Proposition 1: When \(|R_t - \mathbb{E}[R_t|\mathbf{y}]| \leq \gamma\), it follows that \(|L_t - \mathbb{E}[L_t|\mathbf{y}]| \leq \gamma/\bar\alpha_t + O(\sqrt{1-\bar\alpha_t}/\bar\alpha_t)\). That is, controlling the surrogate residual implies control over the true objective.
- Design Motivation: Computing \(R_t\) requires only a matrix multiplication, with no network evaluation, enabling step-size search at zero additional NFEs (neural function evaluations); a quick derivation of \(\mu_t\) follows below.
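For reference, the mean above follows in one line from the forward process \(\mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}\), assuming noiseless measurements \(\mathbf{y} = \mathbf{A}\mathbf{x}_0\):

\[
\mathbb{E}\,\|\mathbf{A}\mathbf{x}_t - \mathbf{y}\|^2
= \mathbb{E}\,\big\|(\sqrt{\bar\alpha_t} - 1)\,\mathbf{A}\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\mathbf{A}\boldsymbol{\epsilon}\big\|^2
= (\sqrt{\bar\alpha_t} - 1)^2 \|\mathbf{y}\|^2 + (1 - \bar\alpha_t)\,\text{tr}(\mathbf{A}\mathbf{A}^\top),
\]

where the cross term vanishes because \(\boldsymbol{\epsilon}\) is zero-mean and independent of \(\mathbf{x}_0\).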
- Adaptive Step Size Selection:
- Function: Identifies the optimal projection step size without additional NFEs.
- Mechanism: The gradient \(\mathbf{g} = \nabla_{\mathbf{x}_{t-\delta}} \|\mathbf{A}\hat{\mathbf{x}}_0 - \mathbf{y}\|^2\) is precomputed, and a one-dimensional search finds \(\eta^* = \arg\min_\eta |\rho_{t-\delta}(\mathbf{y}) - \|\mathbf{A}(\mathbf{x}_{t-\delta} - \eta\mathbf{g}) - \mathbf{y}\|^2|\).
- The objective is quadratic in \(\eta\), making the search efficient and stable.
- Design Motivation: Pulling the residual to the boundary of the plausible region (rather than its center) yields better empirical results; see the closed-form sketch below.
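Because the residual is quadratic in \(\eta\), the one-dimensional search reduces to solving a scalar quadratic. A minimal sketch of the idea (the helper name `line_search_eta` is mine, matching the loop under Overall Architecture; not the paper's code):

```python
import torch

def line_search_eta(A, x, y, g, rho):
    """Choose eta so that ||A(x - eta*g) - y||^2 lands on the boundary rho."""
    r = A @ x - y                       # current measurement residual
    d = A @ g                           # residual change per unit step along g
    a = d.pow(2).sum()                  # quadratic coefficient ||d||^2
    if a == 0:                          # degenerate direction: no progress possible
        return 0.0
    b = (r * d).sum()                   # inner product <r, d>
    c0 = r.pow(2).sum() - rho           # solve a*eta^2 - 2*b*eta + c0 = 0
    disc = b * b - a * c0
    if disc < 0:                        # boundary unreachable along g:
        return (b / a).item()           # fall back to the 1-D minimizer
    return ((b - disc.sqrt()) / a).item()   # smaller root = first boundary crossing
```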
- Stopping Criterion and Hyperparameter \(c\):
- Function: Determines when to stop projection.
- Mechanism: Projection halts when \(\|\mathbf{A}\mathbf{x}_{t-\delta} - \mathbf{y}\|^2 \leq \rho_{t-\delta}(\mathbf{y})\).
- Smaller \(c\) (narrower plausible region) yields better results but requires more projection steps. Experiments use \(c = 0.1\).
- At high noise levels the plausible region is large and almost no projection is needed; at low noise levels the region narrows and more projection steps are required.
- Poisson Noise Handling:
- Function: Extends the framework to non-Gaussian noise.
- Mechanism: Pearson residuals \(R(\mathbf{A}\hat{\mathbf{x}}_0, \mathbf{y}) = \frac{\lambda(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0)}{\sqrt{\lambda \mathbf{A}\hat{\mathbf{x}}_0}}\) transform Poisson noise into approximately standard normal residuals, after which the same framework is applied; a short sketch follows.
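A one-liner makes the transform concrete (a sketch under my naming; `lam` is the photon rate \(\lambda\), and the clamp guards the small-\(\hat{\mathbf{x}}_0\) regime noted under Limitations):

```python
import torch

def pearson_residual(y, Ax0_hat, lam, eps=1e-8):
    """Counts lam*y ~ Poisson(lam * A x0) have variance equal to their mean,
    so centering and dividing by the standard deviation yields residuals
    that are approximately N(0, 1) entrywise."""
    return lam * (y - Ax0_hat) / torch.sqrt(lam * Ax0_hat.clamp_min(eps))
```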
Loss & Training¶
- No training is required; a pretrained diffusion model is used directly.
- Only two hyperparameters: number of denoising steps \(T'\) and stopping criterion constant \(c\).
- Defaults: \(T' = 50\), \(c = 0.1\).
- As \(t \to 0\), \(\hat{\mathbf{x}}_0 \to \mathbf{x}_t\) and \(L_t\) becomes a simple convex quadratic, enabling exact convergence to zero.
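The last point follows directly from the Tweedie estimate: as \(t \to 0\), \(\bar\alpha_t \to 1\), so

\[
\hat{\mathbf{x}}_0(\mathbf{x}_t) = \frac{\mathbf{x}_t - \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)}{\sqrt{\bar\alpha_t}} \;\to\; \mathbf{x}_t
\quad\Longrightarrow\quad
L_t \to \|\mathbf{A}\mathbf{x}_t - \mathbf{y}\|^2,
\]

a convex quadratic that exact line search drives to zero whenever \(\mathbf{y}\) lies in the range of \(\mathbf{A}\).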
Key Experimental Results¶
Main Results¶
FFHQ 256×256 (\(\sigma=0.05\)):
| Method | Super-res FID | Inpainting (box) FID | Gaussian Deblur FID | Inpainting (random) FID | Time (s) |
|---|---|---|---|---|---|
| DPS | 39.35 | 33.12 | 44.05 | 21.19 | 70.42 |
| FPS-SMC | 26.62 | 26.51 | 29.97 | 33.10 | 116.90 |
| DDRM | 62.15 | 42.93 | 74.92 | 69.71 | 2.0 |
| CDIM (T'=25) | 33.87 | 27.51 | 34.18 | 29.67 | 2.4 |
| CDIM (T'=50) | 31.54 | 26.09 | 29.68 | 28.52 | 6.4 |
CDIM achieves comparable or superior quality while running roughly 29× faster than DPS (2.4 s vs. 70.4 s at \(T' = 25\)).
Ablation Study¶
| Denoising Steps T' | Total Projection Steps | Quality | Notes |
|---|---|---|---|
| T'=25 | 31 | Good | Default fast setting |
| T'=10 | 15 | Reasonable | Still usable |
| T'=4 | 10 | Acceptable | <1s inference, 14 NFEs |
| Stopping Criterion \(c\) | Quality | Projection Steps |
|---|---|---|
| c=4 | Poor constraint satisfaction | Fewest |
| c=1 | Good | Moderate |
| c=0.1 | Best | More but manageable |
DPS + DDIM Comparison:
- DPS + DDIM (small step size): MAE 4%; constraint unsatisfied, blurry hair.
- DPS + DDIM (large step size): MAE 0.3%; constraint improved but results diverge.
- CDIM: MAE 0.05%; constraints exactly satisfied with natural-looking outputs.
Key Findings¶
- Exact constraint satisfaction: In the noiseless setting, CDIM exactly satisfies \(\mathbf{A}\mathbf{x}_0 = \mathbf{y}\), even for out-of-distribution observations.
- Acceleration is not solely from DDIM: DPMC employs DDIM-style sampling yet still requires 200 additional NFEs, demonstrating that DDIM alone is insufficient.
- Fewer projections at high noise, more at low noise: This is intuitive — early-stage Tweedie estimates are inaccurate and should not be over-projected.
- Extreme acceleration remains viable: As few as T'=4 steps (<1 second) still produces reasonable results.
Highlights & Insights¶
- Elegant mathematical framework: The analytic distribution of the forward process is used to guide projection during reverse sampling, providing clear theoretical motivation.
- Clever surrogate residual design: \(R_t\) can be computed without network evaluation, making step size search cost-free in terms of NFEs.
- Exact constraint satisfaction: In the noiseless setting, CDIM drives the measurement residual exactly to zero, which DPS's fixed small step sizes cannot guarantee.
- Poisson noise extension: Pearson residuals provide a unified treatment of non-Gaussian noise within the same framework.
Limitations & Future Work¶
- Restricted to linear constraints: For nonlinear constraints, \(\mathbb{E}[h(\mathbf{x}_0)] \neq h(\mathbb{E}[\mathbf{x}_0])\), precluding extension of the Tweedie estimator.
- Requires an explicit form of \(\mathbf{A}\) to compute \(\|\mathbf{A}\mathbf{x}_t - \mathbf{y}\|^2\).
- For Poisson noise, the normal approximation of Pearson residuals breaks down under extreme noise levels or when \(\hat{\mathbf{x}}_0\) approaches zero.
- Computing the gradient \(\mathbf{g}\) still requires a network evaluation; the method substantially reduces but does not eliminate such calls.
Related Work & Insights¶
- DPS (2023): The primary baseline for CDIM; DPS's small per-step projection strategy fails under accelerated sampling.
- DDRM/DDNM: SVD-based methods are fast but achieve limited quality.
- DSG: Employs similar gradient-based updates but uses soft rather than exact constraints.
- Insight: The statistical properties of the forward process constitute a valuable source of guidance for the reverse process; replacing heuristic strategies with analytic distributions is a promising direction.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ Using the \(\chi^2\) distribution to guide projection is an elegant and original contribution.
- Experimental Thoroughness: ⭐⭐⭐⭐ Multi-task, multi-dataset comparisons with ablations and novel applications (temporal rephotography / point clouds), though quantitative evaluation could be further enriched.
- Writing Quality: ⭐⭐⭐⭐⭐ Mathematical derivations are rigorous and clear; figures are intuitive.
- Value: ⭐⭐⭐⭐⭐ Opens a new Pareto frontier in the speed–quality trade-off for diffusion-based inverse problem solving.