Elucidating the SNR-t Bias of Diffusion Probabilistic Models¶

Conference: CVPR 2026
arXiv: 2604.16044
Code: https://github.com/AMAP-ML/DCW
Area: Image Generation
Keywords: Diffusion Models, SNR-t Bias, Differential Correction, Wavelet Domain, Training-free

TL;DR¶

This paper identifies a pervasive SNR-t bias in diffusion models: during the reverse process, the Signal-to-Noise Ratio (SNR) of a sample does not match the SNR implied by its timestep. To mitigate it, the paper proposes Differential Correction in Wavelet domain (DCW), a training-free, plug-and-play method that improves generation quality across a range of diffusion models.

Background & Motivation¶

Background: Diffusion Probabilistic Models (DPMs) have achieved great success in image, audio, and video generation. During training, each noisy sample is strictly bound to a timestep: \(X_t = \sqrt{\bar{\alpha}_t} X_0 + \sqrt{1-\bar{\alpha}_t} \epsilon_t\), so the signal-to-noise ratio \(\text{SNR}(t) = \bar{\alpha}_t / (1-\bar{\alpha}_t)\) is entirely determined by the timestep \(t\).
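The forward relation above can be sketched numerically. This is a minimal illustration, assuming a linear \(\beta\) schedule with 1000 steps (the schedule, its endpoints, and the function names are assumptions for the example, not taken from the paper); it draws one \(X_t\) and evaluates \(\text{SNR}(t)\) for every timestep.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_sample(x0, alpha_bar_t, rng):
    """Draw X_t = sqrt(alpha_bar_t) * X_0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def snr(alpha_bar_t):
    """SNR(t) = alpha_bar_t / (1 - alpha_bar_t)."""
    return alpha_bar_t / (1.0 - alpha_bar_t)

alpha_bar = make_alpha_bar()
rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 32, 32))          # stand-in for a clean image
x_t = forward_sample(x0, alpha_bar[500], rng)  # noisy sample at step index 500
snr_values = snr(alpha_bar)                    # one SNR value per timestep
```

Because \(\bar{\alpha}_t\) decreases monotonically with \(t\) under this schedule, `snr_values` decreases as well, which is the one-to-one coupling between SNR and timestep that training relies on.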

Limitations of Prior Work: During inference, network prediction errors and the discretization errors of the numerical solver accumulate, so reverse denoising trajectories inevitably deviate from the ideal path. The actual SNR of the predicted sample \(\hat{x}_t\) therefore no longer matches the SNR that corresponds to the preset timestep \(t\). This mismatch is what the paper calls SNR-t bias.

Key Challenge: SNR and timestep are strictly coupled during training, but this correspondence breaks during inference. When the network receives a sample whose SNR does not match its timestep, its prediction is biased: a sample with lower SNR than expected leads to over-prediction of noise, while one with higher SNR leads to under-prediction. The paper's experiments show that reverse-process samples consistently have lower SNR than forward-process samples at the same timestep.

Goal: (1) Provide systematic empirical evidence and a theoretical proof of SNR-t bias; (2) Design a training-free correction method to mitigate this bias.

Key Insight: Each step of the reverse denoising process already produces a reconstructed sample \(x^0_\theta\). The differential signal between \(x^0_\theta\) and the predicted sample \(\hat{x}_{t-1}\) contains gradient information that can push shifted samples back toward the ideal trajectory.

Core Idea: Apply differential correction in the wavelet domain so that different frequency components are corrected separately, with dynamic weights designed around the "low-frequency first, high-frequency later" denoising behavior of diffusion models.

Method¶

Overall Architecture¶

DCW is embedded into each denoising step as a plug-and-play inference module. After each denoising step: (1) the reconstructed sample \(x^0_\theta\) and the predicted sample \(\hat{x}_{t-1}\) are mapped to the wavelet domain via DWT; (2) differential signals are computed for the low-frequency (LL) and high-frequency (LH, HL, HH) components, and a dynamically weighted correction is applied to each; (3) the result is mapped back to pixel space via iDWT.
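The plug-and-play placement described above can be sketched as a sampling loop with a correction hook. This is only a structural illustration: `sample_with_correction`, `toy_denoise_step`, and `no_correction` are hypothetical names, and the toy step is a stand-in for a real sampler (such as a DDIM or DDPM update), not the paper's implementation.

```python
import numpy as np

def sample_with_correction(denoise_step, correct, x_T, timesteps):
    """Run the reverse process, applying `correct` after every denoising step.

    denoise_step(x_t, t) must return (x_prev, x0_pred): the predicted sample
    for the next step and the reconstructed clean sample at this step.
    correct(x_prev, x0_pred, t) returns the corrected x_prev.
    """
    x = x_T
    for t in timesteps:
        x_prev, x0_pred = denoise_step(x, t)
        x = correct(x_prev, x0_pred, t)
    return x

def toy_denoise_step(x_t, t):
    """Hypothetical stand-in for a real sampler step (not a trained model)."""
    x0_pred = 0.5 * x_t
    x_prev = 0.9 * x_t + 0.1 * x0_pred
    return x_prev, x0_pred

def no_correction(x_prev, x0_pred, t):
    """Baseline hook: leave the predicted sample unchanged."""
    return x_prev

rng = np.random.default_rng(0)
x_T = rng.standard_normal((1, 3, 8, 8))
x_0 = sample_with_correction(toy_denoise_step, no_correction, x_T, range(10, 0, -1))
```

Swapping `no_correction` for a pixel-space or wavelet-domain correction function changes nothing else in the loop, which is what makes the method training-free and sampler-agnostic in structure.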

Key Designs¶

Theoretical Proof of SNR-t Bias:
- Function: Provide a rigorous mathematical foundation for the bias phenomenon.
- Mechanism: Assuming the reconstruction model \(x^0_\theta(\hat{x}_t, t) = \gamma_t x_0 + \phi_t \epsilon_t\) (\(0 < \gamma_t \leq 1\)), the actual SNR of reverse-process samples, written here as \(\widehat{\text{SNR}}(t)\) to distinguish it from the forward \(\text{SNR}(t)\), is derived as \(\widehat{\text{SNR}}(t) = \hat{\gamma}_t^2 \bar{\alpha}_t / (1-\bar{\alpha}_t + (\frac{\sqrt{\bar{\alpha}_t}\beta_{t+1}}{1-\bar{\alpha}_{t+1}}\phi_{t+1})^2)\). Since \(\hat{\gamma}_t \leq 1\) and the denominator contains an extra positive term, the reverse SNR is always lower than the forward SNR \(\bar{\alpha}_t / (1-\bar{\alpha}_t)\), which theoretically establishes the inevitability of the bias.
- Design Motivation: The assumption used in prior work, \(x^0_\theta = x_0 + \phi_t \epsilon_t\), contradicts the Tweedie formula and the variance identity (it would lead to \(\mathbb{E}[\|x^0_\theta\|^2] > \mathbb{E}[\|x_0\|^2]\)), which motivates the more accurate \(\gamma_t < 1\) assumption.
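The reverse-SNR expression in the Mechanism item can be checked numerically against the forward SNR. The sketch below assumes a linear \(\beta\) schedule and uses made-up constants for \(\hat{\gamma}_t\) and \(\phi_{t+1}\) (both values are illustrative assumptions; the paper's actual values depend on the trained network).

```python
import numpy as np

# Assumed linear beta schedule; indices 0..999 stand in for timesteps.
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

def forward_snr(alpha_bar_t):
    """Forward-process SNR(t) = alpha_bar_t / (1 - alpha_bar_t)."""
    return alpha_bar_t / (1.0 - alpha_bar_t)

def reverse_snr(alpha_bar_t, alpha_bar_next, beta_next, gamma_hat_t, phi_next):
    """Reverse-process SNR from the summary's formula.

    The extra denominator term is
    (sqrt(alpha_bar_t) * beta_{t+1} / (1 - alpha_bar_{t+1}) * phi_{t+1})^2.
    """
    extra = (np.sqrt(alpha_bar_t) * beta_next / (1.0 - alpha_bar_next) * phi_next) ** 2
    return gamma_hat_t ** 2 * alpha_bar_t / (1.0 - alpha_bar_t + extra)

t = np.arange(0, 999)  # every index that has a successor t + 1
fwd = forward_snr(alpha_bar[t])
rev = reverse_snr(alpha_bar[t], alpha_bar[t + 1], betas[t + 1],
                  gamma_hat_t=0.98, phi_next=0.1)  # illustrative constants
gap = fwd - rev  # positive wherever the reverse SNR falls below the forward SNR
```

With any \(\hat{\gamma}_t \leq 1\) and nonzero \(\phi_{t+1}\), the numerator shrinks and the denominator grows, so `gap` stays positive at every index, matching the argument in the text.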
Pixel-space Differential Correction:
- Function: Use intrinsic information from the denoising process to guide shifted samples back to the ideal trajectory.
- Mechanism: The differential signal \(\hat{x}_{t-1} - x^0_\theta(\hat{x}_t, t)\) carries directional information pointing toward the ideal \(x_{t-1}\). The correction is the update \(\hat{x}_{t-1} \leftarrow \hat{x}_{t-1} + \lambda_t (\hat{x}_{t-1} - x^0_\theta(\hat{x}_t, t))\), where \(\lambda_t\) is the guidance coefficient. Intuitively, the update moves the predicted sample along this direction to counteract the low SNR observed in the reverse process.
- Design Motivation: The differential signal is a byproduct of the denoising process, so it requires no additional network evaluation. Correcting \(\hat{x}_{t-1}\) is more efficient and effective than correcting \(\hat{x}_t\).
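The pixel-space update in the Mechanism item is a single line of arithmetic. A minimal sketch, assuming array inputs of equal shape (the function name `differential_correction` and the sample values are hypothetical, chosen only for illustration):

```python
import numpy as np

def differential_correction(x_prev, x0_pred, lam):
    """Return x_prev + lam * (x_prev - x0_pred).

    x_prev  : predicted sample for step t-1
    x0_pred : reconstructed clean sample from the current step
    lam     : guidance coefficient lambda_t (model-dependent, tuned by hand)
    """
    return x_prev + lam * (x_prev - x0_pred)

x_prev = np.array([1.0, 2.0])
x0_pred = np.array([0.5, 2.5])
corrected = differential_correction(x_prev, x0_pred, lam=0.1)
```

Setting `lam` to zero recovers the uncorrected sampler, so the coefficient directly controls how far each step departs from the baseline.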
Wavelet Domain Differential Correction (DCW):
- Function: Apply differentiated correction to different frequency components.
- Mechanism: DWT decomposes samples into four frequency components: LL, LH, HL, and HH. Differential correction is applied independently to each component via the update \(\hat{x}^f_{t-1} \leftarrow \hat{x}^f_{t-1} + \lambda^f_t (\hat{x}^f_{t-1} - x^{0,f}_\theta)\), where \(f \in \{ll, lh, hl, hh\}\). The dynamic weights \(\lambda^f_t\) adapt to the denoising stage, prioritizing low-frequency correction (global structure) early on and high-frequency correction (textures) later.
- Design Motivation: Diffusion models reconstruct low frequencies before high frequencies, so different stages of denoising call for correcting different components. A uniform pixel-space correction cannot distinguish these frequency-specific needs.
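The per-band update can be sketched with a single-level orthonormal Haar transform written directly in NumPy. Several things here are assumptions made for illustration: the choice of the Haar wavelet, the sub-band naming convention, and especially the linear weight schedule in `band_weights`, which only mimics "low frequency early, high frequency late" and is not the paper's schedule. Inputs need even height and width.

```python
import numpy as np

BANDS = ("ll", "lh", "hl", "hh")

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar DWT over the last two axes."""
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    return ((a + b + c + d) / 2, (a + b - c - d) / 2,
            (a - b + c - d) / 2, (a - b - c + d) / 2)

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2."""
    h, w = ll.shape[-2:]
    out = np.empty(ll.shape[:-2] + (2 * h, 2 * w), dtype=ll.dtype)
    out[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[..., 0::2, 1::2] = (ll + lh - hl - hh) / 2
    out[..., 1::2, 0::2] = (ll - lh + hl - hh) / 2
    out[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out

def band_weights(t, num_steps, base=0.05):
    """Hypothetical schedule: LL weight is largest early (t near num_steps),
    high-frequency weights are largest late (t near 0)."""
    progress = t / num_steps
    lam_high = base * (1.0 - progress)
    return {"ll": base * progress, "lh": lam_high, "hl": lam_high, "hh": lam_high}

def dcw_correct(x_prev, x0_pred, weights):
    """Apply x^f + lambda^f * (x^f - x0^f) to each sub-band f, then invert."""
    prev_bands = haar_dwt2(x_prev)
    x0_bands = haar_dwt2(x0_pred)
    corrected = [p + weights[name] * (p - q)
                 for name, p, q in zip(BANDS, prev_bands, x0_bands)]
    return haar_idwt2(*corrected)

rng = np.random.default_rng(0)
x_prev = rng.standard_normal((2, 3, 8, 8))
x0_pred = rng.standard_normal((2, 3, 8, 8))
x_corrected = dcw_correct(x_prev, x0_pred, band_weights(t=800, num_steps=1000))
```

Because the transform is linear, giving all four bands the same weight reproduces the pixel-space correction exactly; the wavelet formulation only differs once the weights differ across bands.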

Loss & Training¶

Entirely training-free. DCW is embedded in the inference process as a plug-and-play module without modifying model weights. Computational overhead is negligible, consisting only of DWT/iDWT and differential operations.

Key Experimental Results¶

Main Results¶

Base Model	Original FID	+ DCW FID	FID Change (lower is better)
IDDPM	8.45	6.72	-1.73
ADM	4.59	3.97	-0.62
DDIM (50 steps)	8.72	7.31	-1.41
EDM	1.97	1.79	-0.18
FLUX	not listed	not listed	Reported as a significant improvement
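The last column of the table is the DCW FID minus the original FID. As a small worked check, the sketch below recomputes that difference and also derives a relative change for the four rows with numeric values (the relative figures are computed here from the table, not reported in the source; `fid_change` is a hypothetical helper name).

```python
# (original FID, FID with DCW) for the rows that list numbers in the table.
results = {
    "IDDPM": (8.45, 6.72),
    "ADM": (4.59, 3.97),
    "DDIM (50 steps)": (8.72, 7.31),
    "EDM": (1.97, 1.79),
}

def fid_change(original, with_dcw):
    """Return (absolute change, relative change in percent); negative is better."""
    delta = with_dcw - original
    return delta, 100.0 * delta / original

changes = {name: fid_change(*pair) for name, pair in results.items()}

for name, (delta, rel) in changes.items():
    print(f"{name}: FID change {delta:+.2f} ({rel:+.1f}%)")
```

Reading the relative change alongside the absolute one helps compare baselines that start from very different FID values, such as IDDPM and EDM.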

Ablation Study¶

Configuration	FID Improvement
Pixel-space Correction (DC)	Moderate
Wavelet Domain Correction (DCW)	Best
Low-frequency only	Partial
High-frequency only	Partial
Dynamic vs Fixed weights	Dynamic is better

Key Findings¶

DCW improves results across 8 different diffusion models (IDDPM, ADM, DDIM, A-DPM, EA-DPM, EDM, PFGM++, FLUX), which supports the claim that SNR-t bias is widespread.
It can be stacked with exposure bias correction methods for additional gains, which the paper takes as evidence that SNR-t bias is more fundamental than exposure bias.
Consistently effective across datasets of different resolutions (CIFAR-10, ImageNet, etc.).
Wavelet domain correction outperforms pixel-space correction, validating the necessity of frequency-separated correction.
Computational overhead is negligible.

Highlights & Insights¶

Reveals a fundamental problem: The paper argues that SNR-t bias is inherent to all DPMs and more fundamental than exposure bias. The theoretical derivation based on \(\gamma_t < 1\) explains why the bias is inevitable.
Clever utilization of differential signals: The byproduct of the denoising process naturally contains correction direction information, requiring no additional networks or searching.
Plug-and-play practicality: Zero training cost and near-zero inference overhead; it can be directly applied to any DPM, including cutting-edge models like FLUX.

Limitations & Future Work¶

Currently, \(\lambda_t\) must be tuned for each model; strategies for setting it automatically remain an open question.
Theoretical analysis is based on Gaussian assumptions; there may be a gap regarding actual data distribution shifts.
Effectiveness in consistency models with extremely few steps (1-2 steps) needs further verification.
Combination strategies with other improvement methods could be further explored.

Related Work & Comparison¶

vs ADM-ES / TS-DPM: These works study exposure bias (differences between samples). SNR-t bias concerns the mismatch between samples and timesteps, which is a lower-level problem. DCW can be used alongside them.
vs ADM-IP: ADM-IP mitigates bias by re-perturbing training data, which requires retraining. DCW is training-free and plug-and-play.
vs FreeU: FreeU reweights frequency components within the U-Net. DCW dynamically corrects frequency components during the denoising process. They operate at different levels.

Rating¶

Novelty: ⭐⭐⭐⭐⭐ First to systematically reveal and prove SNR-t bias with deep theoretical analysis.
Experimental Thoroughness: ⭐⭐⭐⭐⭐ Validated across 8 models, multiple resolutions, and combined with other methods.
Writing Quality: ⭐⭐⭐⭐⭐ Complete and elegant logical chain from phenomenon to theory to method.
Value: ⭐⭐⭐⭐⭐ Wide impact by revealing a fundamental issue and providing a practical solution.