Modulated Diffusion: Accelerating Generative Modeling with Modulated Quantization¶

Conference: ICML 2025
arXiv: 2506.22463
Code: https://github.com/WeizhiGao/MoDiff
Area: Image Generation
Keywords: Diffusion model acceleration, quantization, caching, error compensation, post-training quantization

TL;DR¶

MoDiff is a framework that combines modulated quantization with error compensation to accelerate diffusion models. In post-training quantization, it lowers activation bitwidth from 8 bits to 3 bits without performance loss while inheriting the advantages of both caching and quantization methods.

Background & Motivation¶

Background¶

Background: Diffusion models exhibit outstanding performance in generative tasks, but iterative sampling incurs massive computational overhead. Acceleration techniques primarily include step reduction (distillation), caching (reusing intermediate computations), and quantization (low-precision computation).

Limitations of Prior Work: Caching avoids redundant computation, but the approximation errors it introduces accumulate across time steps. Quantization degrades generation quality noticeably at low bitwidths (<8 bits). Naively combining the two compounds both kinds of error.

Key Challenge: There is a fundamental trade-off between aggressive acceleration (lower bitwidths, more caching) and generative quality. The open question is how to characterize quantization and caching errors mathematically and control them systematically.

Goal: To provide a unified framework that deeply analyzes the error sources of caching and quantization, and designs effective error compensation mechanisms.

Key Insight: Starting from theoretical analysis, this work reveals the intrinsic connection between quantization and caching, designing "modulated quantization" to dynamically adjust quantization parameters to compensate for errors.

Core Idea: Actively compensate for quantization errors during the diffusion process by modulating quantization parameters, maintaining generative quality even at extremely low bitwidths.

Method¶

Overall Architecture¶

Input: Pre-trained diffusion model + calibration data
Process: (1) Analyze the activation distribution characteristics at each time step; (2) design modulation parameters for each time step; (3) dynamically adjust the quantization range and step size using modulation parameters during quantization.
Output: Quantized diffusion model ready for low-bit inference.
The entire process is Post-Training Quantization (PTQ), requiring no retraining.
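
Step (1) of the process can be pictured as collecting per-time-step activation ranges from calibration data. The sketch below is only an illustration of that idea and not the authors' code: `collect_ranges`, `step_size`, and the synthetic activations are hypothetical stand-ins for a real calibration pass.

```python
import numpy as np

def collect_ranges(activations_per_step):
    """Return {t: (min, max)} for activations recorded at each time step.

    `activations_per_step` maps a time step t to an array of calibration
    activations (a hypothetical format chosen for this sketch).
    """
    return {t: (float(a.min()), float(a.max()))
            for t, a in activations_per_step.items()}

def step_size(lo, hi, bits):
    """Uniform quantization step for the range [lo, hi] at a given bitwidth."""
    return (hi - lo) / (2 ** bits - 1)

# Synthetic stand-in: the activation spread differs from one time step to the next.
rng = np.random.default_rng(0)
acts = {t: rng.normal(scale=1.0 + 0.5 * t, size=1000) for t in range(4)}
ranges = collect_ranges(acts)
steps = {t: step_size(lo, hi, bits=3) for t, (lo, hi) in ranges.items()}
```

Because each time step gets its own range, each also gets its own step size, which is the property the later modulation stage relies on.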

Key Designs¶

Unified Analysis of Caching and Quantization:
- Shows theoretically that caching is essentially a special form of quantization: replacing the current step's value with the previous step's value is equivalent to infinitely coarse quantization.
- Reveals the shared error source of both methods: the variation of activations across time steps.
- Design Motivation: A unified perspective enables the design of a framework that simultaneously addresses both types of errors.
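
One way to read the caching-as-quantization claim, offered here as an interpretation for illustration rather than as the paper's derivation, is to approximate the current activation by the cached one plus a quantized residual. With zero bits the residual is dropped entirely and the scheme reduces to caching. The function names and the zero-bit convention below are assumptions of this sketch.

```python
import numpy as np

def quantize(x, bits):
    """Uniform quantizer with 2**bits levels spanning [x.min(), x.max()]."""
    if bits == 0:
        return np.zeros_like(x)             # degenerate case: the residual is dropped
    lo, hi = float(x.min()), float(x.max())
    if hi == lo:
        return x.copy()                     # constant input needs no rounding
    step = (hi - lo) / (2 ** bits - 1)
    return lo + np.round((x - lo) / step) * step

def approximate(prev, cur, bits):
    """Approximate `cur` as the cached `prev` plus a quantized residual."""
    return prev + quantize(cur - prev, bits)

rng = np.random.default_rng(0)
prev = rng.normal(size=1000)
cur = prev + 0.1 * rng.normal(size=1000)    # adjacent steps differ only slightly

# bits=0 reproduces caching; bits=3 keeps a coarse version of the change.
caching_error = np.abs(cur - approximate(prev, cur, bits=0)).max()
residual_error = np.abs(cur - approximate(prev, cur, bits=3)).max()
```

In this toy setting both errors are driven by the same quantity, the change in activations between adjacent steps, which matches the shared error source named above.
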
Modulated Quantization:
- Learns a set of modulation parameters \(\gamma_t, \beta_t\) for each time step \(t\).
- Applies a modulation transform to the activation: \(\hat{x} = \gamma_t \cdot Q(x) + \beta_t\).
- Modulation parameters are learned by minimizing the gap between the quantized output and the full-precision output.
- Design Motivation: Activation distributions vary significantly across different time steps (due to differing noise scales), making fixed quantization parameters unsuitable.
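
The modulation transform described above can be sketched as follows. This is a minimal illustration assuming a plain uniform quantizer for \(Q\) and scalar per-time-step parameters; the parameter tables `gamma` and `beta`, the clipping range, and the function names are hypothetical and not taken from the paper.

```python
import numpy as np

def uniform_quantize(x, bits, lo, hi):
    """Clip x to [lo, hi] and round to one of 2**bits uniform levels."""
    step = (hi - lo) / (2 ** bits - 1)
    q = np.round((np.clip(x, lo, hi) - lo) / step)
    return lo + q * step

def modulated_quantize(x, t, gamma, beta, bits=3, lo=-1.0, hi=1.0):
    """Apply x_hat = gamma[t] * Q(x) + beta[t] with per-time-step parameters."""
    return gamma[t] * uniform_quantize(x, bits, lo, hi) + beta[t]

# Hypothetical parameter tables, one entry per time step.
gamma = np.array([1.0, 1.1, 0.9])
beta = np.array([0.0, 0.05, -0.02])
x = np.linspace(-1.0, 1.0, 5)
x_hat = modulated_quantize(x, t=1, gamma=gamma, beta=beta)
```

Setting `gamma[t] = 1` and `beta[t] = 0` recovers the unmodulated quantizer, so the transform can only help if the parameters are fitted well.
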
Error Compensation Mechanism:
- Tracks the residual errors introduced by quantization and compensates for them in subsequent steps.
- Theoretically derives the upper bound of the error to ensure that the total error after compensation remains controllable.
- Supports integration with caching methods.
- Design Motivation: Quantization errors accumulate during iterative sampling, necessitating explicit error control.
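
The summary does not spell out the compensation rule, so the sketch below substitutes classic error feedback: the rounding residual from one step is added to the next step's input before quantizing. It is meant only to show why carrying the residual forward keeps accumulated error bounded; the function names and the toy signal are assumptions.

```python
import numpy as np

def quantize(x, step):
    """Round to the nearest multiple of `step`."""
    return np.round(x / step) * step

def run(signal, step, compensate):
    """Quantize a sequence step by step; return the accumulated absolute error."""
    carried = 0.0            # residual error carried over from the previous step
    total_true = 0.0
    total_quant = 0.0
    for x in signal:
        target = x + carried if compensate else x
        y = quantize(target, step)
        carried = target - y
        total_true += x
        total_quant += y
    return abs(total_true - total_quant)

signal = np.full(100, 0.3)   # every step has the same rounding bias
drift_plain = run(signal, step=1.0, compensate=False)
drift_comp = run(signal, step=1.0, compensate=True)
```

Without compensation the bias adds up over all 100 steps, while with error feedback the accumulated error never exceeds half a quantization step.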

Loss & Training¶

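Modulation parameters \(\gamma_t, \beta_t\) are learned by minimizing an MSE loss, \(\min_{\gamma_t, \beta_t} \|f(x) - (\gamma_t \cdot Q(x) + \beta_t)\|^2\), where \(f(x)\) is the full-precision reference and \(Q\) is the quantizer.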
Requires only a small amount of calibration data (forward propagation of a few hundred images) without full retraining.
Supports both layer-wise and step-wise parameter optimization.
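
If \(\gamma\) and \(\beta\) are treated as scalars for a given time step, the MSE objective above has a closed-form least-squares solution. The sketch below assumes that scalar setting and uses the unquantized input as a stand-in for the full-precision target; `fit_modulation` is a hypothetical helper, and the paper may optimize the parameters differently.

```python
import numpy as np

def fit_modulation(q, target):
    """Least-squares fit of scalar (gamma, beta) so gamma * q + beta matches target."""
    q = np.asarray(q, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    q_centered = q - q.mean()
    denom = np.dot(q_centered, q_centered)
    if denom == 0.0:                       # constant q: only an offset can be fit
        return 1.0, float(target.mean() - q.mean())
    gamma = np.dot(q_centered, target - target.mean()) / denom
    beta = target.mean() - gamma * q.mean()
    return float(gamma), float(beta)

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
target = x                                    # stand-in for the full-precision output f(x)
q = np.round(np.clip(x, -2, 2) / 0.5) * 0.5   # coarse quantizer with step 0.5
gamma, beta = fit_modulation(q, target)
mse_plain = np.mean((target - q) ** 2)
mse_mod = np.mean((target - (gamma * q + beta)) ** 2)
```

Since \(\gamma = 1, \beta = 0\) is always a feasible choice, the fitted parameters cannot do worse than the unmodulated quantizer on the calibration data.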

Key Experimental Results¶

Main Results¶

| Dataset | Model | Bitwidth | FID↓ (qualitative) | FID gap to full precision |
|---|---|---|---|---|
| CIFAR-10 | DDPM | 8-bit | Virtually lossless | <0.5 |
| CIFAR-10 | DDPM | 4-bit | Slight degradation | ~1-2 |
| CIFAR-10 | DDPM | 3-bit | No significant degradation | <1 |
| LSUN | LDM | 8-bit | Virtually lossless | <0.5 |
| LSUN | LDM | 3-bit | No significant degradation | <1 |

Ablation Study¶

| Configuration | FID (qualitative) | Description |
|---|---|---|
| Standard PTQ (3-bit) | Significant degradation | Direct quantization without modulation |
| MoDiff (3-bit) | Virtually lossless | Modulated quantization is effective |
| Caching only | Slight degradation | Approximation error of caching |
| MoDiff + Caching | Optimal | Complementary to each other |
| Without error compensation | Moderate | Compensation is crucial for low bitwidths |

Key Findings¶

Lossless 3-bit Quantization: On CIFAR-10 and LSUN, 3-bit is sufficient to maintain full-precision performance.
Criticality of Modulation Parameters: Different time steps require different quantization strategies.
Error compensation is vital at bitwidths below 4.
MoDiff is presented as a general framework applicable to any diffusion model, although the reported experiments cover CIFAR-10 and LSUN.

Highlights & Insights¶

Theoretical Innovation: A unified error analysis framework for caching and quantization.
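Extreme Compression: 3-bit activation quantization without performance loss. Relative to FP32, 3-bit activations use about 10.7x fewer bits (32/3); this is a theoretical ratio, and realized speedup depends on hardware support.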
Versatility: As a PTQ method, it is plug-and-play and requires no modification to the pre-trained model.
Theoretical Guarantees: The derivation of the error upper bound provides reliability guarantees.

Limitations & Future Work¶

Weight quantization is not fully explored (this paper focuses on activation quantization).
Actual hardware speedup in deployment requires support for low-bit operations.
Insufficient validation on larger models (such as SDXL, DiT-XL).
Per-time-step modulation parameters add storage overhead that must be weighed against the accuracy gains.

Related Work & Insights¶

Q-Diffusion and PTQD are among the primary baselines for diffusion model quantization.
Caching methods like DeepCache provide complementary acceleration strategies.
Inspiration: Dynamic quantization parameters may also apply to other iterative inference procedures, such as autoregressive models (ARMs) and Monte Carlo tree search (MCTS).

Rating¶

Novelty: ⭐⭐⭐⭐ The theoretical framework of modulated quantization is innovative.
Experimental Thoroughness: ⭐⭐⭐⭐ Solid validation on CIFAR-10 and LSUN.
Writing Quality: ⭐⭐⭐⭐ Clear theoretical derivations.
Value: ⭐⭐⭐⭐ High practical value for diffusion model deployment.