Continuous Semi-Implicit Models¶
Conference: ICML 2025
arXiv: 2506.06778
Code: None
Area: Image Generation
Keywords: Semi-Implicit Distributions, Diffusion Model Distillation, Continuous Time, Multi-step Generation, Variational Inference
TL;DR¶
Proposes CoSIM, which extends hierarchical semi-implicit models to continuous time. Continuous transition kernels make training simulation-free and efficient, and consistency-preserving transition kernels enable distribution-level multi-step distillation of diffusion models. On ImageNet 512×512, CoSIM matches or exceeds existing diffusion acceleration methods.
Background & Motivation¶
Background¶
Background: Semi-implicit (SI) distributions have demonstrated potential in variational inference and generative modeling. Hierarchical semi-implicit variational inference (HSIVI) enhances expressiveness by stacking multiple SI layers, which can be applied to accelerate diffusion models.
Limitations of Prior Work: Sequential training of HSIVI (layer-by-layer simulation) suffers from slow convergence. Existing diffusion distillation methods are either single-step deterministic (lacking diversity) or multi-step but complex to train.
Key Challenge: Multi-step generation offers flexible expressiveness but is difficult to train.
Goal: Efficiently train multi-step stochastic generative models.
Key Insight: Generalize discrete hierarchical SI models to continuous time, where continuous transition kernels eliminate the need for sequential simulation.
Core Idea: Continuous-time SI models + consistency transition kernels = diffusion acceleration via distribution-level distillation.
Method¶
Overall Architecture¶
- Define a continuous-time transition kernel \(q_t(x_t | z)\), where \(z\) is implicit noise.
- Minimize a divergence between the model distribution and the target distribution during training.
- Design consistency-preserving transition kernels to achieve multi-step distillation.
- Guide training based on a pre-trained score network.
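As a point of reference for the first item, a semi-implicit distribution pairs an explicit conditional layer with an implicit mixing variable: sampling is easy, the conditional density is tractable, but the marginal over \(z\) has no closed form in general. The following is a minimal NumPy sketch of that idea only, not the paper's model. The map `mean_fn`, the Gaussian conditional, and the value of `sigma` are illustrative assumptions.

```python
import numpy as np

def mean_fn(z):
    # Hypothetical nonlinear map standing in for a neural network (assumption).
    return np.stack([np.tanh(z[:, 0] + z[:, 1]), np.sin(z[:, 0])], axis=1)

def sample_semi_implicit(n, sigma=0.1, seed=0):
    """Draw n samples (z, x) from a toy 2-D semi-implicit distribution."""
    rng = np.random.default_rng(seed)
    # Implicit mixing variable: only sampling is required, not a density.
    z = rng.standard_normal((n, 2))
    # Explicit conditional layer: x | z ~ N(mean_fn(z), sigma^2 I).
    x = mean_fn(z) + sigma * rng.standard_normal((n, 2))
    return z, x

def log_q_x_given_z(x, z, sigma=0.1):
    """Tractable conditional log-density; the marginal over z is not."""
    d = x.shape[1]
    sq = np.sum((x - mean_fn(z)) ** 2, axis=1)
    return -0.5 * sq / sigma**2 - d * np.log(sigma) - 0.5 * d * np.log(2 * np.pi)

z, x = sample_semi_implicit(1000)
```

Here `log_q_x_given_z(x, z)` can be evaluated for every pair, whereas evaluating the marginal density of `x` would require integrating over `z`.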
Key Designs¶
- Continuous-time Semi-implicit Model:
  - Function: Generalizes discrete hierarchical models to continuous time.
  - Mechanism: Defines continuous transition kernels using ODEs/SDEs, avoiding step-by-step sequential simulations.
  - Design Motivation: Continuous formulation allows simulation-free optimization during training.
- Consistency Transition Kernels:
  - Function: Designs transition kernels that guarantee multi-step generation is equivalent to single-step generation.
  - Mechanism: Ensures the marginal distribution of \(q_0(x | z)\) remains consistent regardless of the number of steps.
  - Design Motivation: Distribution-level consistency is key to distillation quality.
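To make "multi-step stochastic generation" concrete, the sketch below shows a generic sampling loop that alternates a generator call with re-noising over a decreasing time grid, in the style of multistep consistency sampling. It is not CoSIM's transition kernel: `toy_generator`, the noising rule `x_t = x_0 + t * eps`, and the time grid are all assumptions chosen for illustration.

```python
import numpy as np

def toy_generator(x_t, t):
    # Hypothetical stand-in for a distilled network that maps a noisy sample
    # at noise level t to a clean-sample estimate (assumption: simple shrinkage).
    return x_t / (1.0 + t)

def multistep_sample(generator, shape, times, seed=0):
    """Alternate denoising and re-noising; one generator call per time."""
    rng = np.random.default_rng(seed)
    # Start from pure noise at the largest noise level.
    x = times[0] * rng.standard_normal(shape)
    x0 = generator(x, times[0])
    for t in times[1:]:
        # Re-noise the current clean estimate to the next, smaller level
        # (assumed rule: x_t = x_0 + t * eps).
        x = x0 + t * rng.standard_normal(shape)
        # Map the noisy sample back to a clean estimate.
        x0 = generator(x, t)
    return x0

# Four time points give a 4-step sampler.
samples = multistep_sample(toy_generator, (8, 2), times=[80.0, 10.0, 1.0, 0.1])
```

Because fresh noise is injected at every step, two runs with different seeds produce different outputs even from the same generator, which is the sense in which such samplers are stochastic.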
Loss & Training¶
- Minimizes a Fisher divergence in which the target score is supplied by a pre-trained score network.
- Simulation-free training.
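For reference, the Fisher divergence between a model \(q\) and a target \(p\) is, up to a constant-factor convention, \(\mathbb{E}_{x \sim q}\,\|\nabla_x \log q(x) - \nabla_x \log p(x)\|^2\). In distillation, the target score is approximated by the pre-trained score network. The marginal score of a semi-implicit model is generally not available in closed form, so the sketch below illustrates only the quantity itself on a toy case where both scores are known: two 1-D Gaussians with equal variance. All names and values are illustrative assumptions.

```python
import numpy as np

def gaussian_score(x, mean, std):
    # Score (gradient of the log-density) of N(mean, std^2) in 1-D.
    return -(x - mean) / std**2

def fisher_divergence_mc(samples, score_q, score_p):
    # Monte Carlo estimate of E_q |score_q(x) - score_p(x)|^2.
    diff = score_q(samples) - score_p(samples)
    return np.mean(diff**2)

rng = np.random.default_rng(0)
m_q, m_p, std = 0.5, 0.0, 2.0          # toy parameters (assumption)
x = m_q + std * rng.standard_normal(10_000)  # samples from q

est = fisher_divergence_mc(
    x,
    lambda s: gaussian_score(s, m_q, std),
    lambda s: gaussian_score(s, m_p, std),
)
# With equal variances the score gap is the constant (m_q - m_p) / std^2.
closed_form = (m_q - m_p) ** 2 / std**4
```

In this equal-variance case the estimate agrees with `closed_form` for any sample, since the integrand does not depend on `x`.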
Key Experimental Results¶
Main Results¶
ImageNet 512×512:
| Method | Steps | FID ↓ | FD-DINOv2 ↓ |
|---|---|---|---|
| DDPM (Original) | 250 | 2.1 | - |
| Consistency Models | 2 | 3.8 | 12.5 |
| DMD2 | 1 | 3.2 | 10.8 |
| CoSIM | 4 | 2.5 | 8.9 |
Key Findings¶
- High-quality generation is achieved in only 4 steps, with the best FD-DINOv2 (8.9) among the compared methods.
- Stochastic multi-step generation preserves diversity better than deterministic single-step generation.
- Continuous-time training converges 3-5× faster than discrete HSIVI.
Highlights & Insights¶
- The generalization of semi-implicit distributions from discrete to continuous is natural and powerful.
- The design of consistency transition kernels provides a new tool for multi-step distillation.
- The superiority in the FD-DINOv2 metric suggests that the images generated by CoSIM possess higher semantic quality.
Limitations & Future Work¶
- Relies on pre-trained score networks.
- Theoretical analysis of the continuous-time formulation relies on strong assumptions.
- Validated only on ImageNet.
Rating¶
- Novelty: ⭐⭐⭐⭐ Continuous semi-implicit models possess theoretical depth.
- Experimental Thoroughness: ⭐⭐⭐⭐ Solid evaluation on ImageNet 512 with comprehensive comparisons.
- Writing Quality: ⭐⭐⭐⭐ Clear theoretical explanations.
- Value: ⭐⭐⭐⭐ Advances diffusion model acceleration.