Continuous Semi-Implicit Models

Conference: ICML 2025
arXiv: 2506.06778
Code: None
Area: Image Generation
Keywords: Semi-Implicit Distributions, Diffusion Model Distillation, Continuous Time, Multi-step Generation, Variational Inference

TL;DR

Proposes CoSIM, which extends hierarchical semi-implicit models to a continuous-time framework. Continuous transition kernels make training simulation-free and efficient, and consistency-preserving transition kernels enable distribution-level, multi-step distillation of diffusion models. On ImageNet 512×512, CoSIM matches or exceeds existing diffusion acceleration methods.

Background & Motivation

Background

Background: Semi-implicit (SI) distributions have demonstrated potential in variational inference and generative modeling. Hierarchical semi-implicit variational inference (HSIVI) enhances expressiveness by stacking multiple SI layers, which can be applied to accelerate diffusion models.
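
As a reminder of the standard definition (the notation here is generic and not necessarily the paper's), a semi-implicit distribution is a mixture of an explicit conditional over a mixing distribution that only needs to be samplable:

\[
q(x) = \int q(x \mid z)\, q(z)\, \mathrm{d}z, \qquad z \sim q(z),
\]

where \(q(x \mid z)\) has a tractable density (for example, a Gaussian) and \(q(z)\) may be implicit, such as noise pushed through a neural network. A hierarchical model stacks several such layers, so the output of one layer becomes the mixing variable of the next.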

Limitations of Prior Work: Sequential training of HSIVI (layer-by-layer simulation) suffers from slow convergence. Existing diffusion distillation methods are either single-step deterministic (lacking diversity) or multi-step but complex to train.

Key Challenge: Multi-step generation needs an expressive model family, but more expressive hierarchical models are harder to train.

Goal: Efficiently train multi-step stochastic generative models.

Key Insight: Generalize discrete hierarchical SI models to continuous time, where continuous transition kernels eliminate the need for sequential simulation.

Core Idea: Continuous-time SI models + consistency transition kernels = diffusion acceleration via distribution-level distillation.

Method

Overall Architecture

  1. Define a continuous-time transition kernel \(q_t(x_t | z)\), where \(z\) is implicit noise.
  2. Minimize a divergence between the model distribution and the target distribution during training.
  3. Design consistency-preserving transition kernels to achieve multi-step distillation.
  4. Guide training based on a pre-trained score network.

Key Designs

  1. Continuous-time Semi-implicit Model:

    • Function: Generalizes discrete hierarchical models to continuous time.
    • Mechanism: Defines continuous transition kernels using ODEs/SDEs, avoiding step-by-step sequential simulations.
    • Design Motivation: Continuous formulation allows simulation-free optimization during training.
  2. Consistency Transition Kernels:

    • Function: Designs transition kernels that guarantee multi-step generation is equivalent to single-step generation.
    • Mechanism: Ensures the marginal distribution of \(q_0(x | z)\) remains consistent regardless of the number of steps.
    • Design Motivation: Distribution-level consistency is key to distillation quality.
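
The two designs above can be illustrated with a toy sketch. This is not the paper's kernel: it assumes a one-dimensional variance-exploding Gaussian kernel \(x_t = x_s + \sqrt{\sigma(t)^2 - \sigma(s)^2}\,\varepsilon\) with a hypothetical schedule \(\sigma(t) = t\), and the time direction is chosen only for convenience. Because the kernel is available in closed form, a sample at any time can be drawn in one shot without stepping through intermediate times, and chaining the kernel through a grid of times gives the same marginal as the direct jump.

```python
import math
import random
import statistics


def sigma(t):
    """Hypothetical noise schedule (an assumption for this toy): sigma(t) = t."""
    return t


def transition(x_s, s, t, rng):
    """Sample x_t ~ N(x_s, sigma(t)^2 - sigma(s)^2) in closed form, for s <= t."""
    var = sigma(t) ** 2 - sigma(s) ** 2
    return x_s + math.sqrt(var) * rng.gauss(0.0, 1.0)


def sample_direct(x0, t, rng):
    """One-shot jump from time 0 to time t, with no intermediate simulation."""
    return transition(x0, 0.0, t, rng)


def sample_multistep(x0, times, rng):
    """Chain the same kernel through an increasing grid of times."""
    x, s = x0, 0.0
    for t in times:
        x = transition(x, s, t, rng)
        s = t
    return x


rng = random.Random(0)
n = 20000
direct = [sample_direct(1.0, 2.0, rng) for _ in range(n)]
multi = [sample_multistep(1.0, [0.5, 1.0, 1.5, 2.0], rng) for _ in range(n)]

# Both marginals should be close to N(1.0, 2.0^2), whatever the step count.
direct_mean, direct_std = statistics.fmean(direct), statistics.pstdev(direct)
multi_mean, multi_std = statistics.fmean(multi), statistics.pstdev(multi)
```

In this toy, the one-step and four-step samplers agree in mean and standard deviation up to Monte Carlo error, which is the distribution-level notion of consistency described above. A learned kernel would need to be designed so that the same property holds.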

Loss & Training

  • Minimizes a Fisher divergence, with the target score supplied by a pre-trained score network.
  • Training is simulation-free: no sequential sampling through the hierarchy is required.
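
For reference, the Fisher divergence between a model density \(q\) and a target \(p\) is the expected squared difference of their scores under \(q\), \(\mathbb{E}_{q}\big[\|\nabla_x \log q(x) - \nabla_x \log p(x)\|^2\big]\) (some authors include a factor of 1/2). The toy below estimates it by Monte Carlo for two one-dimensional Gaussians with equal variance, where the closed form is \((m_q - m_p)^2 / s^4\). The analytic `gaussian_score` is a hypothetical stand-in for the pre-trained score network, and the model score is also analytic here. How CoSIM handles the intractable score of a semi-implicit model is not covered by this sketch.

```python
import random


def gaussian_score(x, mean, std):
    """Score d/dx log N(x; mean, std^2) = -(x - mean) / std^2."""
    return -(x - mean) / std ** 2


def fisher_divergence_mc(m_q, m_p, std, n, rng):
    """Monte Carlo estimate of E_q[(score_q(x) - score_p(x))^2]."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(m_q, std)  # draw from the model q
        diff = gaussian_score(x, m_q, std) - gaussian_score(x, m_p, std)
        total += diff ** 2
    return total / n


rng = random.Random(0)
estimate = fisher_divergence_mc(m_q=0.0, m_p=1.0, std=2.0, n=1000, rng=rng)
closed_form = (0.0 - 1.0) ** 2 / 2.0 ** 4
```

With equal variances the score difference is constant, so the estimate matches the closed form up to floating-point error. With unequal variances the difference depends on \(x\) and the estimate carries ordinary Monte Carlo noise.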

Key Experimental Results

Main Results

ImageNet 512×512:

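| Method             | Steps | FID ↓ | FD-DINOv2 ↓ |
|--------------------|-------|-------|-------------|
| DDPM (Original)    | 250   | 2.1   | -           |
| Consistency Models | 2     | 3.8   | 12.5        |
| DMD2               | 1     | 3.2   | 10.8        |
| CoSIM              | 4     | 2.5   | 8.9         |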

Key Findings

  • CoSIM reaches an FID of 2.5 in only 4 steps and the lowest FD-DINOv2 (8.9) among the compared methods.
  • Stochastic multi-step generation preserves diversity better than deterministic single-step generation.
  • Continuous-time training converges 3-5× faster than discrete HSIVI.

Highlights & Insights

  • The generalization of semi-implicit distributions from discrete to continuous is natural and powerful.
  • The design of consistency transition kernels provides a new tool for multi-step distillation.
  • The superiority in the FD-DINOv2 metric suggests that the images generated by CoSIM possess higher semantic quality.

Limitations & Future Work

  • Relies on pre-trained score networks.
  • Theoretical analysis of the continuous-time formulation relies on strong assumptions.
  • Validated only on ImageNet.

Rating

  • Novelty: ⭐⭐⭐⭐ Continuous semi-implicit models possess theoretical depth.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Solid evaluation on ImageNet 512 with comprehensive comparisons.
  • Writing Quality: ⭐⭐⭐⭐ Clear theoretical explanations.
  • Value: ⭐⭐⭐⭐ Advances diffusion model acceleration.