DistillKac: Few-Step Image Generation via Damped Wave Equations
Conference: ICLR 2026
arXiv: 2509.21513
Code: None
Area: Diffusion Models / Few-Step Generation / Novel PDE Framework
Keywords: damped wave equation, Kac process, finite-speed flow, endpoint distillation, few-step generation
TL;DR
This paper replaces the Fokker-Planck equation with the damped wave equation (telegrapher's equation) and its stochastic Kac representation as the probabilistic flow foundation for generative models, enabling finite-speed propagation. An endpoint distillation method is proposed for few-step generation, achieving FID=4.14 in 4 steps and FID=5.66 in 1 step on CIFAR-10.
Background & Motivation
Background: Diffusion models are grounded in the Fokker-Planck equation (a parabolic PDE), whose reverse velocity field becomes stiff near the terminal time, as the diffusion process permits infinite propagation speed.
Limitations of Prior Work: The velocity norm of the reverse ODE can grow unboundedly as \(t \to T\), causing numerical instability near the endpoint and requiring a large number of steps to guarantee accuracy. During distillation, student models tend to deviate from teacher trajectories under large step sizes.
Key Challenge: Infinite propagation speed → stiff velocity field → unstable sampling → many steps required. Can this problem be addressed at the PDE level?
Goal: Introduce a hyperbolic PDE (damped wave equation) as an alternative, exploiting its finite-speed propagation to obtain more stable few-step generation.
Key Insight: The damped wave equation generalizes the Fokker-Planck equation: diffusion emerges in the limit where the damping \(\lambda\) and the speed \(c\) both tend to infinity with the ratio \(c^2/\lambda\) held fixed. The Kac process naturally imposes a speed bound \(c\) on every particle, guaranteeing globally bounded kinetic energy and Lipschitz regularity in Wasserstein space.
Core Idea: Finite-speed probabilistic flow ensures that endpoint matching automatically guarantees proximity along the entire path, thereby stabilizing few-step distillation.
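The diffusion limit mentioned under Key Insight can be made explicit with a short, standard calculation. It is sketched here under the assumption that the ratio \(c^2/\lambda\) is held at a fixed constant \(D\) (a symbol introduced only for this illustration) and that \(\partial_{tt} p\) stays bounded:

\[
\begin{aligned}
\partial_{tt} p + \lambda\,\partial_t p &= c^2 \nabla^2 p \\
\tfrac{1}{\lambda}\,\partial_{tt} p + \partial_t p &= \tfrac{c^2}{\lambda}\,\nabla^2 p = D\,\nabla^2 p \\
\lambda \to \infty:\qquad \partial_t p &= D\,\nabla^2 p
\end{aligned}
\]

Dividing by \(\lambda\) makes the second-order term vanish in the limit, leaving a heat (pure-diffusion Fokker-Planck) equation with diffusivity \(D\). For finite \(\lambda\) and \(c\), the second-order term is what keeps the propagation speed finite.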
Method
Overall Architecture
- PDE Replacement: Replaces the Fokker-Planck equation with the damped wave equation \(\partial_{tt} p + \lambda \partial_t p = c^2 \nabla^2 p\)
- Stochastic Kac Representation: Particles move at finite speed \(c\), with velocity direction reversed (1D) or resampled (high-dimensional) according to a Poisson process
- Guided Kac Flow: Incorporates CFG in the velocity space while preserving square-integrability
- Endpoint Distillation: The student matches the teacher's output at the endpoints of each time interval
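The stochastic Kac representation listed above can be illustrated with a minimal 1D simulation. This is a sketch, not the paper's implementation: the function name `simulate_kac_1d`, the parameter `flip_rate` (a stand-in for the damping-related switching rate, whose exact relation to \(\lambda\) depends on the paper's convention), and the time-discretized Bernoulli approximation of the Poisson process are all assumptions made for illustration.

```python
import numpy as np

def simulate_kac_1d(n_particles, t_end, c, flip_rate, dt=1e-3, seed=0):
    """Simulate a 1D Kac (telegraph) process started at x = 0.

    Each particle moves at speed c and reverses direction at the event
    times of a Poisson process with rate `flip_rate`. The Poisson process
    is approximated by a Bernoulli flip with probability flip_rate * dt
    per time step (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_particles)
    # Initial direction is +c or -c with equal probability.
    v = c * rng.choice([-1.0, 1.0], size=n_particles)
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        x += v * dt                                   # free flight at speed c
        flips = rng.random(n_particles) < flip_rate * dt
        v[flips] *= -1.0                              # reverse direction on a flip
    return x

c, flip_rate, t_end = 2.0, 5.0, 1.0
x_final = simulate_kac_1d(10_000, t_end, c, flip_rate)
print("max |x|:", np.abs(x_final).max(), " bound c*t:", c * t_end)
```

No particle can leave the interval \([-ct, ct]\), which is the finite-speed (causal cone) property. A Brownian particle has no such bound.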
Key Designs
- Finite-Speed Probabilistic Flow:
    - Function: Replaces the diffusion SDE with the Kac process, yielding a globally bounded velocity field
    - Mechanism: In the Kac process, particle position \(X_t\) and velocity \(V_t\) evolve jointly, with \(|V_t| \leq c\), so trajectories remain within the causal cone and cannot propagate infinitely fast
    - Design Motivation: Bounded velocity → no terminal stiffness → more stable numerical integration → more robust few-step sampling
- Endpoint Distillation + Path Stability Theorem (Theorem 8):
    - Function: Proves that endpoint matching guarantees proximity along the entire trajectory
    - Mechanism: By exploiting the Lipschitz regularity of the Kac flow, if the student and teacher agree at the endpoints of each interval \([t_{k+1}, t_k]\), they remain close throughout that interval, with the error decaying as \(O(M^{-1})\) for an Euler student
    - Design Motivation: This advantage is specific to finite-speed flows; infinite-speed diffusion flows cannot guarantee such stability
- Velocity-Space CFG:
    - Function: Applies classifier-free guidance to the Kac velocity field
    - Mechanism: \(u_{\text{guided}} = (1+w) u_\theta^{\text{cond}} - w u_\theta^{\text{uncond}}\), proved to preserve square-integrability under mild conditions
    - Design Motivation: Conventional CFG operates in score space and may violate the finite-speed constraint; operating in velocity space naturally preserves it
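The velocity-space guidance formula above is a simple linear combination, which the following sketch makes concrete. The helper names `guided_velocity` and `random_bounded` are hypothetical, and the random arrays merely stand in for network outputs that are assumed to satisfy the speed bound \(c\). Under that assumption the triangle inequality gives \(\|u_{\text{guided}}\| \leq (1+2w)\,c\), so the guided field remains bounded.

```python
import numpy as np

def guided_velocity(u_cond, u_uncond, w):
    """Classifier-free guidance applied directly to velocity predictions."""
    return (1.0 + w) * u_cond - w * u_uncond

def random_bounded(rng, n, d, c):
    """Random vectors with norm at most c (stand-in for network outputs)."""
    u = rng.normal(size=(n, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # unit directions
    return u * c * rng.random((n, 1))               # radii in [0, c)

rng = np.random.default_rng(0)
c, w = 1.0, 1.5                                      # illustrative values only
u_cond = random_bounded(rng, 1000, 8, c)
u_uncond = random_bounded(rng, 1000, 8, c)

u_guided = guided_velocity(u_cond, u_uncond, w)
max_norm = np.linalg.norm(u_guided, axis=1).max()
print("largest guided speed:", max_norm, " bound (1 + 2w) c:", (1 + 2 * w) * c)
```

Setting `w = 0` recovers the conditional prediction unchanged, which is a quick sanity check for any implementation of this formula.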
Loss & Training
- UNet backbone; evaluated on CIFAR-10, CelebA-64, and LSUN Bedroom-256
- Teacher: 100-step Guided Kac Flow (AB-2 integrator)
- Distillation: iterative schedule 100→20→4→2→1 steps
- Endpoint MSE loss
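The endpoint MSE objective and the step schedule listed above can be sketched as follows. Everything here is illustrative: `teacher_velocity` is a toy closed-form field rather than a trained UNet, the teacher is rolled out with plain Euler steps instead of AB-2 for brevity, and the reading that each student step spans `ratios[i]` teacher steps is an assumption about how the schedule is applied.

```python
import numpy as np

def teacher_velocity(x, t):
    # Hypothetical stand-in for a trained teacher velocity network.
    return -x * (1.0 + t)

def rollout(velocity, x, t_start, t_end, n_steps):
    """Integrate dx/dt = velocity(x, t) with n_steps explicit Euler steps."""
    h = (t_end - t_start) / n_steps
    t = t_start
    for _ in range(n_steps):
        x = x + h * velocity(x, t)
        t = t + h
    return x

def endpoint_mse(student_map, x_start, t_start, t_end, teacher_substeps):
    """MSE between the student's single jump and the teacher's multi-step endpoint."""
    target = rollout(teacher_velocity, x_start, t_start, t_end, teacher_substeps)
    prediction = student_map(x_start, t_start, t_end)
    return float(np.mean((prediction - target) ** 2))

# Step counts from the schedule; each stage's model teaches the next stage.
stages = [100, 20, 4, 2, 1]
ratios = [a // b for a, b in zip(stages, stages[1:])]  # teacher steps per student step

rng = np.random.default_rng(0)
x0 = rng.normal(size=(256, 3))

# Untrained baseline: one Euler step using the teacher's own velocity.
naive_student = lambda x, t0, t1: rollout(teacher_velocity, x, t0, t1, 1)
# Training optimum: a map that reproduces the teacher's endpoint exactly.
ideal_student = lambda x, t0, t1: rollout(teacher_velocity, x, t0, t1, ratios[0])

loss_naive = endpoint_mse(naive_student, x0, 0.0, 0.2, ratios[0])
loss_ideal = endpoint_mse(ideal_student, x0, 0.0, 0.2, ratios[0])
print("ratios:", ratios, " naive loss:", loss_naive, " ideal loss:", loss_ideal)
```

The loss only inspects the state at the end of each interval. The path stability theorem described earlier is what justifies ignoring the intermediate points.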
Key Experimental Results
Main Results
| Method | NFE | FID (CIFAR-10) | FID (CelebA-64) |
|---|---|---|---|
| Guided Kac Flow (100 steps, AB-2) | 100 | 3.58 | 3.50 |
| DistillKac | 20 | 3.72 | 3.42 |
| DistillKac | 4 | 4.14 | 4.36 |
| DistillKac | 2 | 4.68 | 5.66 |
| DistillKac | 1 | 5.66 | 7.45 |
| DDIM (100 steps) | 100 | 4.16 | 6.53 |
| DDIM (20 steps) | 20 | 6.84 | 13.73 |
| Progressive Distillation | 4 | 3.00 | — |
| iCT | 2 | 2.46 | — |
Key Findings
- Distilling from 100 to 1 step increases FID by only 2.08 (3.58→5.66), demonstrating the endpoint stability advantage of finite-speed flow
- At 20 steps, DistillKac (3.72) substantially outperforms DDIM (6.84) on CIFAR-10; no 4-step DDIM result is reported, so the 4-step DistillKac result (4.14) cannot be compared against DDIM directly
- The AB-2 integrator achieves the best efficiency: second-order accuracy with only one function evaluation per step
- Absolute FID values remain worse (higher) than those of EDM (1.79) and iCT (2.46), indicating that the base Kac flow model's fitting capacity requires further improvement
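The efficiency claim about the AB-2 integrator can be checked on a toy problem. The sketch below compares explicit Euler with two-step Adams-Bashforth on \(dx/dt = -x\), a test equation chosen purely for illustration (it is not the Kac flow). The function names are hypothetical, and the first AB-2 step is bootstrapped with an Euler step, which is one common choice among several.

```python
import math

def euler(f, x0, t0, t1, n):
    """Explicit Euler: first-order accurate, one evaluation of f per step."""
    h = (t1 - t0) / n
    x, t = x0, t0
    for _ in range(n):
        x = x + h * f(x, t)
        t += h
    return x

def ab2(f, x0, t0, t1, n):
    """Two-step Adams-Bashforth: second-order, still one evaluation per step."""
    h = (t1 - t0) / n
    x, t = x0, t0
    f_prev, evals = None, 0
    for _ in range(n):
        f_curr = f(x, t)
        evals += 1
        if f_prev is None:
            x = x + h * f_curr                        # Euler bootstrap for step 1
        else:
            x = x + h * (1.5 * f_curr - 0.5 * f_prev) # reuse the cached evaluation
        f_prev = f_curr
        t += h
    return x, evals

f = lambda x, t: -x
exact = math.exp(-1.0)

err_euler_50 = abs(euler(f, 1.0, 0.0, 1.0, 50) - exact)
err_euler_100 = abs(euler(f, 1.0, 0.0, 1.0, 100) - exact)
x_ab2, n_evals = ab2(f, 1.0, 0.0, 1.0, 100)
err_ab2_100 = abs(x_ab2 - exact)
print("Euler errors (50, 100 steps):", err_euler_50, err_euler_100)
print("AB-2 error (100 steps):", err_ab2_100, " evaluations:", n_evals)
```

Doubling the step count roughly halves the Euler error, consistent with the \(O(M^{-1})\) rate quoted for an Euler student, while AB-2 reaches a much smaller error at the same number of function evaluations by reusing the previous step's evaluation.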
Highlights & Insights
- PDE-Level Innovation: Extending generative models from parabolic PDEs (Fokker-Planck) to hyperbolic PDEs (damped wave equation) represents a fundamental paradigm shift. The taxonomy in Table 1 — mapping three PDE classes (parabolic/elliptic/hyperbolic) to three families of generative models — is highly illuminating.
- The endpoint-to-path stability theorem is the core theoretical contribution: the geometric properties of finite-speed flows allow endpoint supervision to yield path consistency "for free," providing the theoretical foundation for the distillation design.
- Potential impact: If the base Kac flow model quality can be further improved (e.g., via DiT), the stability advantages of finite-speed propagation may become even more pronounced at scale.
Limitations & Future Work
- Absolute generation quality falls short of SOTA (FID 3.58 vs. EDM 1.79); the base Kac flow model requires improvement
- FID comparisons are reported only on small-scale datasets (CIFAR-10, CelebA-64); results on ImageNet or other large-scale benchmarks are absent
- The speed bound \(c\) and damping rate \(\lambda\) of the Kac process require tuning, increasing the hyperparameter burden
- The efficiency of the directional resampling mechanism for Kac processes in high dimensions is not thoroughly analyzed
- Comparison with consistency models (iCT, sCT) is insufficiently comprehensive
Related Work & Insights
- vs. Kac Flow (Duong et al., 2026): DistillKac builds on this work by adding CFG and distillation, reducing FID from 6.42 to 3.58 (100 steps) and 5.66 (1 step)
- vs. Progressive Distillation: Conceptually similar, but with a distinct theoretical foundation: DistillKac provides formal guarantees via the endpoint-to-path stability theorem
- vs. Flow Matching / Rectified Flow: Both are ODE-based flows, but the Kac process imposes a finite-speed constraint that may yield greater stability near the terminal time
Rating
- Novelty: ⭐⭐⭐⭐⭐ Pioneering hyperbolic PDE framework for generative models; outstanding theoretical contribution
- Experimental Thoroughness: ⭐⭐⭐ Limited to small datasets; absolute performance is not sufficiently strong
- Writing Quality: ⭐⭐⭐⭐⭐ Rigorous and elegant theoretical derivations; the mapping from PDEs to generative models is clearly presented
- Value: ⭐⭐⭐⭐ Opens a new direction (hyperbolic generative models), but further work is needed to validate large-scale feasibility