Tortoise and Hare Guidance: Accelerating Diffusion Model Inference with Multirate Integration¶
Conference: NeurIPS 2025 arXiv: 2511.04117 Code: Available (https://github.com/yhlee-add/THG) Area: Image Generation / Diffusion Models Keywords: Diffusion model acceleration, Classifier-Free Guidance, multirate integration, NFE reduction, training-free
TL;DR¶
This paper proposes Tortoise and Hare Guidance (THG), a training-free acceleration strategy for diffusion sampling that reformulates the classifier-free guidance (CFG) ODE as a multirate ODE system. The noise estimation term is integrated with fine-grained steps (tortoise equation), while the additional guidance term is integrated with coarse-grained steps (hare equation), reducing the number of function evaluations (NFE) by up to 30% with negligible degradation in generation quality.
Background & Motivation¶
Inference Bottleneck of Diffusion Models¶
Diffusion models have achieved remarkable success in image generation, yet slow inference remains a primary bottleneck. Each generation requires multiple denoising steps, each involving one or more neural network forward passes (Function Evaluations, NFE).
Computational Redundancy in Classifier-Free Guidance¶
CFG is the dominant approach for conditional generation, formulated as:

$$\hat{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t) + s \cdot [\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t)]$$

- First term \(\epsilon_\theta(x_t)\): unconditional noise estimate
- Second term \(s \cdot [\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t)]\): additional guidance term
Each CFG step requires two network forward passes (conditional and unconditional), constituting the primary computational bottleneck.
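As a concrete sketch of where the cost comes from (the `eps_uncond`/`eps_cond` functions and their linear forms are toy stand-ins, not the paper's model):

```python
import numpy as np

# Toy stand-ins for the two network forward passes (assumptions, not the paper's code);
# in practice each is one full U-Net forward.
def eps_uncond(x_t):
    return 0.1 * x_t

def eps_cond(x_t, c):
    return 0.1 * x_t + 0.05 * c

def cfg_noise(x_t, c, s):
    """Standard CFG: every denoising step pays for two forward passes."""
    e_u = eps_uncond(x_t)          # pass 1: unconditional estimate
    e_c = eps_cond(x_t, c)         # pass 2: conditional estimate
    return e_u + s * (e_c - e_u)   # guided noise estimate

out = cfg_noise(np.ones(4), np.ones(4), s=7.5)
```

With 50 denoising steps this sampler costs 100 NFE, which is exactly the baseline budget in the tables below.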
Key Observation¶
The additional guidance term is far less sensitive to numerical error than the noise estimation term. Conventional uniform-step solvers fail to exploit this asymmetry, leading to substantial redundant computation.
Method¶
Overall Architecture¶
The CFG ODE is decomposed into two subsystems operating on different timescales:
CFG ODE: dx/dt = f(x,t) + g(x,t)
├── Tortoise Equation: dx/dt = f(x,t) — noise estimation, fine-grained steps
└── Hare Equation: dx/dt = g(x,t) — additional guidance, coarse-grained steps
Key Designs¶
1. Multirate ODE Decomposition¶
The CFG solve is decomposed from a single ODE into a multirate system:
- Tortoise equation (slow, fine): integrates the noise-estimation term \(\epsilon_\theta(x_t)\) at the original timestep resolution, so each fine step still runs a network forward pass
- Hare equation (fast, coarse): the additional guidance term \(g(x_t) = s \cdot [\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t)]\) is evaluated only on a coarse grid, so the second forward pass is needed only at coarse grid points; interpolation or extrapolation fills in between
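A minimal sketch of the multirate loop under a toy Euler discretization; the network stand-ins, the 3:1 ratio, and the zero-order hold between coarse points are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Toy stand-ins for the two network forward passes (assumptions, not the paper's model).
def eps_uncond(x):
    return 0.1 * x

def eps_cond(x, c):
    return 0.1 * x + 0.05 * c

def thg_sample_sketch(x, c, s=7.5, n_steps=12, hare_every=3, dt=0.01):
    """Toy multirate Euler loop: the noise-estimation term (tortoise) is
    refreshed at every fine step, while the guidance term (hare) is
    recomputed only every `hare_every` steps and held constant in between."""
    g_cached = 0.0
    nfe = 0
    for i in range(n_steps):
        e_u = eps_uncond(x)            # tortoise: one pass per fine step
        nfe += 1
        if i % hare_every == 0:        # hare: second pass only on the coarse grid
            e_c = eps_cond(x, c)
            nfe += 1
            g_cached = s * (e_c - e_u)
        x = x - dt * (e_u + g_cached)  # Euler update on the decomposed CFG ODE
    return x, nfe

x_final, nfe = thg_sample_sketch(np.ones(4), c=np.ones(4))
# 12 tortoise passes + 4 hare passes = 16 NFE, vs 24 for plain CFG at the same resolution
```

The NFE saving comes entirely from skipping the second forward pass on fine steps; the paper uses interpolation/extrapolation rather than this simple hold.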
2. Error Bound Analysis¶
A rigorous error-bound analysis establishes that:
- The noise-estimation term has a larger Lipschitz constant, requiring fine step sizes for error control
- The guidance term has a smaller Lipschitz constant, tolerating larger step sizes
- Quantitatively, the error bound of the guidance term is smaller than that of the noise-estimation term by a factor of \(O(s)\)
3. Error-Bound-Aware Timestep Sampler¶
Adaptively selects step sizes:
- Applies finer steps in regions where the noise changes rapidly (e.g., intermediate timesteps)
- Permits larger steps in regions of slow variation (e.g., near the terminal time)
- Dynamically adjusts the tortoise/hare step-size ratio based on local error estimates
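One common way to realize such a sampler is inverse-CDF placement of step boundaries against an error density; this sketch assumes a hypothetical bell-shaped density peaking at intermediate timesteps, not the paper's actual bound:

```python
import numpy as np

def error_aware_timesteps(n_steps, err_density, t0=0.0, t1=1.0, n_grid=1000):
    """Toy error-bound-aware sampler: place step boundaries so that each step
    carries roughly equal integrated error (inverse CDF of the density)."""
    t = np.linspace(t0, t1, n_grid)
    w = err_density(t)                      # pointwise error-density estimate
    cdf = np.cumsum(w)
    cdf = cdf / cdf[-1]                     # normalize to [0, 1]
    targets = np.linspace(0.0, 1.0, n_steps + 1)
    return np.interp(targets, cdf, t)       # equal-mass boundaries in t

# Hypothetical density: noise changes fastest at intermediate timesteps.
ts = error_aware_timesteps(10, lambda t: 0.1 + np.exp(-((t - 0.5) ** 2) / 0.02))
# Step boundaries cluster near t = 0.5, where the assumed density peaks.
```

Regions where the density is high receive small steps, matching the bullet points above.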
4. Guidance-Scale Scheduler¶
When the hare equation spans large time intervals, naive extrapolation may become unstable. A scheduler is introduced to:
- Moderately reduce the guidance scale \(s\) over large intervals
- Ensure stability of the extrapolation
- Preserve final generation quality
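A minimal sketch of such a scheduler; the linear damping rule, the reference interval, and the floor are hypothetical choices for illustration, not the paper's schedule:

```python
def scheduled_guidance_scale(s, dt_coarse, dt_ref, floor=0.5):
    """Toy guidance-scale scheduler: shrink s as the hare interval grows,
    so that coarse-grid extrapolation stays stable (hypothetical linear
    damping with a lower floor)."""
    damp = max(floor, min(1.0, dt_ref / dt_coarse))
    return s * damp

# A hare interval 3x wider than the reference gets its guidance scale halved
# (the damping bottoms out at the 0.5 floor).
s_eff = scheduled_guidance_scale(7.5, dt_coarse=0.03, dt_ref=0.01)
```

At the reference interval the scale is untouched, so the scheduler only intervenes where the coarse grid is genuinely wide.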
Loss & Training¶
THG is a completely training-free method:
- No modification or retraining of the diffusion model is required
- Only the ODE-solving strategy at inference time is altered
- Plug-and-play compatible with any CFG-based diffusion model
Key Experimental Results¶
Main Results¶
Performance on Stable Diffusion and SDXL:
| Method | NFE ↓ | FID ↓ | CLIP Score ↑ | ImageReward ↑ | ΔImageReward |
|---|---|---|---|---|---|
| DDIM (50 steps) | 100 | 15.2 | 0.312 | 0.876 | baseline |
| DPM-Solver++ (25 steps) | 50 | 15.8 | 0.310 | 0.871 | -0.005 |
| PNDM (25 steps) | 50 | 16.1 | 0.308 | 0.865 | -0.011 |
| PAB | 70 | 15.5 | 0.311 | 0.872 | -0.004 |
| DeepCache | 60 | 16.4 | 0.307 | 0.858 | -0.018 |
| THG (ours) | 70 | 15.3 | 0.311 | 0.873 | -0.003 |
| THG (ours, aggressive) | 50 | 15.9 | 0.309 | 0.844 | -0.032 |
Comparison under equal NFE budgets:
| Method | NFE=50 FID ↓ | NFE=50 ImageReward ↑ | NFE=70 FID ↓ | NFE=70 ImageReward ↑ |
|---|---|---|---|---|
| DPM-Solver++ | 15.8 | 0.871 | 15.4 | 0.874 |
| DeepCache | 17.2 | 0.845 | 16.4 | 0.858 |
| PAB | 16.5 | 0.860 | 15.5 | 0.872 |
| THG | 15.5 | 0.878 | 15.3 | 0.873 |
Ablation Study¶
| Configuration | NFE | FID | ImageReward | Note |
|---|---|---|---|---|
| Full THG | 70 | 15.3 | 0.873 | Complete method |
| w/o adaptive step size | 70 | 15.8 | 0.865 | Adaptive steps matter |
| w/o guidance-scale scheduler | 70 | 16.2 | 0.852 | Scheduler stabilizes coarse extrapolation |
| Coarse ratio = 2:1 | 85 | 15.4 | 0.872 | Conservative setting |
| Coarse ratio = 4:1 | 55 | 16.5 | 0.838 | Too aggressive |
| Coarse ratio = 3:1 (default) | 70 | 15.3 | 0.873 | Optimal balance |
Key Findings¶
- 30% NFE reduction with near-lossless quality: THG cuts NFE from 100 to 70 (a 30% reduction) at a \(\Delta\)ImageReward of only \(-0.003\); even the aggressive setting (NFE 50) stays within \(|\Delta\text{ImageReward}| \leq 0.032\).
- Outperforms alternatives under equal budgets: At the same NFE, THG achieves better FID and ImageReward than DeepCache and PAB.
- Adaptive step size contributes significantly: Compared to a fixed step ratio, the adaptive timestep sampler yields +0.008 ImageReward improvement.
- Guidance-scale scheduler is essential for stability: Removing the scheduler increases FID by 0.9, primarily affecting high-guidance-scale scenarios.
- 3:1 step ratio is optimal: A tortoise-to-hare ratio of 3:1 achieves the best efficiency–quality trade-off.
Highlights & Insights¶
- Mathematically grounded motivation: The robustness of the guidance term is derived from rigorous ODE error analysis rather than empirical observation.
- Training-free design: No additional training cost; fully plug-and-play.
- Vivid naming: The tortoise-and-hare metaphor intuitively conveys the multirate concept.
- Strong practical value: Directly integrable with existing diffusion models for inference acceleration.
- Open-source code: Facilitates community reproduction and extension.
Limitations & Future Work¶
- CFG-only applicability: Not directly applicable to methods that do not use CFG (e.g., flow matching).
- Limited extrapolation accuracy: When the guidance scale \(s\) is large, coarse-grid extrapolation may introduce perceptible artifacts.
- Step ratio requires tuning: The optimal step ratio may depend on the specific model and task.
- Compatibility with other acceleration techniques: Joint use with distillation-based methods remains unexplored.
- Extension to video generation: Applicability to video diffusion models has yet to be validated.
Related Work & Insights¶
- DPM-Solver++: High-order ODE solver for accelerated diffusion sampling.
- DeepCache: Caches intermediate features to reduce redundant computation.
- PAB (Pyramid Attention Broadcast): Progressive attention broadcasting for acceleration.
- Multirate integration: A classical technique in numerical analysis, applied here to diffusion models for the first time.
- Future directions: Exploring finer-grained, component-level multirate decomposition (e.g., assigning different step sizes to different layers).
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ — First application of multirate integration to diffusion sampling acceleration
- Theoretical Depth: ⭐⭐⭐⭐ — Supported by rigorous error bound analysis
- Experimental Thoroughness: ⭐⭐⭐⭐ — Multiple models, metrics, and thorough ablations
- Practical Impact: ⭐⭐⭐⭐⭐ — Training-free, plug-and-play, directly reduces inference cost
- Writing Quality: ⭐⭐⭐⭐ — Clear and vivid, with clever nomenclature