Rectified-CFG++ for Flow Based Models¶

Conference: NeurIPS 2025
arXiv: 2510.07631
Authors: Shreshth Saini, Shashank Gupta, Alan C. Bovik (UT Austin)
Code: rectified-cfgpp.github.io
Area: Image Generation
Keywords: Classifier-Free Guidance, Rectified Flow, Text-to-Image Generation, Predictor-Corrector Sampling, Flow Models

TL;DR¶

To address the off-manifold drift caused by standard CFG in Rectified Flow models, this paper proposes Rectified-CFG++, an adaptive predictor-corrector guidance strategy that replaces extrapolative guidance with conditional flow prediction combined with a time-scheduled interpolative correction. The method outperforms standard CFG on most reported metrics across large-scale models including Flux, SD3, SD3.5, and Lumina.

Background & Motivation¶

State of the Field¶

Classifier-Free Guidance (CFG) is the core technique for controlling conditional generation quality in diffusion models; it enhances text alignment by linearly extrapolating from the unconditional velocity field past the conditional one. However, Rectified Flow (RF) models employ deterministic ODE integration without stochastic regularization, and the extrapolative nature of CFG causes sampling trajectories to deviate from the learned data manifold, producing visual artifacts such as oversaturation, structural distortion, and text errors.

Limitations of Prior Work¶

Standard CFG: Directly applies extrapolative combination $\hat{v}_\omega = (1-\omega)v^u + \omega v^c$ ($\omega \geq 1$) in RF models, pushing trajectories off the manifold $\mathcal{M}_t$
CFG++: A manifold-constrained guidance method designed for diffusion models, not optimized for the geometric structure of RF
APG (Adaptive Projected Guidance): Partially mitigates artifacts but compromises on detail or geometric accuracy
CFG-Zero★: Provides limited improvement while remaining subject to the fundamental extrapolation problem
None of the above methods offer flow-model-specific theoretical guarantees or geometry-aware design

Root Cause¶

The geometric structure of RF models is naturally suited to interpolation rather than extrapolation. Designing a sampling strategy that incorporates guidance signals via interpolation—leveraging the deterministic transport paths of conditional flows—can achieve high-quality conditional generation while maintaining manifold consistency.

Method¶

Core Idea: Predictor-Corrector as a Replacement for Extrapolation¶

The standard CFG update is extrapolative: $x_{t-\Delta t} = x_t + \Delta t(v^u_t + \omega \Delta v^\theta_t)$, where $\Delta v^\theta_t = v^c_t - v^u_t$. This extrapolation, lacking stochastic noise regularization in a deterministic ODE, is prone to divergence.
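To make the extrapolation concrete, here is a minimal NumPy sketch of the CFG combination and update written above. The function names `cfg_velocity` and `cfg_step` and the toy velocity vectors are illustrative assumptions, not code from the paper.

```python
import numpy as np

def cfg_velocity(v_u, v_c, omega):
    """Standard CFG combination: v_u + omega * (v_c - v_u)."""
    return v_u + omega * (v_c - v_u)

def cfg_step(x_t, v_u, v_c, omega, dt):
    """One Euler step using the sign convention of the update above."""
    return x_t + dt * cfg_velocity(v_u, v_c, omega)

# Toy velocities (made up for illustration).
v_u = np.array([0.0, 0.0])
v_c = np.array([1.0, 0.5])

for omega in (0.5, 1.0, 4.0):
    v_hat = cfg_velocity(v_u, v_c, omega)
    # Position of v_hat along the line through v_u and v_c:
    # 0 -> v_u, 1 -> v_c, > 1 -> beyond v_c (extrapolation).
    s = np.dot(v_hat - v_u, v_c - v_u) / np.dot(v_c - v_u, v_c - v_u)
    print(f"omega={omega}: v_hat={v_hat}, position along line={s:.1f}")
```

For any $\omega > 1$ the combined velocity lies beyond $v^c$ on the line through $v^u$ and $v^c$, which is the sense in which the update is extrapolative.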

Rectified-CFG++ replaces this with three steps:

Step 1: Conditional Flow Prediction (Predictor)¶

A half-step prediction is performed using the pure conditional velocity field $v^c_t$, advancing the sample along the conditional manifold:

\[\tilde{x}_{t-\Delta t/2} \leftarrow x_t + \frac{\Delta t}{2} v^c_t\]

The key insight is that using $v^c_t$ rather than $v^u_t$ or a CFG-mixed velocity anchors the trajectory to the target conditional subspace manifold from the outset, preventing early deviation.

Step 2: Guidance Difference Correction (Corrector)¶

At the predicted midpoint $\tilde{x}_{t-\Delta t/2}$, both conditional and unconditional velocity fields are evaluated:

\[v^c_{t-\Delta t/2} \leftarrow v_\theta(\tilde{x}_{t-\Delta t/2}, t-\Delta t/2, y)\]

\[v^u_{t-\Delta t/2} \leftarrow v_\theta(\tilde{x}_{t-\Delta t/2}, t-\Delta t/2, \varnothing)\]

Evaluating the guidance difference $\Delta v^\theta_{t-\Delta t/2} = v^c_{t-\Delta t/2} - v^u_{t-\Delta t/2}$ at the intermediate predicted point is more accurate than evaluating it at the current point $x_t$, particularly when the velocity field changes rapidly.

Step 3: Interpolative Update¶

The final effective velocity uses the conditional direction as an anchor, augmented by a time-scheduled guidance correction:

\[\hat{v}_{\lambda t} \leftarrow v^c_t + \alpha(t)(v^c_{t-\Delta t/2} - v^u_{t-\Delta t/2})\]

where the scheduling function is $\alpha(t) = \lambda_{\max}(1-t)^\gamma$, with $\lambda_{\max} > 0, \gamma \geq 0$. A standard ODE update is then performed using $\hat{v}_{\lambda t}$.
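The three steps can be sketched as a single sampling step. This is a minimal illustration under stated assumptions, not the authors' implementation: `v_theta(x, t, cond)` is a hypothetical velocity model (with `cond=None` standing in for the null condition $\varnothing$), `toy_velocity` is a made-up linear field, and the sign convention follows the update equations above.

```python
import numpy as np

def alpha(t, lam_max=1.0, gamma=0.0):
    """Guidance schedule alpha(t) = lam_max * (1 - t)**gamma."""
    return lam_max * (1.0 - t) ** gamma

def rectified_cfgpp_step(v_theta, x_t, t, dt, y, lam_max=1.0, gamma=0.0):
    """One step following Steps 1-3 above (hypothetical interface)."""
    # Step 1 (predictor): half step along the conditional velocity.
    v_c = v_theta(x_t, t, y)
    x_mid = x_t + 0.5 * dt * v_c

    # Step 2 (corrector): guidance difference at the predicted midpoint.
    t_mid = t - 0.5 * dt
    dv_mid = v_theta(x_mid, t_mid, y) - v_theta(x_mid, t_mid, None)

    # Step 3: conditional anchor plus scheduled correction, then an Euler update.
    v_hat = v_c + alpha(t, lam_max, gamma) * dv_mid
    return x_t + dt * v_hat

def toy_velocity(x, t, cond):
    """Made-up linear field pulling x toward a condition-dependent target."""
    target = np.zeros_like(x) if cond is None else np.full_like(x, cond)
    return target - x

x = np.array([0.0, 0.0])
x_next = rectified_cfgpp_step(toy_velocity, x, t=0.5, dt=0.1, y=1.0,
                              lam_max=0.5, gamma=1.0)
print(x_next)
```

As written, the step calls `v_theta` three times: once at $x_t$ and twice at the predicted midpoint. Setting `lam_max=0` reduces it to a plain Euler step along the conditional velocity.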

Theoretical Guarantees¶

Lemma 3.1 (Guidance Direction Stability): Under Lipschitz continuity assumptions, the gap between the guidance difference at the midpoint and the guidance difference at the current point is $O(\Delta t)$:

\[\|\Delta v^\theta_{t-\Delta t/2} - \Delta v^\theta_t(x_t)\| \leq L V_{\max} \Delta t\]
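The $O(\Delta t)$ scaling in Lemma 3.1 can be checked numerically on a toy example. The sketch below assumes made-up, time-independent scalar fields $v^c = \tanh(\mu - x)$ and $v^u = \tanh(-x)$; these are not from the paper and serve only to show the gap shrinking roughly in proportion to the step size.

```python
import numpy as np

def guidance_diff(x, mu=2.0):
    """Toy guidance difference v_c - v_u for the made-up fields
    v_c = tanh(mu - x) and v_u = tanh(-x)."""
    return np.tanh(mu - x) - np.tanh(-x)

def midpoint_gap(x_t, dt, mu=2.0):
    """|guidance difference at the predicted midpoint - at the current point|."""
    v_c = np.tanh(mu - x_t)        # conditional velocity at x_t
    x_mid = x_t + 0.5 * dt * v_c   # half-step predictor
    return abs(guidance_diff(x_mid, mu) - guidance_diff(x_t, mu))

x_t = 0.5
for dt in (0.04, 0.02, 0.01):
    print(f"dt={dt}: gap={midpoint_gap(x_t, dt):.6f}")
```

Halving `dt` roughly halves the printed gap, which is the first-order behavior the lemma describes.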

Proposition 1 (Bounded Single-Step Perturbation): The single-step deviation of Rectified-CFG++ from the pure conditional flow is strictly bounded:

\[\|\hat{x}_{t-1} - \tilde{x}_{t-1}\| \leq \alpha(t) B \Delta t\]

This guarantees that the trajectory remains within a bounded tubular neighborhood of the data manifold $\mathcal{M}_t$, where the neighborhood size is controlled by $\alpha(t)$ and the guidance field bound $B$.
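The bound in Proposition 1 can be read off from the Step 3 update. The short derivation below is a sketch, assuming that $B$ bounds the norm of the guidance difference and that both updates are single Euler steps of size $\Delta t$ from the same $x_t$. It indexes the next state as $t-\Delta t$ rather than $t-1$, and here $\tilde{x}$ denotes the pure conditional step, not the half-step predictor of Step 1. The paper's own proof may be organized differently.

\[\hat{x}_{t-\Delta t} = x_t + \Delta t\,\bigl(v^c_t + \alpha(t)\,\Delta v^\theta_{t-\Delta t/2}\bigr), \qquad \tilde{x}_{t-\Delta t} = x_t + \Delta t\, v^c_t\]

\[\|\hat{x}_{t-\Delta t} - \tilde{x}_{t-\Delta t}\| = \alpha(t)\,\Delta t\,\|\Delta v^\theta_{t-\Delta t/2}\| \leq \alpha(t)\, B\, \Delta t\]

The conditional term $v^c_t$ cancels in the difference, so only the scheduled correction contributes to the deviation.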

Key Differences from CFG¶

Property	Standard CFG	Rectified-CFG++
Guidance mode	Extrapolation	Interpolation
Reference velocity	Unconditional $v^u_t$	Conditional $v^c_t$
Guidance evaluation point	Current point $x_t$	Intermediate predicted point $\tilde{x}_{t-\Delta t/2}$
Manifold preservation	No guarantee; prone to drift	Theoretically guaranteed bounded neighborhood
Additional network/training	No	No

Key Experimental Results¶

Experiment 1: MS-COCO 10K Multi-Model Comprehensive Evaluation¶

Rectified-CFG++ is comprehensively compared against standard CFG across four mainstream RF models:

Model	Guidance	FID↓	CLIP↑	Aesthetic↑	ImageReward↑	PickScore↑	HPSv2↑
Lumina	CFG	26.93	0.3511	5.8226	1.0924	0.5867	0.2797
Lumina	Rect-CFG++	22.49	0.3464	5.7755	0.9611	0.6133	0.3004
SD3	CFG	23.89	0.3439	5.5465	0.9812	0.4408	0.2751
SD3	Rect-CFG++	23.39	0.3471	5.6529	1.0009	0.5591	0.2897
SD3.5	CFG	20.29	0.3506	6.155	1.0487	0.4923	0.2933
SD3.5	Rect-CFG++	20.22	0.3497	6.1651	1.0796	0.5077	0.2946
Flux-dev	CFG	37.86	0.3351	4.721	1.0528	0.3248	0.2621
Flux-dev	Rect-CFG++	32.23	0.3493	5.3251	0.948	0.6752	0.2996

On Flux-dev, FID drops from 37.86 to 32.23 (a 14.9% reduction) and PickScore more than doubles from 0.3248 to 0.6752, although ImageReward falls from 1.0528 to 0.948. This suggests that standard CFG produces particularly severe artifacts on Flux, where Rectified-CFG++ achieves its largest absolute FID and PickScore gains.
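The Flux-dev figures quoted above can be reproduced from the table with a few lines of arithmetic. The helper name `relative_change` is an illustrative choice; the numbers are copied from the Flux-dev rows.

```python
def relative_change(before, after):
    """Signed relative change of `after` with respect to `before`."""
    return (after - before) / before

# Flux-dev rows of the table above (CFG -> Rect-CFG++).
fid_cfg, fid_rect = 37.86, 32.23
pick_cfg, pick_rect = 0.3248, 0.6752

fid_change = relative_change(fid_cfg, fid_rect)
pick_ratio = pick_rect / pick_cfg

print(f"FID change: {fid_change:.1%}")
print(f"PickScore ratio: {pick_ratio:.2f}x")
```

This gives an FID change of about -14.9% and a PickScore ratio of about 2.08.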

Experiment 2: Guidance Strategy Comparison (MS-COCO 1K, SD3.5)¶

Guidance Method	FID↓	ImageReward↑	CLIP↑	HPSv2↑
No guidance	77.30	0.3852	0.3260	0.2421
CFG	67.71	1.0530	0.3515	0.2941
CFG-Zero★	68.39	0.9947	0.3458	0.2879
APG	67.23	1.0748	0.3513	0.2935
Rect-CFG++	67.15	1.0845	0.3506	0.2959

Rectified-CFG++ achieves the best results on FID, ImageReward, and HPSv2, with CLIP score only marginally below standard CFG.

T2I-CompBench Compositional Generation Evaluation¶

Model	Color↑	Shape↑	Texture↑	Spatial↑
Flux CFG	0.6132	0.4152	0.5928	0.2488
Flux Rect-CFG++	0.7728	0.5018	0.6705	0.2790
SD3 CFG	0.7658	0.5698	0.7270	0.3199
SD3 Rect-CFG++	0.8041	0.5778	0.7362	0.3306

The Color attribute on Flux improves from 0.6132 to 0.7728 (+26%), suggesting that the color shift problem of CFG on Flux is substantially reduced.

Ablation Study: Component Contributions (MS-COCO 1K, SD3.5)¶

Configuration	FID↓	CLIP↑	HPSv2↑	Aesthetic↑
Prediction with unconditional velocity	91.12	0.1439	0.1870	6.1049
Without Predictor	73.70	0.3410	0.2969	6.1064
Without Corrector	74.65	0.3414	0.2975	6.1047
Full Rect-CFG++	72.97	0.3446	0.2995	6.1587

When the predictor uses the unconditional velocity, CLIP drops sharply to 0.14, confirming that the conditional prediction step is the core of the method.

Computational Efficiency¶

Under comparable runtime (SD3.5, 512×512), Rectified-CFG++ achieves FID 74.47 with 20 NFE, while standard CFG reaches only FID 85.82 with 28 NFE. Actual FLOPs are nearly identical, with a runtime difference of approximately 0.04 seconds.

Highlights & Insights¶

Principled and Elegant: The core intuition of replacing extrapolation with interpolation is clear; the predictor-corrector framework naturally decouples conditional anchoring from guidance correction without requiring additional networks or training
Theoretical Completeness: Rigorous mathematical proofs of manifold consistency and trajectory boundedness are provided, making this one of the few guidance methods that combines both theoretical and empirical rigor
Drop-in Replacement: Requires no training, no modification of model weights, and negligible additional computation; can directly replace CFG modules in existing RF models
Significant Improvement in Text Rendering: The method performs notably well on in-image text generation, a known weakness of diffusion models
Comprehensive Validation: Covers 4 large-scale models, multiple datasets, 6+ metrics, ablation studies, and a user study, reflecting rigorous experimental design

Limitations & Future Work¶

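One Additional Forward Pass per Step: The prediction step requires an extra conditional velocity field evaluation, so each step uses three network evaluations (one at $x_t$ and two at the midpoint) instead of the two used by standard CFG, roughly a 1.5× per-step cost; the total number of steps can be reduced to compensate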
Introduced Hyperparameters: The paper claims to be "parameter-free beyond guidance scale," yet the scheduling function $\alpha(t) = \lambda_{\max}(1-t)^\gamma$ introduces two additional hyperparameters
Text-to-Image Only: The method is not validated in other RF model applications such as video generation or 3D generation
Some Metrics Worse on Lumina: CLIP, Aesthetic, and ImageReward scores are lower on Lumina compared to standard CFG, indicating the method does not uniformly dominate across all settings

Related Work¶

CFG (Ho & Salimans 2022): The original extrapolative guidance method; produces severe artifacts in RF models. Rectified-CFG++ addresses this by replacing extrapolation with interpolation
CFG++ (Chung et al. 2024): A manifold-constrained method designed for diffusion SDEs that relies on stochastic regularization and is not applicable to deterministic RF
APG (Sadat et al. 2024): Adaptive projected guidance that partially alleviates artifacts at the cost of detail; Table 3 shows Rectified-CFG++ outperforms APG on FID, ImageReward, and HPSv2
CFG-Zero★: Reduces early drift by zeroing initial guidance but remains subject to extrapolation effects in later steps; overall performance is inferior to Rectified-CFG++
ReCFG: A related work using guidance differences; Rectified-CFG++ is distinguished by its predictor-corrector framework and midpoint evaluation strategy

Rating¶

Novelty: ⭐⭐⭐⭐ — The core idea of replacing extrapolation with interpolation is simple yet effective; the predictor-corrector framework is well motivated
Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Four models, multiple datasets, 6+ metrics, ablation studies, and a user study; extremely comprehensive
Writing Quality: ⭐⭐⭐⭐ — Theoretical derivations are clear, figures are informative, and structure is complete
Value: ⭐⭐⭐⭐ — Strong practical utility as a drop-in replacement with direct contributions to the RF model ecosystem