RNE: plug-and-play diffusion inference-time control and energy-based training

Conference: ICLR 2026
arXiv: 2506.05668
Code: N/A
Area: Image Generation / Diffusion Models
Keywords: diffusion models, density ratio estimation, inference-time control, energy-based model training, Radon-Nikodym derivative

TL;DR

This paper proposes the Radon-Nikodym Estimator (RNE), which uses density ratios between path distributions to relate marginal densities to transition kernels. The result is a unified plug-and-play framework for diffusion density estimation, inference-time control, and energy-based diffusion training.

Background & Motivation

Diffusion models generate data by iteratively denoising, corresponding to the time-reversal of a forward noising process. In many applications, access to denoising kernels alone is insufficient; knowledge of marginal densities along the generative trajectory is required. Such knowledge supports:

Density estimation: evaluating the probability density of a generative model at arbitrary points

Inference-time control: dynamically guiding outputs during generation, e.g., conditional generation and composition of multiple models

Energy-based diffusion training: training energy functions to parameterize diffusion models

However, obtaining marginal densities of diffusion models has long been a challenging problem:

- Direct computation requires integrating over all possible forward paths, which is computationally intractable
- Existing methods (e.g., likelihood estimation via the probability-flow ODE) are either computationally expensive or insufficiently accurate
- Inference-time control methods typically rely on specific assumptions (e.g., approximations via the Tweedie formula), limiting their applicability

Core Insight: By invoking the concept of the Radon-Nikodym derivative (density ratio), a fundamental mathematical relationship between marginal densities and transition kernels can be established—without training additional models and without depending on any specific diffusion model architecture.

Method

Overall Architecture

The core idea of RNE is to operate at the level of distributions over generative paths:

Define two path distributions: the forward (noising) path distribution \(\mathbb{P}\) and the backward (denoising) path distribution \(\mathbb{Q}\)
By the Radon-Nikodym theorem, the density ratio \(\frac{d\mathbb{Q}}{d\mathbb{P}}\) of \(\mathbb{Q}\) with respect to \(\mathbb{P}\) can be expressed in terms of transition kernels
By manipulating this density ratio, marginal densities can be estimated and controlled
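
As a worked sketch of how marginal densities follow from the kernels, assume the backward kernels exactly reverse the forward process, so that the path density ratio is identically 1. The notation \(q_t\) for the forward marginal at step \(t\) and \(p\) for the exact reverse kernel is introduced here for illustration and is not taken from the paper. Bayes' rule for a single step then gives a local identity, and summing it over steps recovers the density at \(x_0\) from quantities evaluated along any path:

```latex
\[
q_{t-1}(x_{t-1})\, q(x_t \mid x_{t-1}) = q_t(x_t)\, p(x_{t-1} \mid x_t)
\quad\Longrightarrow\quad
\frac{q_t(x_t)}{q_{t-1}(x_{t-1})} = \frac{q(x_t \mid x_{t-1})}{p(x_{t-1} \mid x_t)}
\]
\[
\log q_0(x_0) = \log q_T(x_T) + \sum_{t=1}^{T} \Big[ \log p(x_{t-1} \mid x_t) - \log q(x_t \mid x_{t-1}) \Big]
\]
```

With a learned kernel \(p_\theta\) in place of the exact reversal, the identity holds only approximately, which is why the estimate would in practice be averaged over several sampled paths.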

Key Designs

Path distribution density ratio: the central mathematical tool
- Consider the forward process \(q(x_0, x_1, ..., x_T)\) and the backward process \(p(x_T, x_{T-1}, ..., x_0)\)
- The Radon-Nikodym derivative links the two path distributions: \(\frac{d\mathbb{Q}}{d\mathbb{P}}(x_{0:T}) = \frac{p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1}|x_t)}{q(x_0) \prod_{t=1}^{T} q(x_t|x_{t-1})}\)
- This ratio decomposes into a product of stepwise local ratios, each involving only known transition kernels
- Design Motivation: reformulate the global density problem as local (stepwise) density ratio estimation
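
The stepwise decomposition can be illustrated on a toy 1-D Gaussian chain. This is a minimal sketch, not the paper's implementation: the variance-preserving forward kernel, the parameters `var0` and `betas`, and all helper names are assumptions made for illustration. The reverse kernel is computed exactly by Gaussian conditioning, so the stepwise log-ratios should cancel against the endpoint terms and the total log-ratio should be zero up to floating-point error for any path.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log-density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def marginal_vars(var0, betas):
    """Variances of the forward marginals when x_0 ~ N(0, var0) (toy assumption)."""
    v = [var0]
    for b in betas:
        v.append((1.0 - b) * v[-1] + b)
    return np.array(v)

def log_path_ratio(path, var0, betas):
    """log dQ/dP(x_{0:T}): endpoint terms plus a sum of stepwise log-ratios."""
    v = marginal_vars(var0, betas)
    # Endpoint terms: log p(x_T) - log q(x_0), with p(x_T) set to the true marginal.
    total = log_normal(path[-1], 0.0, v[-1]) - log_normal(path[0], 0.0, v[0])
    for t, b in enumerate(betas, start=1):
        a = np.sqrt(1.0 - b)
        # Forward kernel q(x_t | x_{t-1}) = N(a * x_{t-1}, b).
        log_q = log_normal(path[t], a * path[t - 1], b)
        # Exact reverse kernel p(x_{t-1} | x_t) from Gaussian conditioning.
        post_var = 1.0 / (1.0 / v[t - 1] + a * a / b)
        post_mean = post_var * a * path[t] / b
        log_p = log_normal(path[t - 1], post_mean, post_var)
        total += log_p - log_q  # one local (stepwise) log-ratio
    return total

rng = np.random.default_rng(0)
betas = np.linspace(0.05, 0.3, 10)
path = rng.normal(size=len(betas) + 1)  # an arbitrary path, not a model sample
ratio = log_path_ratio(path, var0=4.0, betas=betas)
```

Replacing the exact reverse kernel with an approximate one would make `ratio` path-dependent, which is the signal a density-ratio method works with.
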
Diffusion density estimation: obtaining marginal densities via density ratios
- At any intermediate timestep \(t\), the marginal density of state \(x_t\) can be estimated using partial products of path density ratios
- No additional density model needs to be trained; the transition kernels of the diffusion model itself are directly exploited
- The density ratio in path space can be estimated via Monte Carlo sampling
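
A hedged sketch of the Monte Carlo estimate, again on a toy Gaussian chain rather than a trained diffusion model: forward paths are simulated from the query point, the partial products of kernel ratios are accumulated in log space, and the per-path values are combined with a log-mean-exp. The function name `estimate_log_density`, the noise schedule, and the exact Gaussian reverse kernel are illustrative assumptions. With the exact kernel every path returns the same value; with a learned kernel the values would differ and the average is what matters.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log-density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def estimate_log_density(x0, var0, betas, n_paths=64, seed=0):
    """Monte Carlo estimate of the model log-density at x0 from forward paths."""
    rng = np.random.default_rng(seed)
    x_prev = np.full(n_paths, float(x0))
    v_prev = var0                      # variance of the marginal at the current step
    log_w = np.zeros(n_paths)
    for b in betas:
        a = np.sqrt(1.0 - b)
        # Simulate one forward (noising) step for every path.
        x_next = a * x_prev + np.sqrt(b) * rng.normal(size=n_paths)
        log_q = log_normal(x_next, a * x_prev, b)
        # Exact reverse kernel for this toy chain (a learned kernel in practice).
        post_var = 1.0 / (1.0 / v_prev + a * a / b)
        log_p = log_normal(x_prev, post_var * a * x_next / b, post_var)
        log_w += log_p - log_q         # accumulate the partial product in log space
        x_prev, v_prev = x_next, a * a * v_prev + b
    log_w += log_normal(x_prev, 0.0, v_prev)  # terminal term log p(x_T)
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))  # numerically stable log-mean-exp

betas = np.linspace(0.05, 0.3, 10)
estimate = estimate_log_density(1.3, var0=4.0, betas=betas)
reference = log_normal(1.3, 0.0, 4.0)  # closed-form answer for the toy chain
```

Starting the same accumulation at an intermediate step would give the corresponding estimate for the marginal at that step.
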
Inference-time control: plug-and-play conditional generation
- Annealing: estimated marginal density ratios are used to steer sampling toward an annealed (temperature-scaled) version of the model distribution, enabling more accurate conditional sampling
- Model Composition: multiplying density ratios from multiple diffusion models achieves multi-condition compositional control
- RNE operates as a plug-and-play module without modifying pretrained diffusion model weights
- Supports inference-time scaling: larger computational budgets translate directly into improved control quality
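
The reweighting idea behind composition and inference-time scaling can be sketched with plain self-normalized importance sampling. This is a deliberately simplified stand-in, not the sampler used in the paper: two closed-form Gaussians play the role of two pretrained models, and `compose_by_reweighting` is a hypothetical helper. Samples from model A are weighted by the density of model B so that the weighted population targets the product of the two, and the effective sample size (ESS) indicates how much of the particle budget survives the reweighting.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log-density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def compose_by_reweighting(n_particles, seed=0):
    """Estimate the mean of p_A * p_B (normalized) from samples of p_A."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=n_particles)  # samples from model A = N(0, 1)
    log_w = log_normal(x, 1.0, 1.0)             # weight by model B = N(1, 1)
    w = np.exp(log_w - log_w.max())             # stabilize before normalizing
    w /= w.sum()
    mean = np.sum(w * x)                        # self-normalized estimate
    ess = 1.0 / np.sum(w ** 2)                  # effective sample size
    return mean, ess

# The product of N(0, 1) and N(1, 1) is proportional to N(0.5, 0.5).
mean_small, ess_small = compose_by_reweighting(500)
mean_large, ess_large = compose_by_reweighting(50_000)
```

A larger particle budget yields a larger ESS and a lower-variance estimate, which is the toy analogue of spending more inference-time compute for better control.
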
Energy-based diffusion training: RNE as a training regularizer
- Training traditional energy-based diffusion models requires estimating the partition function, which is computationally difficult
- RNE provides a straightforward regularization approach: leveraging density ratios to constrain energy function training
- This avoids explicit estimation of the partition function and simplifies the training pipeline
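
One way such a regularizer could look is sketched below. This is an assumption-laden illustration, not the loss defined in the paper: the energy is an unnormalized quadratic, the reverse kernel is taken as exact for a toy Gaussian step, and the use of a batch variance to absorb the unknown log-partition difference is a choice made here. The idea is that energy differences between two steps should match the log-ratio of forward and reverse kernels up to a constant, so the variance of the residual should vanish for a consistent energy.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log-density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def quadratic_energy(variances):
    """Unnormalized Gaussian energy E(x, k) = x^2 / (2 * variances[k])."""
    return lambda x, k: 0.5 * x ** 2 / variances[k]

def consistency_loss(energy, x_s, x_t, s, t, v_s, beta):
    """Batch variance of (energy difference) minus (kernel log-ratio)."""
    a = np.sqrt(1.0 - beta)
    log_q = log_normal(x_t, a * x_s, beta)                  # forward kernel
    post_var = 1.0 / (1.0 / v_s + a * a / beta)
    log_p = log_normal(x_s, post_var * a * x_t / beta, post_var)  # exact reverse kernel
    target = log_q - log_p              # equals log q_t(x_t) - log q_s(x_s)
    pred = energy(x_s, s) - energy(x_t, t)  # same quantity up to a constant
    # The variance ignores the constant offset, so no partition function is needed.
    return np.var(pred - target)

rng = np.random.default_rng(0)
v_s, beta = 4.0, 0.2
v_t = (1.0 - beta) * v_s + beta
x_s = rng.normal(0.0, np.sqrt(v_s), size=2000)
x_t = np.sqrt(1.0 - beta) * x_s + np.sqrt(beta) * rng.normal(size=2000)

loss_exact = consistency_loss(quadratic_energy((v_s, v_t)), x_s, x_t, 0, 1, v_s, beta)
loss_wrong = consistency_loss(quadratic_energy((v_s, 3.0 * v_t)), x_s, x_t, 0, 1, v_s, beta)
```

In an actual energy-based diffusion model the reverse kernel would itself depend on the learned energy, so this sketch only shows the shape of the consistency condition.
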
Modality agnosticism: not restricted to continuous diffusion
- The theoretical framework of RNE is grounded in the general notion of path distributions
- It applies not only to continuous-state diffusion models but also to discrete diffusion models (e.g., discrete denoising for text generation)
- This makes RNE a modality-agnostic general-purpose tool

Loss & Training

RNE requires no additional training in the inference-time control setting (plug-and-play), and serves as an auxiliary regularization loss in the energy-based diffusion training setting:

Inference-time control: the pretrained model is frozen; sampling trajectories are adjusted solely via density ratios
Energy training regularization: an RNE-based regularization term is added to the standard denoising loss to enforce consistency between the learned energy function and the true density ratio

Key Experimental Results

Main Results

Task	Method	Reported Outcome	Notes
Annealed sampling	RNE	Outperforms standard methods	More accurate conditional sampling
Model composition	RNE	High-quality multi-condition generation	Combines multiple pretrained models
Inference-time scaling	RNE	Performance improves with compute	Validates scaling property
Energy-based diffusion training	RNE regularization	Simple and effective	No partition function estimation needed

Ablation Study

Configuration	Reported Outcome	Notes
Without RNE density estimation	Inaccurate density estimation	Lacks path-level density ratio information
With RNE	Improved density estimation accuracy	Fully exploits transition kernel information
Continuous diffusion	Verified effective	Standard setting
Discrete diffusion	Equally effective	Validates modality agnosticism

Key Findings

Unified framework for inference-time control: RNE unifies seemingly disparate inference-time control methods—such as annealing and model composition—under a density-ratio perspective
Inference-time scaling: allocating more computation (additional sampled paths) steadily improves control accuracy, in line with the broader trend of inference-time compute scaling
Simplified energy training: RNE regularization circumvents the difficulty of partition function estimation in conventional energy-based model training
Cross-modal generality: the effectiveness of RNE is verified on both continuous and discrete diffusion models

Highlights & Insights

Theoretical elegance: by employing the Radon-Nikodym derivative—a fundamental tool from measure theory—the paper establishes a unified connection among three seemingly independent problems in diffusion models: density estimation, inference-time control, and energy-based training
Plug-and-play design: requires neither modification of pretrained models nor training of additional control networks (e.g., ControlNet), substantially lowering the barrier to adoption
Innovation of the path-distribution perspective: rather than operating at the level of single-step transitions, the framework establishes connections at the level of distributions over complete trajectories—a higher level of abstraction
Inference-time scaling property: resonates with the broader AI community's interest in test-time compute and inference-time scaling
Applicability to discrete diffusion: extends the framework's scope, with potential value for diffusion-based generation of discrete sequences such as text and proteins

Limitations & Future Work

Variance of Monte Carlo estimation: density ratio estimation in path space may exhibit high variance, particularly along long diffusion trajectories
Computational cost: although no additional training is required, inference-time estimation of density ratios demands multiple sampled paths, increasing inference latency
Insufficient validation at large scale: validation on large-scale models such as Stable Diffusion and DALL-E is needed
Lack of systematic comparison with existing inference-time control methods: detailed comparisons with Classifier Guidance, Classifier-Free Guidance, DPS, and related methods are warranted
Gap between theory and practice: the theoretical framework assumes exact forward/backward kernels, whereas learned approximate models are used in practice; the impact of approximation error requires deeper analysis

Related Work & Insights

Density estimation for diffusion models: unlike the probability-flow ODE (continuous normalizing flow, CNF) approach of Song et al., RNE does not require solving an ODE; it operates directly at the level of path distributions
Inference-time control: complementary to Classifier Guidance (Dhariwal & Nichol, 2021), DPS (Chung et al., 2022), FreeDoM (Yu et al., 2023), and related methods, while providing a more unified theoretical perspective
Energy-based models: complements energy-based diffusion training methods by simplifying the partition function estimation problem
Insights: RNE demonstrates the power of reasoning about generative models at the distributional rather than pointwise level, a perspective that may inspire further unified frameworks

Rating

Novelty: ⭐⭐⭐⭐⭐ — Unifying three independent problems via the Radon-Nikodym derivative represents a strongly original theoretical contribution
Experimental Thoroughness: ⭐⭐⭐ — Proof-of-concept experiments are sufficient, but large-scale validation is lacking
Writing Quality: ⭐⭐⭐⭐ — Theory is clearly presented with a coherent unified framework
Value: ⭐⭐⭐⭐ — The plug-and-play nature and theoretical unification carry significant practical and academic value