
CREPE: Controlling Diffusion with Replica Exchange¶

Conference: ICLR 2026
arXiv: 2509.23265
Code: Available (GitHub)
Area: Diffusion Models / Inference-Time Control
Keywords: replica exchange, parallel tempering, inference-time control, SMC alternative, reward tilting, CFG debiasing

TL;DR¶

This paper proposes CREPE, an inference-time control method for diffusion models based on Replica Exchange (Parallel Tempering), serving as the computational dual of SMC — it operates in parallel across denoising steps while generating samples serially. CREPE offers high sample diversity, supports online refinement, and handles a variety of tasks including temperature annealing, reward tilting, model composition, and CFG debiasing.

Background & Motivation¶

Background: Inference-time control of diffusion models — satisfying new constraints without retraining — is an active research direction. The dominant approach is Sequential Monte Carlo (SMC), which corrects biases introduced by heuristic guidance by maintaining a set of weighted particles along the denoising trajectory.

Limitations of Prior Work: SMC has three key limitations: (a) it requires maintaining a large number of particles simultaneously throughout the entire denoising trajectory, incurring high memory overhead; (b) sample diversity is poor, particularly when the particle count is small, as resampling leads to particle collapse; (c) samples cannot be refined after generation — if results are unsatisfactory or new constraints are introduced, generation must restart from scratch.

Key Challenge: The "parallel particles × serial timesteps" paradigm of SMC inherently creates bottlenecks in diversity and flexibility, motivating a computationally dual alternative.

Goal: To propose an alternative to SMC that (a) generates particles one at a time rather than in batches, (b) maintains high diversity after burn-in, (c) supports online refinement and early stopping, and (d) covers diverse tasks including tempering, reward tilting, model composition, and CFG debiasing.

Key Insight: Replica Exchange / Parallel Tempering is precisely the computational dual of SMC — it runs chains in parallel across different denoising steps while generating samples serially. This MCMC framework is adapted to the diffusion model setting.

Core Idea: Adapt the swap moves of Parallel Tempering to the path space of diffusion models, leveraging the Radon-Nikodym Estimator to compute acceptance probabilities, thereby enabling inference-time control without access to explicit target densities.

Method¶

Overall Architecture¶

CREPE maintains \(M+1\) particles, each residing at a distinct diffusion timestep \(t_0 < t_1 < \cdots < t_M\) (spanning from the data distribution to pure noise). Each iteration consists of two steps, both of which can be parallelized:
1. Communication step: Adjacent particles exchange positions via APT swap moves. Forward and backward proposal paths are generated, and acceptance probabilities determine whether each swap occurs.
2. Local exploration step: Each particle performs a local MCMC update at its respective timestep.
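The two-step iteration can be illustrated with classic parallel tempering on a toy problem. The sketch below is not CREPE itself: it assumes explicit, known densities (a hypothetical 1D double-well `energy`) and replaces diffusion timesteps with inverse temperatures `betas`, where level 0 plays the role of the target and the last level the role of the most diffuse distribution. All function names are illustrative.

```python
import math
import random

def energy(x):
    # Hypothetical double-well energy with modes near x = -1 and x = +1.
    return 4.0 * (x * x - 1.0) ** 2

def local_move(x, beta, rng, step=0.5):
    # Local exploration: one random-walk Metropolis update targeting exp(-beta * energy).
    prop = x + rng.gauss(0.0, step)
    log_acc = -beta * (energy(prop) - energy(x))
    return prop if rng.random() < math.exp(min(0.0, log_acc)) else x

def run_pt(betas, n_iters, seed=0):
    rng = random.Random(seed)
    xs = [1.0 for _ in betas]  # one particle per level
    samples = []
    for it in range(n_iters):
        # Communication step: alternate between even and odd adjacent pairs.
        # Pairs within one pass are disjoint, so they could run in parallel.
        for k in range(it % 2, len(betas) - 1, 2):
            log_acc = (betas[k] - betas[k + 1]) * (energy(xs[k]) - energy(xs[k + 1]))
            if rng.random() < math.exp(min(0.0, log_acc)):
                xs[k], xs[k + 1] = xs[k + 1], xs[k]
        # Local exploration step: every level updates independently.
        xs = [local_move(x, b, rng) for x, b in zip(xs, betas)]
        # Samples at the target level are produced one per iteration (serially).
        samples.append(xs[0])
    return samples

samples = run_pt([1.0, 0.5, 0.25, 0.1, 0.05], 4000, seed=0)
```

After discarding an initial burn-in segment, `samples` should contain draws from both wells, because particles that cross the barrier at the hot levels are passed down to level 0 through swaps.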

Key Designs¶

Accelerated PT Swap Move in Diffusion Path Space:
- Function: Enables particles \((x, x')\) residing at timesteps \(t\) and \(t'\) to exchange positions via forward/backward diffusion paths.
- Mechanism: Starting from \(x\), the forward diffusion process is applied to reach \(t'\); starting from \(x'\), the reverse process is applied to reach \(t\). A Metropolis-Hastings acceptance probability \(\alpha_{t,t'}\) determines whether the swap is accepted. This probability is computed via the Radon-Nikodym Estimator (RNE), which exploits the ratio of forward and backward transition probabilities from the pretrained diffusion model.
- Design Motivation: Standard PT requires access to the unnormalized target density, which is unavailable during inference-time control. The RNE relation \(p_{t'}(x_{t'})/p_t(x_t) = R_{t,t'}^{-1}\) circumvents the need to evaluate densities directly.
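The structure of such a swap can be sketched on a 1D toy where everything is available in closed form. The assumptions here are loud ones: a variance-exploding Gaussian diffusion whose marginal at each timestep is \(N(0, v)\), an exact denoising kernel, and a tempered target \(p^\beta\) evaluated explicitly. CREPE instead obtains the corresponding ratio through the RNE because these densities are not available; the names `v_lo`, `v_hi`, `swap_log_ratio`, and `propose_and_swap` are hypothetical.

```python
import math
import random

def log_normal(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def swap_log_ratio(x, x_hi, y, y_hi, v_lo, v_hi, beta):
    """Log Metropolis-Hastings ratio for replacing (x at t, x_hi at t')
    with the proposals (y at t, y_hi at t'), where t < t'."""
    d = v_hi - v_lo  # variance added by noising from t to t'

    def log_fwd(a, b):  # noising kernel: a at t -> b at t'
        return log_normal(b, a, d)

    def log_bwd(b, a):  # exact denoising kernel: b at t' -> a at t
        return log_normal(a, b * v_lo / v_hi, v_lo * d / v_hi)

    def log_pi(z, v):  # unnormalized tempered target p(z)^beta
        return beta * log_normal(z, 0.0, v)

    num = log_pi(y, v_lo) + log_pi(y_hi, v_hi) + log_fwd(y, x_hi) + log_bwd(y_hi, x)
    den = log_pi(x, v_lo) + log_pi(x_hi, v_hi) + log_fwd(x, y_hi) + log_bwd(x_hi, y)
    return num - den

def propose_and_swap(x, x_hi, v_lo, v_hi, beta, rng):
    d = v_hi - v_lo
    # Forward path: noise x from t up to t'.
    y_hi = x + rng.gauss(0.0, math.sqrt(d))
    # Backward path: denoise x_hi from t' down to t.
    y = x_hi * v_lo / v_hi + rng.gauss(0.0, math.sqrt(v_lo * d / v_hi))
    alpha = math.exp(min(0.0, swap_log_ratio(x, x_hi, y, y_hi, v_lo, v_hi, beta)))
    if rng.random() < alpha:
        return y, y_hi, alpha
    return x, x_hi, alpha
```

In this toy, setting `beta = 1` makes the forward and backward kernels exactly consistent with the marginals, so the log ratio vanishes and every swap is accepted; tilting the target (`beta != 1`) is what pushes the acceptance probability below one.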
Annealing Path Design:
- Function: Defines sequences of intermediate distributions for different control tasks.
- Mechanism:
  - Tempering: \(\pi_t(x) \propto p_t^j(x)^\beta\)
  - Reward tilting: \(\pi_t(x) \propto p_t^j(x) \exp(r_t(x))\)
  - Model composition: \(\pi_t(x) \propto \prod_j p_t^j(x)\)
  - CFG debiasing: \(\pi_t(x) \propto p_t(x)^{1-w} p_t(x|c)^w\)
- Design Motivation: All target distributions can be expressed in terms of pretrained model density ratios, making the acceptance probabilities computable via the RNE.
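In log space, the four paths above are simple combinations of model log-densities. The following sketch only illustrates that algebra with hypothetical helper names and toy quadratic log-densities; for a real diffusion model \(\log p_t\) is not directly available, which is why the method works with ratios rather than with functions like these.

```python
def tempered(log_p, beta):
    # log pi_t(x) = beta * log p_t(x) + const
    return lambda x: beta * log_p(x)

def reward_tilted(log_p, reward):
    # log pi_t(x) = log p_t(x) + r_t(x) + const
    return lambda x: log_p(x) + reward(x)

def composed(log_ps):
    # log pi_t(x) = sum_j log p_t^j(x) + const
    return lambda x: sum(lp(x) for lp in log_ps)

def cfg(log_p_uncond, log_p_cond, w):
    # log pi_t(x) = (1 - w) * log p_t(x) + w * log p_t(x | c) + const
    return lambda x: (1.0 - w) * log_p_uncond(x) + w * log_p_cond(x)

# Toy stand-ins for model log-densities (assumptions for illustration only).
log_p = lambda x: -0.5 * x * x
log_q = lambda x: -0.5 * (x - 2.0) ** 2

# Objectives can be nested, e.g. a tempered CFG target with an extra reward.
combined = reward_tilted(tempered(cfg(log_p, log_q, 1.5), 2.0), lambda x: -abs(x))
```

Because each helper returns another log-density function, the objectives compose by nesting, which mirrors how the target distributions can be combined.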
Online Refinement:
- Function: Dynamically adds or modifies constraints while the MCMC chain is running.
- Mechanism: The MCMC chain can run indefinitely; introducing a new reward term at any point only requires modifying the annealing path, and PT naturally adapts.
- Design Motivation: SMC is a one-shot procedure and cannot be modified post hoc; CREPE, as an MCMC method, naturally supports iterative refinement.
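The idea of adding a constraint to a running chain can be shown with a deliberately simplified stand-in: a single random-walk Metropolis chain rather than the full replica-exchange scheme, with a hypothetical Gaussian base density and a hypothetical quadratic reward. The point is only that the chain keeps its current state when the target changes and then drifts toward the new target without restarting.

```python
import math
import random

def mh_step(x, log_target, rng, step=1.0):
    # One random-walk Metropolis update for the current target.
    prop = x + rng.gauss(0.0, step)
    log_acc = log_target(prop) - log_target(x)
    return prop if rng.random() < math.exp(min(0.0, log_acc)) else x

def run_with_refinement(n_before, n_after, seed=0):
    rng = random.Random(seed)
    terms = [lambda x: -0.5 * x * x]  # base log-density, N(0, 1)
    log_target = lambda x: sum(f(x) for f in terms)
    x, before, after = 0.0, [], []
    for _ in range(n_before):
        x = mh_step(x, log_target, rng)
        before.append(x)
    # Online refinement: add a reward term mid-run; the chain is not restarted.
    terms.append(lambda x: -0.5 * (x - 3.0) ** 2)
    for _ in range(n_after):
        x = mh_step(x, log_target, rng)
        after.append(x)
    return before, after

before, after = run_with_refinement(5000, 5000, seed=0)
```

With these toy choices the refined target is proportional to \(N(1.5, 0.5)\), so the mean of `after` (past a short transient) should sit near 1.5 while the mean of `before` stays near 0.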
Support for Both Continuous and Discrete Diffusion:
- Function: Derives swap rates for both Gaussian diffusion (SDE) and discrete masked diffusion (CTMC).
- Design Motivation: Provides coverage of image generation (continuous) and text/discrete data (discrete masked diffusion, e.g., MDLM).

Loss & Training¶

No training is required; CREPE operates entirely at inference time.
CREPE requires only the forward and reverse processes of a pretrained diffusion model.
Computational cost is comparable to SMC but distributed differently: PT incurs a burn-in cost, after which the per-sample cost remains constant.

Key Experimental Results¶

Main Results¶

Molecular Temperature Annealing (Alanine Dipeptide / Tetrapeptide / Hexapeptide)

| Method | Energy TVD ↓ | TICA MMD ↓ | Notes |
|---|---|---|---|
| FKC (SMC) | 0.345 | 0.116 | SMC baseline |
| CREPE (Ours) | 0.224 | 0.096 | Dipeptide |
| CREPE | 0.122 | 0.035 | Tetrapeptide |

CFG Debiasing (ImageNet-64)

| Method | #Samples | IR ↑ | CLIP ↑ | FID ↓ |
|---|---|---|---|---|
| FKC (SMC) | 8 | -0.29 | 24.17 | 1.85 |
| CREPE | 8 | -0.30 | 24.10 | 1.92 |
| FKC | 512 | -0.08 | 24.31 | 1.96 |
| CREPE | 512 | 0.09 | 24.28 | 1.79 |

Key Findings¶

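SMC outperforms CREPE with few samples (due to burn-in): at 8 samples FKC is ahead on IR, CLIP, and FID. At 512 samples CREPE leads on IR (0.09 vs. -0.08) and FID (1.79 vs. 1.96), while CLIP remains marginally lower (24.28 vs. 24.31). CREPE's FID improves from 1.92 to 1.79 as the sample count grows, whereas FKC's worsens from 1.85 to 1.96.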
CREPE's core advantage lies in diversity — SMC's resampling causes particle collapse (visually similar outputs within a batch), whereas CREPE's MCMC chain naturally explores a broader distribution.
In online refinement experiments, CREPE satisfies newly introduced constraints within only 1k iterations, demonstrating strong flexibility.
CREPE is also effective on discrete diffusion (MNIST MDLM), confirming the generality of the approach.

Highlights & Insights¶

The computational dual perspective on SMC is exceptionally elegant — flipping "parallel particles × serial timesteps" to "serial particles × parallel timesteps" captures the core contribution in a single sentence. This duality (Syed et al., 2024) reflects deep connections in sampling theory.
Online refinement is entirely beyond the reach of SMC and is highly valuable for practical applications such as interactive generation and iterative design.
The unified framework covers tempering, reward tilting, model composition, and CFG debiasing, and these objectives can be freely combined. The methodology is broadly applicable.

Limitations & Future Work¶

Sample quality during burn-in is poor; CREPE is inferior to SMC in low-sample regimes.
Each swap move requires simulating both forward and backward diffusion paths, imposing non-trivial computational overhead.
For high-resolution images (ImageNet-512), only qualitative results for reward tilting are presented; quantitative comparisons are lacking.
Acceptance rates may decrease with dimensionality, requiring finer annealing schedules.
Combinations with guidance methods (e.g., DPS, FreeDoM) remain unexplored.

Related Work¶

vs. FKC (SMC): Computational dual relationship. SMC is superior with few samples; CREPE is superior with many. CREPE offers better diversity.
vs. Twisted SMC / DDRM: Both are debiasing methods for inference-time control, but CREPE is MCMC-based rather than importance-sampling-based.
vs. APT (Zhang et al., 2025): CREPE extends APT from the setting with known unnormalized densities to the setting where only a pretrained diffusion model is available.

Rating¶

Novelty: ⭐⭐⭐⭐⭐ — First adaptation of Parallel Tempering to inference-time control of diffusion models; the SMC dual perspective is remarkably elegant.
Experimental Thoroughness: ⭐⭐⭐⭐ — Covers multiple modalities including molecular, image, trajectory, and discrete data, but quantitative evaluations on high-resolution images are limited.
Writing Quality: ⭐⭐⭐⭐ — Theoretically rigorous but notation-dense; a strong background in stochastic processes is recommended.
Value: ⭐⭐⭐⭐ — Introduces a new paradigm for inference-time control of diffusion models, with unique advantages in diversity and online refinement.