Composition and Alignment of Diffusion Models using Constrained Learning¶
Conference: NeurIPS 2025 arXiv: 2508.19104 Code: GitHub Area: Image Generation Keywords: Diffusion Models, Constrained Optimization, Model Alignment, Model Composition, Lagrangian Duality
TL;DR¶
This paper proposes a unified constrained optimization framework that formalizes reward alignment and multi-model composition of diffusion models as constrained optimization problems. By applying Lagrangian duality, the framework automatically determines optimal weights, eliminating the need for manual hyperparameter search.
Background & Motivation¶
- Background: Alignment (adapting to rewards/preferences) and composition (combining multiple pretrained models) are two widely used adaptation strategies for diffusion models, yet both face trade-offs arising from multi-objective conflicts.
- Limitations of Prior Work: Alignment via weighted combinations of KL divergence and rewards requires manual weight tuning; improper weights lead to overfitting a single reward or to excessive deviation from the pretrained model. Equal-weight composition may favor certain similar models while neglecting others.
- Key Challenge: The search space of weighting methods is unintuitive: weights themselves carry no semantic meaning, whereas constraint-based specifications (e.g., "the reward must reach at least \(b\)") are more natural and interpretable.
- Goal: Provide a unified framework that automatically balances multiple objectives, eliminating manual weight tuning.
- Key Insight: Replace weighted optimization with constrained optimization: the alignment problem becomes "minimize KL divergence from the pretrained model subject to reward constraints," and the composition problem becomes "minimize the maximum KL divergence to all constituent models."
- Core Idea: A constrained optimization framework unifies alignment and composition. Lagrangian duality automatically learns the optimal weights, leaving the constraint threshold, which is intuitive to set, as the only hyperparameter.
Method¶
Overall Architecture¶
Two core problems: (1) Constrained Alignment (UR-A): \(\min_p D_{KL}(p\|q)\) s.t. \(\mathbb{E}_{x\sim p}[r_i(x)] \geq b_i\); (2) Constrained Composition (UR-C): \(\min_{p,u} u\) s.t. \(D_{KL}(p\|q^i) \leq u\). Both are converted into trainable primal-dual algorithms via Lagrangian duality.
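For reference, the standard Lagrangians behind these two dual problems (our transcription of textbook duality applied to the formulations above, not equations quoted from the paper) are:

```latex
% Alignment (UR-A): one multiplier \lambda_i \geq 0 per reward constraint
L_A(p, \lambda) = D_{KL}(p\|q)
    + \sum_{i=1}^{m} \lambda_i \bigl(b_i - \mathbb{E}_{x\sim p}[r_i(x)]\bigr)

% Composition (UR-C): one multiplier \lambda_i \geq 0 per KL constraint
L_C(p, u, \lambda) = u + \sum_{i=1}^{m} \lambda_i \bigl(D_{KL}(p\|q^i) - u\bigr)
```

Note that minimizing \(L_C\) over the slack variable \(u\) forces \(\mathbf{1}^\top\lambda = 1\) at optimality, which is exactly the weight normalization \(\lambda_i/\mathbf{1}^\top\lambda\) appearing in the tilted product of Theorem 3.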
Key Designs¶
1. Closed-Form Solution for Constrained Alignment (Theorem 1)
- Function: Provides the closed-form optimal distribution for the constrained alignment problem.
- Mechanism: The optimal solution is a reward-tilted distribution \(q_{rw}^{(\lambda^*)}(\cdot) = \frac{1}{Z}q(\cdot)e^{\lambda^{*\top}r(\cdot)}\), where the optimal dual variable \(\lambda^*\) is determined automatically via dual ascent.
- Design Motivation: The theorem proves that constrained optimization is equivalent to exponential reward tilting of the pretrained distribution, with optimal weights determined automatically by the constraints.
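The mechanics of Theorem 1 can be seen in a toy 1-D setting: take the pretrained model to be a standard Gaussian on a grid, a single linear reward, and run dual ascent on \(\lambda\) until the tilted distribution meets the constraint. All names and values below are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

# Toy setup: pretrained q = N(0, 1) discretized on a grid,
# reward r(x) = x, constraint E_p[r] >= b.
xs = np.linspace(-6, 6, 2001)
log_q = -0.5 * xs**2          # unnormalized log-density of N(0, 1)
r = xs                        # reward function
b = 1.0                       # constraint threshold

def tilted_expectation(lam):
    """E_{p_lam}[r] for the tilted distribution p_lam(x) ∝ q(x) exp(lam * r(x))."""
    logits = log_q + lam * r
    logits -= logits.max()                 # numerical stability
    w = np.exp(logits)
    p = w / w.sum()
    return float((p * r).sum())

lam, eta = 0.0, 0.5
for _ in range(200):                       # dual (sub)gradient ascent
    violation = b - tilted_expectation(lam)
    lam = max(0.0, lam + eta * violation)  # project onto lam >= 0

# Tilting N(0, 1) by exp(lam * x) shifts its mean to lam, so the
# optimal multiplier here is lam* = b; both values converge to 1.0.
print(round(lam, 3), round(tilted_expectation(lam), 3))
```

The point of the sketch is that \(\lambda\) is never hand-tuned: dual ascent raises it exactly until the reward constraint is met.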
2. Tilted Product Distribution for Constrained Composition (Theorem 3)
- Function: Automatically determines optimal weights for multi-model composition.
- Mechanism: The optimal solution is a tilted product distribution \(q_{AND}^{(\lambda)}(\cdot) \propto \prod_{i=1}^m (q^i(\cdot))^{\lambda_i/\mathbf{1}^\top\lambda}\). KL constraints ensure the composed distribution deviates equidistantly from each pretrained model.
- Design Motivation: Equal-weight composition favors the most similar models (e.g., two similar Gaussians dominate a third); the constrained approach automatically balances across all models.
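The failure mode of equal weights, and how balancing KL fixes it, shows up already for unit-variance Gaussians, where a weight-\(w\) tilted product is again a unit Gaussian with mean \(\sum_i w_i \mu_i\) and \(D_{KL}\) to each constituent is \((m - \mu_i)^2/2\). The example below (our own toy; the exponentiated-gradient heuristic and step size are our choices, not the paper's algorithm) composes two similar models and one distant one:

```python
import numpy as np

# Three unit-variance Gaussian "models": two nearly identical (means 0.0
# and 0.1) and one distant (mean 4.0).
mus = np.array([0.0, 0.1, 4.0])

def kls(w):
    m = float(w @ mus)            # mean of the weighted tilted product
    return 0.5 * (m - mus) ** 2   # KL from the composition to each model

w_eq = np.ones(3) / 3             # equal weights favor the similar pair

# Min-max via exponentiated (sub)gradient on the weights: up-weight the
# model the current composition deviates from the most.
w = w_eq.copy()
for _ in range(500):
    g = kls(w)
    w = w * np.exp(0.1 * (g - g.mean()))
    w /= w.sum()

# Equal weights leave a large worst-case KL (to the distant model);
# the balanced weights roughly equalize the extremes.
print(kls(w_eq).round(2), kls(w).round(2))
```

With equal weights the composed mean sits near the similar pair and the worst-case KL (to the distant model) is large; the balanced weights move the composition until the extreme models are equidistant, roughly halving the worst-case KL.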
3. Distinguishing Path-wise and Point-wise KL Divergence
- Function: Selects the appropriate KL measure for each task.
- Mechanism: Path-wise KL measures discrepancy over entire diffusion trajectories and is suited for alignment; point-wise KL measures discrepancy in the final distribution and is more appropriate for composition. Lemma 2 introduces a novel method to compute point-wise KL divergence.
- Design Motivation: The two KL measures have distinct properties; choosing correctly affects both theoretical guarantees and practical performance.
Loss & Training¶
Primal-dual alternating optimization: the primal step minimizes the Lagrangian (SGD), while the dual step updates multipliers via subgradient ascent proportional to constraint violation. LoRA is used to fine-tune Stable Diffusion v1.5.
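The alternating loop can be sketched end-to-end in a one-parameter toy where every quantity has a closed form (this is our illustration of the primal-dual scheme, not the paper's LoRA fine-tuning code): the model is \(p_\theta = \mathcal{N}(\theta, 1)\), the pretrained model is \(q = \mathcal{N}(0, 1)\), the reward is \(r(x) = x\), so \(D_{KL}(p_\theta\|q) = \theta^2/2\) and \(\mathbb{E}_{p_\theta}[r] = \theta\).

```python
# Constraint: E_p[r] >= b. Lagrangian:
#   L(theta, lam) = theta^2 / 2 + lam * (b - theta)
b = 1.0
theta, lam = 0.0, 0.0
eta_primal, eta_dual = 0.1, 0.1

for _ in range(2000):
    # Primal step: gradient descent on the Lagrangian in theta
    grad_theta = theta - lam              # d/dtheta [theta^2/2 + lam*(b - theta)]
    theta -= eta_primal * grad_theta
    # Dual step: subgradient ascent, proportional to constraint violation,
    # projected onto lam >= 0
    lam = max(0.0, lam + eta_dual * (b - theta))

print(round(theta, 3), round(lam, 3))     # both converge to 1.0
```

The same structure scales up directly: the primal step becomes an SGD/LoRA update on the diffusion model's parameters, and the dual step stays a cheap scalar update per constraint.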
Key Experimental Results¶
Main Results¶
Multi-reward constrained alignment (Stable Diffusion + MPS / Saturation / Local Contrast):
| Method | MPS Reward | Saturation Constraint | Contrast Constraint | KL from Pretrained |
|---|---|---|---|---|
| Equal-weight | Decreased | Partially satisfied | Partially satisfied | Larger |
| Constrained Alignment | +50% improvement | Satisfied | Satisfied | Smaller |
Multi-model composition (each model fine-tuned with a different reward):
| Method | Reward Retention Rate | Worst Reward |
|---|---|---|
| Equal-weight Composition | Some rewards drop by 30% | Notable degradation |
| Constrained Composition | All maintained or improved | No notable degradation |
Ablation Study¶
| Configuration | Key Metric | Remarks |
|---|---|---|
| Constrained Alignment vs. Weighted Alignment | Constrained achieves smaller KL and satisfies all rewards | Weighted method tends to overfit easily optimized rewards |
| Constrained Composition vs. Equal-weight Composition | Constrained achieves higher minimum CLIP/BLIP scores | Equal-weight method favors similar models |
Key Findings¶
- Weighted methods in multi-reward alignment tend to overfit easily optimizable rewards (e.g., saturation) while neglecting harder ones (e.g., HPS).
- The constrained method improves all rewards simultaneously while maintaining a smaller KL divergence from the pretrained model.
- In concept composition, the constrained method achieves higher minimum CLIP and BLIP scores compared to equal-weight composition.
Highlights & Insights¶
- Theoretical Rigor: A complete theoretical chain from the distribution space to the diffusion model space, establishing strong duality.
- Practicality: Constraint thresholds are more intuitive than weights—"reward must improve by at least 50%" is easier to reason about than "set weight to 0.3."
- Unification: A single framework addresses alignment and composition, two seemingly distinct problems.
Limitations & Future Work¶
- MCMC sampling is costly in high-dimensional settings such as image generation.
- Point-wise KL divergence computation depends on the quality of the score function.
- Validation is currently limited to Stable Diffusion v1.5; scaling to larger models warrants further investigation.
- The framework is extensible to mixture composition (OR mode) and conditional constraints.
Related Work & Insights¶
- The AlignProp framework is extended to a multi-reward constrained variant.
- Compared to alignment methods such as DPO, the constrained approach does not require manual tuning of regularization weights.
- Insight: Constrained optimization offers a more interpretable and easier-to-tune alternative to weighted optimization.
Rating¶
- Novelty: ⭐⭐⭐⭐ The constrained perspective unifying alignment and composition is original.
- Experimental Thoroughness: ⭐⭐⭐ Moderate scale, but core claims are validated.
- Writing Quality: ⭐⭐⭐⭐ Theoretical derivations are clear and well-presented.
- Value: ⭐⭐⭐⭐ Provides a principled framework for adapting diffusion models.