Fractals made Practical: Denoising Diffusion as Partitioned Iterated Function Systems

  • Conference: CVPR 2026
  • arXiv: 2603.13069
  • Code: To be confirmed
  • Area: Diffusion Model Theory / Image Generation / Fractal Geometry
  • Keywords: Fractal Geometry, Diffusion Models, Partitioned Iterated Function Systems, DDIM, Noise Schedule, Fractal Dimension

TL;DR

This paper proves that the deterministic DDIM reverse chain is equivalent to a Partitioned Iterated Function System (PIFS) and derives three computable quantities from fractal geometry (the contraction threshold \(L_t^*\), the diagonal expansion function \(f_t(\lambda)\), and the global expansion threshold \(\lambda^{**}\)), providing a unified theoretical explanation for four empirically motivated design choices: the cosine schedule offset, the resolution-dependent logSNR shift, Min-SNR loss weighting, and the Align Your Steps sampling schedule.

Background & Motivation

Modern score-matching diffusion models construct high-quality images through a sequential denoising process, grounded theoretically in continuous-time SDEs or probability-flow ODEs. However, this continuous perspective treats the learned score network as a black box: it offers no structural explanation of how the discrete sampling chain assembles global spatial context in early steps and synthesizes local details in later steps, nor of why self-attention serves as such an effective generative primitive. Existing design choices, such as the cosine schedule offset \(s_{off}=0.008\) and Min-SNR weighting (\(\gamma=5\)), rest on empirical tuning and lack a unified theoretical justification.

Core Problem

What does the diffusion model actually do when transforming noise into images? What is the mathematical structure of the DDIM reverse chain? Can a unified design language be derived from this structure to explain and guide the selection of noise schedules, training objectives, and sampling strategies?

Method

Overall Architecture

The DDIM single-step operator \(\Phi_t(x) = \sqrt{\bar\alpha_{t-1}/\bar\alpha_t}\,x + b_t\,\hat\varepsilon_\theta(x,t)\) is treated as an affine map within a PIFS. The spectral structure of its Jacobian is analyzed to derive contraction/expansion properties, and the Moran equation is applied to compute the attractor dimension.
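
For reference, the coefficients of this affine map follow from the standard deterministic DDIM update (\(\eta=0\)); this is textbook DDIM algebra rather than notation introduced by the paper:

\[
\Phi_t(x) = a_t\,x + b_t\,\hat\varepsilon_\theta(x,t),\qquad
a_t = \sqrt{\frac{\bar\alpha_{t-1}}{\bar\alpha_t}},\qquad
b_t = \sqrt{1-\bar\alpha_{t-1}} - \sqrt{\frac{\bar\alpha_{t-1}\,(1-\bar\alpha_t)}{\bar\alpha_t}}.
\]

Since \(\bar\alpha_{t-1} > \bar\alpha_t\) along the reverse chain, \(a_t > 1\) and \(b_t < 0\): the signal term alone is expansive, so any contraction must come from the noise-prediction term, which is why the threshold \(L_t^*\) below compares the network against \((a_t-1)/|b_t|\).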

Key Designs

  1. Contraction Structure Analysis (Section 3): Two contraction conditions are derived: Euclidean Contraction (EC), based on a closed-form contraction threshold \(L_t^* = (\sqrt{\bar\alpha_{t-1}/\bar\alpha_t}-1)/|b_t|\) that depends solely on the noise schedule, not on data or the network (see the sketch after this list), and Block Max-Norm Contraction (PC), which separates the diagonal term \(\kappa_t^{diag}\) from the cross-patch coupling \(\delta_t^{cross}\). Score-matching training is shown to be a diffusion analogue of Barnsley collage minimization, establishing an \(L_2\)-\(\mathcal{W}_1\) bridge.
  2. Two-Phase Structure (Section 4): In high-noise Regime I (global context assembly), attention is broadcast and a directional "suppression field" \(S_{k,t}\) keeps each diagonal block below the expansion threshold. In low-noise Regime II (detail synthesis), attention localizes and suppression is released patch-by-patch in ascending order of patch variance (Stratified Crossover), with the sign of the Spearman rank correlation reversing and then recovering.
  3. Attractor Geometry (Section 5): The Kaplan-Yorke dimension is defined via the Lyapunov spectrum. The global expansion threshold \(\lambda^{**}\) is the unique solution to the discrete Moran equation \(\prod_t f_t(\lambda^{**})=1\). Directions whose patch variance exceeds \(\lambda^{**}\) contribute positive Lyapunov exponents. The suppression-field-corrected effective dimension satisfies \(d_{KY}^{eff} \leq d_{KY}\).
  4. Three Design Criteria (Section 6): (i) Maximize the weakest-link contraction threshold \(\min_t L_t^*\); (ii) balance the per-step Lyapunov contribution (equivalent to a constant information-gain condition); (iii) concentrate sampling steps where \(L_t^*\) is smallest to equalize contraction workload.
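
Because \(L_t^*\) depends only on \(\bar\alpha_t\), Criterion (i) can be checked for any schedule without evaluating the model. Below is a minimal NumPy sketch, assuming the standard cosine and linear \(\beta\)-schedules and the DDIM coefficients \(a_t, b_t\) derived above (function names are illustrative, not taken from the paper's code); it also prints the coefficient of variation of \(L_t^*\) referenced in the experiments:

```python
import numpy as np

def alpha_bar_cosine(T=1000, s=0.008):
    # Nichol & Dhariwal cosine schedule: alpha_bar(t) = f(t)/f(0),
    # with f(t) = cos^2(((t/T + s)/(1 + s)) * pi/2); s is the offset s_off.
    t = np.arange(T + 1)
    f = np.cos(((t / T + s) / (1 + s)) * np.pi / 2) ** 2
    return f / f[0]

def alpha_bar_linear(T=1000, beta_min=1e-4, beta_max=2e-2):
    # Standard DDPM linear beta schedule, accumulated into alpha_bar.
    betas = np.linspace(beta_min, beta_max, T)
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])

def contraction_threshold(abar):
    # L_t^* = (a_t - 1) / |b_t| for t = 1..T, from the DDIM coefficients.
    prev, cur = abar[:-1], abar[1:]
    a = np.sqrt(prev / cur)
    b = np.sqrt(1 - prev) - np.sqrt(prev * (1 - cur) / cur)
    return (a - 1) / np.abs(b)

for name, abar in [("cosine s=0.008", alpha_bar_cosine(s=0.008)),
                   ("cosine s=0.000", alpha_bar_cosine(s=0.0)),
                   ("linear        ", alpha_bar_linear())]:
    L = contraction_threshold(abar)
    print(f"{name}  L_1^* = {L[0]:.1e}  min_t L_t^* = {L.min():.1e}  "
          f"CV = {L.std() / L.mean():.3f}")
```

Under these conventions, \(L_1^*\) comes out near 7.9e-4 at \(s=0\) and 3.2e-3 at \(s=0.008\), matching the offset ablation reported below; the exact CV values depend on schedule conventions and should be read qualitatively.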

Loss & Training

The standard DSM loss is shown to be equivalent to the collage error weighted by SNR across timesteps. A PIFS regularization loss is proposed: \(\mathcal{L}_{PIFS} = \mathcal{L}(\theta) + \mu_{reg}\sum_{t,k,j\neq k}\|[J_x\hat\varepsilon_\theta]_{kj}\|_F^2\), which directly penalizes cross-patch Jacobian blocks to enhance the (PC) contraction margin. Gradients are computable via \(O(M^2 n_k)\) JVP/VJP operations.
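
A minimal PyTorch sketch of the cross-patch penalty follows, using a Hutchinson-style stochastic estimate with one VJP per output patch rather than the exact \(O(M^2 n_k)\) probe scheme described in the paper; `eps_model` and `patch_masks` are assumed interfaces, not the authors' code:

```python
import torch

def cross_patch_penalty(eps_model, x, t, patch_masks, n_probes=1):
    # Stochastic estimate of sum_{k != j} ||[J_x eps_hat]_{kj}||_F^2.
    # For a Rademacher probe u on output patch k, E||mask_j * (J^T u)||^2
    # equals ||[J]_{kj}||_F^2, so one VJP per patch k covers its whole row.
    x = x.detach().requires_grad_(True)
    eps_hat = eps_model(x, t)
    penalty = x.new_zeros(())
    for k, mask_k in enumerate(patch_masks):
        for _ in range(n_probes):
            u = (torch.randint_like(x, 0, 2) * 2 - 1) * mask_k
            (g,) = torch.autograd.grad(eps_hat, x, grad_outputs=u,
                                       create_graph=True, retain_graph=True)
            for j, mask_j in enumerate(patch_masks):
                if j != k:
                    penalty = penalty + (g * mask_j).pow(2).sum()
    return penalty / n_probes
```

In a training step this would enter as `loss = dsm_loss + mu_reg * cross_patch_penalty(eps_model, x_t, t, masks)`; `create_graph=True` lets the penalty's gradient flow back to \(\theta\).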

Key Experimental Results

Experiments are conducted on DDPM CIFAR-10 (32×32 images, 8×8 patches, \(M=16\)) and CelebA-HQ-256 (64×64, 16×16 patches, \(M=16\)):

| Prediction | Measured Result | Note |
| --- | --- | --- |
| Regime I diagonal blocks near neutral | \(\kappa_t^{diag}\approx 1.000\) (range 0.993–1.007) | Maintained by the suppression field |
| Regime II diagonal blocks expand | \(\kappa_t^{diag}\) rises from 1.007 to 1.211 | Released in the low-noise regime |
| (PC) crossover window | \(\sim\)40 steps (\(t\in[160,200]\)) | Violations drop from 87.5% to 0% |
| (MM) Spearman correlation | Peak \(\rho=0.953\) (\(p=\)1.2e-8) at \(t=601\) | Suppression increases with patch variance |
| Score deviation \(O(\bar\alpha_t)\) | OLS slope 0.95, CI [0.88, 1.02], \(R^2=0.994\) | Consistent with the theoretical prediction |
| CelebA \(d_{KY}^{eff}=0\) | \(\Lambda_{eff,k}<0\) for all 16 patches | 16/16 sign agreement |

The information-gain coefficient of variation is 0.867 for the cosine schedule vs. 1.107 for linear, while the \(L_t^*\) coefficient of variation is 0.341 for linear vs. 0.474 for cosine; no single schedule therefore optimizes both criteria simultaneously.

Ablation Study

  • PIFS Moran threshold on the full CIFAR-10 training chain: \(\lambda^{**}=1.0024\); all 16 patches have \(\lambda_k\in[18.7,44.4]\), far exceeding the threshold, yielding a saturated \(d_{KY}=n=3072\) (a generic solver sketch follows this list).
  • On CelebA-HQ, the strong suppression field yields an effective threshold \(\lambda^{**}=500 \gg \lambda_{max}=231.7\), giving \(d_{KY}^{eff}=0\), reflecting this model's low-diversity, high-fidelity characteristics.
  • The 50-step DDIM sub-sampled chain preserves the same two-phase structure and \(L_t^*\) distribution as the 1000-step chain (step-count invariance).
  • The cosine offset \(s_{off}=0.008\) raises \(L_1^*\) from 7.9e-4 to 3.2e-3 (a 4× improvement), matching Criterion (i).
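
Given the per-step expansion functions, the Moran threshold reduces to one-dimensional root finding on \(\prod_t f_t(\lambda)=1\). Below is a generic bisection sketch, assuming each \(f_t\) is positive, continuous, and monotone in \(\lambda\) (the closed form of \(f_t\) is the paper's and is not reproduced here):

```python
import math

def moran_threshold(f_list, lo, hi, tol=1e-9):
    # Solve prod_t f_t(lam) = 1, i.e. sum_t log f_t(lam) = 0, by bisection.
    # Assumes the log-product changes sign exactly once on [lo, hi].
    def log_prod(lam):
        return sum(math.log(f(lam)) for f in f_list)
    flo = log_prod(lo)
    assert flo * log_prod(hi) < 0, "bracket [lo, hi] does not straddle the root"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = log_prod(mid)
        if flo * fmid <= 0:
            hi = mid
        else:
            lo, flo = mid, fmid
    return 0.5 * (lo + hi)
```

For a chain like the CIFAR-10 one above, where \(\lambda^{**}=1.0024\), a narrow bracket around 1 suffices.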

Highlights & Insights

  • The paper offers a novel and highly original theoretical perspective on diffusion models through the lens of fractal geometry.
  • The three computable quantities \(L_t^*\), \(f_t(\lambda)\), and \(\lambda^{**}\) fully characterize the denoising dynamics without requiring model evaluation.
  • Four independently discovered empirical design choices are unified within a single mathematical framework, yielding strong explanatory power.
  • The paper presents 27 theorems/corollaries forming a complete theoretical system, supported by thorough experimental validation.

Limitations & Future Work

  • The theoretical analysis is restricted to the deterministic DDIM case; extensions to stochastic DDPM samplers and Flow Matching are only sketched in the appendix.
  • The block-diagonal covariance assumption, exact for Gaussian data, holds only approximately for real data.
  • Experiments use relatively small DDPM models (CIFAR-10/CelebA); the theory has not been validated on modern architectures such as DiT.
  • No comparative numbers are reported for the gains from the PIFS regularizer \(\mathcal{L}_{PIFS}\).
  • Raya & Ambrogioni (NeurIPS 2023): Reports spontaneous symmetry breaking in diffusion models; this paper provides a structural proof of the two-phase structure from a PIFS perspective.
  • Nichol & Dhariwal (ICML 2021): Empirically proposes the cosine schedule offset \(s_{off}=0.008\); this paper proves it is an approximate solution to maximizing \(\min_t L_t^*\).
  • Min-SNR (ICCV 2023): Proposes loss weighting from a multi-task learning perspective; this paper proves it is equivalent to balancing the Kaplan-Yorke dimension growth rate.
  • Align Your Steps (ICML 2024): Optimizes sampling step allocation from a KL divergence perspective; this paper independently derives the same allocation density from a contraction workload perspective.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ (PIFS–diffusion model connection, three computable quantities, unification of four empirical design choices)
  • Experimental Thoroughness: ⭐⭐⭐⭐ (systematic experimental validation of theoretical predictions across 7 subsections)
  • Writing Quality: ⭐⭐⭐⭐ (rigorous mathematical derivations paired with clear intuitive explanations)
  • Value: ⭐⭐⭐⭐ (provides a first-principles theoretical framework for diffusion model design)