2ndMatch: Finetuning Pruned Diffusion Models via Second-Order Jacobian Matching

Conference: CVPR 2026
arXiv: 2506.05398
Code: No
Area: Diffusion Model / Model Compression
Keywords: Diffusion Model, Pruning, Jacobian Matching, Finite-Time Lyapunov Exponent, Knowledge Distillation

TL;DR

This paper proposes 2ndMatch, a fine-tuning framework for pruned diffusion models. Inspired by finite-time Lyapunov exponents (FTLE), it aligns the second-order Jacobian term \(J^\top J\) of the pruned model with that of the original so that both respond alike to input perturbations over time. This narrows the generation quality gap: for example, FID on LSUN-Bedroom drops from 17.90 with Diff-Pruning to 9.68.

Background & Motivation

Background: Diffusion models achieve excellent image generation quality but require hundreds of denoising steps at inference, incurring substantial computational costs. Model pruning is an effective strategy for reducing per-step computation.

Limitations of Prior Work: Post-pruning fine-tuning typically reuses the original denoising score matching (DSM) objective, which is insufficient for pruned models with reduced capacity. Existing knowledge distillation (KD) approaches align outputs or intermediate features but overlook the model's sensitivity, i.e., how the score function responds to input perturbations. First-order Jacobian matching is essentially equivalent to KD for diffusion models (since inputs already contain noise perturbations) and cannot capture perturbation propagation across time steps.

Key Challenge: Pruning reduces model capacity, causing the pruned model's sensitivity to perturbations to diverge from the original, leading to denoising trajectory drift and degraded generation quality. A method is needed to constrain the pruned model to maintain the same temporal dynamics as the original.

Key Insight: The paper views diffusion models as discrete-time dynamical systems and draws on FTLE theory, which quantifies the amplification/contraction rate of small perturbations over finite time horizons.
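
To make the FTLE notion concrete, the sketch below uses the standard textbook definition for a discrete-time system: compose the per-step Jacobians over a horizon of T steps, take the largest singular value of the product, and divide its logarithm by T. The toy diagonal map and the function name `ftle` are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

def ftle(step_jacobians):
    """Finite-time Lyapunov exponent from a list of per-step Jacobians.

    Textbook definition: (1 / T) * log(sigma_max(J_T ... J_1)).
    """
    dim = step_jacobians[0].shape[0]
    flow = np.eye(dim)
    for J in step_jacobians:
        flow = J @ flow  # chain rule: later steps multiply on the left
    sigma_max = np.linalg.svd(flow, compute_uv=False)[0]
    return np.log(sigma_max) / len(step_jacobians)

# Toy system: every step stretches one axis by 2 and shrinks the other by 0.5.
J_step = np.diag([2.0, 0.5])
exponent = ftle([J_step] * 10)  # equals log(2): perturbations can double per step
```

A positive exponent means some perturbation direction is amplified over the horizon; a negative one means all directions contract.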

Core Idea: Align the \(J^\top J\) (second-order Jacobian metric) between pruned and original models, using random projections \(v^\top J^\top J v\) to efficiently estimate directional expansion rates, enabling scalable second-order Jacobian matching.

Method

Overall Architecture

The hybrid fine-tuning objective is: \(\mathcal{L}_{total} = \lambda_{NP}\mathcal{L}_{NP} + \lambda_{KD}\mathcal{L}_{KD} + \lambda_{Jac}\mathcal{L}_{2nd\text{-}Jac}\), where the three complementary components handle noise prediction, output alignment, and temporal sensitivity matching respectively.

Key Designs

Noise Prediction:
- Function: Standard DDPM objective that predicts the noise added during the forward process
- Mechanism: \(\mathcal{L}_{NP} = \mathbb{E}_{\tilde{x},t,\epsilon}[\|s(\tilde{x},t;\theta) - \epsilon\|_2^2]\)
- Design Motivation: Serves as the basic supervisory signal, but is insufficient alone for capacity-reduced pruned models

Knowledge Distillation:
- Function: Aligns the outputs of the pruned and original models
- Mechanism: \(\mathcal{L}_{KD} = \mathbb{E}_{\tilde{x},t}[\|s(\tilde{x},t;\theta) - s_\mathcal{D}(\tilde{x},t;\theta_\mathcal{D})\|_2^2]\)
- Design Motivation: Provides smoother supervision targets than raw noise, accelerating convergence

Second-Order Jacobian Matching (Core Innovation):
- Function: Aligns the local sensitivity of pruned and original models
- Mechanism: FTLE theory indicates that a small perturbation \(v_0\) is amplified to \(\|v_1\| \approx \sqrt{v_0^\top J^\top J v_0}\) after one step, where \(J\) is the Jacobian of the network output with respect to its noisy input. Since computing the full Jacobian is intractable, random projections \(v \sim \mathcal{N}(0,I)\) are used to estimate directional expansion rates: \(\mathcal{L}_{2nd\text{-}Jac} = \mathbb{E}_{\tilde{x},t,v}\left[(\|J\hat{v}\|_2^2 - \|J_\mathcal{D}\hat{v}\|_2^2)^2\right]\), where \(\hat{v} = v/\|v\|_2\), \(J\) and \(J_\mathcal{D}\) are the Jacobians of the pruned and original models, and \(J\hat{v}\) is computed via Jacobian-vector products (JVP) without forming the full Jacobian matrix
- Design Motivation: A Taylor expansion proof shows that first-order Jacobian matching under noisy inputs is equivalent to KD, yielding no additional benefit. Second-order matching captures perturbation propagation across time steps, better aligning dynamical system stability
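
The three terms can be sketched on a toy problem in which both networks are linear maps \(s(x) = Wx\), so the Jacobian is simply \(W\) and the JVP is a matrix-vector product. Everything here (the matrices, the unit loss weights, the sample sizes, the independent noise targets) is an illustrative assumption; a real implementation would use the actual denoisers and an autodiff JVP.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, batch = 8, 256

W_teacher = rng.normal(size=(dim, dim)) / np.sqrt(dim)      # stands in for the original model
W_student = W_teacher + 0.1 * rng.normal(size=(dim, dim))   # stands in for the pruned model

x_noisy = rng.normal(size=(batch, dim))  # toy noisy inputs
eps = rng.normal(size=(batch, dim))      # toy noise targets (not tied to x_noisy here)

def losses(W_s, W_t, x, eps, v_hat):
    """Return (L_NP, L_KD, L_2nd-Jac) for linear student/teacher maps."""
    s, s_t = x @ W_s.T, x @ W_t.T
    l_np = np.mean(np.sum((s - eps) ** 2, axis=1))
    l_kd = np.mean(np.sum((s - s_t) ** 2, axis=1))
    # For a linear map J = W, so J v_hat is a plain matrix-vector product.
    gain_s = np.sum((v_hat @ W_s.T) ** 2, axis=1)  # ||J v_hat||^2
    gain_t = np.sum((v_hat @ W_t.T) ** 2, axis=1)  # ||J_D v_hat||^2
    l_jac = np.mean((gain_s - gain_t) ** 2)
    return l_np, l_kd, l_jac

v = rng.normal(size=(batch, dim))
v_hat = v / np.linalg.norm(v, axis=1, keepdims=True)  # unit probe directions

l_np, l_kd, l_jac = losses(W_student, W_teacher, x_noisy, eps, v_hat)
lam_np, lam_kd, lam_jac = 1.0, 1.0, 1.0  # hypothetical weights
total = lam_np * l_np + lam_kd * l_kd + lam_jac * l_jac
```

When student and teacher coincide, the KD and second-order terms vanish while the noise-prediction term does not, which is why the three terms are treated as complementary.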

Why Does First-Order Jacobian Matching Fail?

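The paper provides a Taylor expansion proof: \(\|s(x') - s_\mathcal{D}(x')\|_2^2 = \|s(x) - s_\mathcal{D}(x)\|_2^2 + \sigma^2\|J - J_\mathcal{D}\|_F^2 + \mathcal{O}(\sigma^4)\). Here \(x'\) denotes a noise-perturbed input, presumably \(x' = x + \sigma\epsilon\) with \(\epsilon \sim \mathcal{N}(0, I)\), and the identity is understood in expectation over \(\epsilon\). Under noisy inputs, output alignment therefore implicitly subsumes first-order Jacobian matching, so adding it explicitly only increases computational overhead.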
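
A short derivation sketch of where the \(\sigma^2\) term comes from, assuming \(x' = x + \sigma\epsilon\) with \(\epsilon \sim \mathcal{N}(0, I)\) and truncating the Taylor expansion of the residual at first order. Write \(\Delta(x) = s(x) - s_\mathcal{D}(x)\), whose Jacobian is \(J - J_\mathcal{D}\):

\[
\Delta(x + \sigma\epsilon) \approx \Delta(x) + \sigma (J - J_\mathcal{D})\epsilon
\]

\[
\|\Delta(x + \sigma\epsilon)\|_2^2 \approx \|\Delta(x)\|_2^2 + 2\sigma\,\Delta(x)^\top (J - J_\mathcal{D})\epsilon + \sigma^2\,\epsilon^\top (J - J_\mathcal{D})^\top (J - J_\mathcal{D})\epsilon
\]

\[
\mathbb{E}_\epsilon\|\Delta(x + \sigma\epsilon)\|_2^2 \approx \|\Delta(x)\|_2^2 + \sigma^2\|J - J_\mathcal{D}\|_F^2
\]

The cross term vanishes because \(\mathbb{E}[\epsilon] = 0\), and the last term uses \(\mathbb{E}[\epsilon^\top A \epsilon] = \operatorname{tr}(A)\) for standard Gaussian \(\epsilon\), with \(\operatorname{tr}\big((J - J_\mathcal{D})^\top (J - J_\mathcal{D})\big) = \|J - J_\mathcal{D}\|_F^2\).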

Loss & Training

- Architecture-agnostic: applicable to both U-Net and Transformer-based diffusion architectures
- Pruning-method-agnostic: compatible with Diff-Pruning, BK-SDM, and other pruning methods
- Uses PyTorch's JVP functionality to efficiently compute \(J\hat{v}\)
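
As a minimal sketch of how \(J\hat{v}\) and the second-order loss could be computed with forward-mode JVPs, the example below uses `torch.func.jvp` on two tiny MLPs. The MLPs, their sizes, and the helper names `directional_gain` and `second_order_jacobian_loss` are hypothetical stand-ins; the time-step conditioning of a real denoiser is omitted, and the paper's actual implementation may differ.

```python
import torch
from torch.func import jvp

torch.manual_seed(0)
dim = 16

# Tiny stand-ins for the original (teacher) and pruned (student) denoisers.
teacher = torch.nn.Sequential(
    torch.nn.Linear(dim, 64), torch.nn.Tanh(), torch.nn.Linear(64, dim)
)
student = torch.nn.Sequential(
    torch.nn.Linear(dim, 32), torch.nn.Tanh(), torch.nn.Linear(32, dim)
)

def directional_gain(model, x, v_hat):
    """Return ||J v_hat||^2 per sample, where J = d model(x) / d x."""
    _, jv = jvp(model, (x,), (v_hat,))  # forward-mode JVP, no full Jacobian
    return jv.pow(2).sum(dim=1)

def second_order_jacobian_loss(student, teacher, x):
    v = torch.randn_like(x)
    v_hat = v / v.norm(dim=1, keepdim=True)  # unit probe direction per sample
    gain_student = directional_gain(student, x, v_hat)
    gain_teacher = directional_gain(teacher, x, v_hat).detach()  # teacher is fixed
    return (gain_student - gain_teacher).pow(2).mean()

x = torch.randn(4, dim)
loss = second_order_jacobian_loss(student, teacher, x)
```

Because the samples in a batch are processed independently, one batched JVP yields \(J\hat{v}\) for every sample at roughly the cost of an extra forward pass.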

Key Experimental Results

Main Results (LSUN + ImageNet 256×256, U-Net models)

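| Dataset | Method | Params | MACs | FID↓ | rFID↓ |
|---|---|---|---|---|---|
| LSUN-Church | DDPM (Original) | 113.7M | 248.7G | 10.58 | - |
| LSUN-Church | Diff-Pruning | 63.2M | 138.8G | 13.90 | 4.09 |
| LSUN-Church | 2ndM (Ours) | 63.2M | 138.8G | 11.25 | 2.08 |
| LSUN-Bedroom | DDPM (Original) | 113.7M | 248.7G | 6.62 | - |
| LSUN-Bedroom | Diff-Pruning | 63.2M | 138.8G | 17.90 | 7.62 |
| LSUN-Bedroom | 2ndM (Ours) | 63.2M | 138.8G | 9.68 | 2.16 |
| ImageNet | LDM-4 (Original) | 400.9M | 99.8G | 3.60 | - |
| ImageNet | Diff-Pruning | 175.8M | 43.2G | 10.23 | 9.28 |
| ImageNet | 2ndM (Ours) | 175.8M | 43.2G | 5.68 | 4.11 |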

Stable Diffusion (COCO 512×512): adding 2ndM reduces FID from 15.76 to 13.84 for the Base model and from 16.98 to 16.17 for the Small model (Base and Small presumably denote the BK-SDM pruned variants).

Ablation Study (CIFAR-10)

| Config | FID↓ | FTLE |
|---|---|---|
| NP only | 5.29 | 0.413 |
| NP + KD | 5.05 | 0.418 |
| NP + KD + 1st JM | 5.14 | - |
| NP + KD + 2ndM (Ours) | 4.58 | - |
| Dense (Original) | 4.19 | - |

Key Findings

- First-order Jacobian matching is ineffective: Adding first-order JM to NP + KD worsens FID from 5.05 to 5.14, consistent with the theoretical analysis
- Second-order matching is critical: 2ndM reduces FID from 5.05 to 4.58, and the paper reports FTLE values closer to the original model (those entries are not reproduced in the ablation table above), supporting the effectiveness of temporal sensitivity alignment
- Large gains over Diff-Pruning: FID on LSUN-Bedroom improves by about 46% (17.90→9.68) and rFID on ImageNet by about 56% (9.28→4.11)
- Also effective on Transformer architectures: for U-ViT on CIFAR-10, FID drops from 4.63 to 4.05
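
The relative improvements quoted above can be recomputed from the main results table. The helper name `relative_reduction` is just an illustrative choice.

```python
def relative_reduction(before, after):
    """Fractional reduction of a lower-is-better metric."""
    return (before - after) / before

bedroom_fid = relative_reduction(17.90, 9.68)   # LSUN-Bedroom FID, Diff-Pruning vs 2ndM
imagenet_rfid = relative_reduction(9.28, 4.11)  # ImageNet rFID, Diff-Pruning vs 2ndM
```

These evaluate to roughly 0.459 and 0.557 respectively.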

Highlights & Insights

- Dynamical systems perspective: Treating the diffusion model as a discrete-time dynamical system and using FTLE theory to guide loss design ties fine-tuning quality to the stability of the denoising trajectory
- Elegant Taylor expansion proof: Demonstrates the redundancy of first-order Jacobian matching in diffusion models, offering theoretical guidance for loss design in model compression
- Practical random projections: Estimating \(v^\top J^\top J v\) via random directions circumvents the high-dimensional Jacobian computation bottleneck, scaling the method to large models (Stable Diffusion with 1.04B parameters)
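
To illustrate what the random probes measure, the sketch below averages \(\|J\hat{v}\|_2^2\) over random unit directions for an explicit toy Jacobian and compares it with the exact mean \(\operatorname{tr}(J^\top J)/d\), which follows from \(\mathbb{E}[\hat{v}\hat{v}^\top] = I/d\) for uniformly distributed unit vectors. The matrix, dimensions, and probe counts are illustrative assumptions. The loss itself compares gains direction by direction rather than their mean, so this only shows the quantity being probed.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64
J = rng.normal(size=(dim, dim)) / np.sqrt(dim)  # toy Jacobian

def mean_directional_gain(J, num_probes, rng):
    """Monte Carlo average of ||J v_hat||^2 over random unit directions."""
    v = rng.normal(size=(num_probes, J.shape[1]))
    v_hat = v / np.linalg.norm(v, axis=1, keepdims=True)
    return np.mean(np.sum((v_hat @ J.T) ** 2, axis=1))

exact = np.trace(J.T @ J) / dim               # exact mean over the unit sphere
few = mean_directional_gain(J, 4, rng)        # cheap but noisy
many = mean_directional_gain(J, 4096, rng)    # costlier, close to exact
```

Each probe costs one JVP, so the number of probes trades compute for estimator variance.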

Limitations & Future Work

- The method currently uses step-wise matching to approximate multi-step Jacobian propagation, which limits its ability to capture long-range temporal dependencies
- The trade-off between random projection efficiency and estimation accuracy is not thoroughly explored
- The method is validated only on image generation; applications to video, 3D, and other complex diffusion tasks remain unexplored
- The FTLE concept could be extended to distillation (non-pruning) settings or used to guide sampler schedule design

- vs Diff-Pruning: Diff-Pruning only uses DSM for post-pruning fine-tuning; 2ndM adds sensitivity alignment on top and achieves better FID at the same parameter count (e.g., 9.68 vs 17.90 on LSUN-Bedroom at 63.2M parameters)
- vs DeepCache: DeepCache accelerates inference by caching intermediate features without reducing parameters; it is complementary to pruning approaches
- vs BK-SDM: BK-SDM is a pruning method designed for Stable Diffusion; 2ndM can be directly stacked on top to further improve quality

Rating

Novelty: ⭐⭐⭐⭐⭐ Introducing FTLE theory into model compression; the second-order Jacobian matching formulation is elegant and theoretically grounded
Experimental Thoroughness: ⭐⭐⭐⭐⭐ Covers U-Net and Transformer architectures, 5 datasets, multiple pruning methods, and thorough ablations
Writing Quality: ⭐⭐⭐⭐⭐ Rigorous theoretical derivations, clear motivation, and systematic experimental design
Value: ⭐⭐⭐⭐ A general fine-tuning framework, though limited to the model pruning scenario