ContinualFlow: Learning and Unlearning with Neural Flow Matching

Conference: ICML 2025
arXiv: 2506.18747
Code: None
Area: Image Generation
Keywords: Flow Matching, Machine Unlearning, Energy Function, Optimal Transport, Generative Models

TL;DR

Proposes ContinualFlow, a targeted unlearning framework for generative models based on Flow Matching. It reweights the flow matching objective with an energy function to softly subtract unwanted regions of the data distribution, achieving efficient unlearning without retraining from scratch or direct access to the samples to be forgotten.

Background & Motivation

Machine unlearning has emerged as a core problem in generative modeling, touching upon legal compliance (e.g., GDPR), ethics, and model deployment. Unlike discriminative models, generative models learn a complex mapping from a latent distribution to a data distribution, with highly entangled internal representations, making it difficult to precisely isolate the influence of specific input data.

Existing methods are primarily divided into two categories:

Output Suppression: Prevents the generation of sensitive content at inference time through decoding guidance or filters. It does not modify the model's internal knowledge, so adversarial prompts can bypass it.

Model Patching: Modifies model parameters via fine-tuning or targeted updates. Although more permanent, it can damage unrelated capabilities, presenting a trade-off between unlearning efficacy and generalization retention.

Both categories exhibit distinct limitations: output suppression does not touch the underlying knowledge, while model patching lacks theoretical guarantees and can cause "collateral damage." This work takes a different approach by leveraging the geometric (trajectory) perspective of Flow Matching, redefining the unlearning problem as a distribution transport problem and using an energy function to guide generative trajectories away from unwanted regions.

Method

Overall Architecture

The core idea of ContinualFlow is to view the unlearning of generative models as a "soft mass subtraction" from the data distribution. The framework covers two scenarios:

Scenario 1: Direct Access to the Forget Set
When the forget set \(\mathcal{D}_{\text{forget}}\) is directly accessible, Optimal Transport Flow Matching (OT-FM) is used to map trajectories directly from samples generated by the original model \(G_\theta\) to the retain set \(\mathcal{D}_{\text{retain}}\). This eliminates the reliance on predefined priors and simplifies training.

Scenario 2: No Access to the Forget Set (Main Contribution)
In more practical settings where \(\mathcal{D}_{\text{forget}}\) is not directly available, a proxy function (such as a classifier or scoring model) is leveraged to detect unwanted content. The classifier output is treated as an unnormalized energy function \(F(x) \propto -\log q_f(x)\), enabling generative trajectory updates without requiring explicit samples.

Key Designs

1. Energy-Reweighted Soft Mass Subtraction

Let \(q_0(x)\) be the known samplable source distribution, and \(q_f(x)\) be the unknown forget distribution. The density of \(q_0(x)\) is modulated via an energy function \(F(x)\) as follows:

\[\tilde{R}(x) \propto q_0(x) \cdot \sigma(-\lambda F(x))\]

where \(\sigma(z) = \frac{1}{1+e^{-z}}\) is the sigmoid function, and \(\lambda > 0\) controls the suppression sensitivity. The normalized target distribution is given by:

\[\tilde{q}_1(x) = \frac{1}{Z} \tilde{R}(x), \quad Z = \int q_0(x) \sigma(-\lambda F(x)) dx\]
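
One way to build intuition for this target: since \(\sigma(-\lambda F(x))\) lies in \((0, 1)\), keeping each draw \(x \sim q_0\) with probability \(\sigma(-\lambda F(x))\) yields samples from \(\tilde{q}_1\), and the average acceptance rate is a Monte Carlo estimate of \(Z\). The sketch below illustrates this on a 1-D toy problem. The Gaussian stand-in for \(q_0\), the function `toy_energy`, and the helper `soft_subtract` are illustrative assumptions, not the paper's setup, and this is not how ContinualFlow itself is trained.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def toy_energy(x: float) -> float:
    """Hypothetical energy: positive (forget-like) for x > 0, negative otherwise."""
    return 4.0 * x


def soft_subtract(samples, energy, lam, rng):
    """Keep each sample with probability sigmoid(-lam * energy(x)).

    Returns the kept samples (approximate draws from the reweighted target)
    and the acceptance rate, which estimates the normalizer Z.
    """
    kept = [x for x in samples if rng.random() < sigmoid(-lam * energy(x))]
    return kept, len(kept) / len(samples)


rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # stand-in for q_0
kept, z_hat = soft_subtract(source, toy_energy, lam=2.0, rng=rng)
frac_positive = sum(x > 0 for x in kept) / len(kept)
print(f"Z estimate: {z_hat:.3f}, share of kept samples with x > 0: {frac_positive:.3f}")
```

In this toy, the high-energy half-line \(x > 0\) is softly removed rather than cut off: samples just above zero still survive with reduced probability.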

Design Motivation: The sigmoid function provides a smooth and differentiable density reweighting, avoiding discontinuities associated with hard thresholding or sample exclusion. \(\lambda\) controls the suppression intensity—a larger \(\lambda\) assigns lower weights to high-energy regions (associated with the forget distribution), leading to more thorough unlearning.

2. Energy-Reweighted Flow Matching Objective (ERFM)

Introducing the energy weight into the standard CFM loss defines the ERFM loss:

\[\mathcal{L}_{\text{ERFM}}(\theta) = \mathbb{E}_{x_0, x_1 \sim q_0, t \sim \mathcal{U}[0,1], x \sim p_t(x|x_0,x_1)} \left[ \sigma(-\lambda F(x_1)) \cdot \|v_\theta(t,x) - u_t(x|x_0,x_1)\|^2 \right]\]

Core Theorem (Theorem 4.1): The gradient of the ERFM loss is equivalent to the gradient of the standard CFM trained towards the soft mass subtraction target \(\tilde{q}_1\) (up to a positive constant):

\[\nabla_\theta \mathcal{L}_{\text{ERFM}}(\theta) = C \cdot \nabla_\theta \mathcal{L}_{\text{CFM}}^{q_0 \to \tilde{q}_1}(\theta), \quad C > 0\]

The key to the proof is that \(\frac{\tilde{q}_1(x_1)}{q_0(x_1)} \propto \sigma(-\lambda F(x_1))\), meaning the energy weight is exactly equivalent to the importance sampling weight from \(q_0\) to \(\tilde{q}_1\).
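
The importance-weighting step can be spelled out briefly. As a sketch, assume \(x_0\) is drawn independently of \(x_1\), and let \(g(x_1)\) denote the expected squared velocity error conditioned on the endpoint \(x_1\) (the symbol \(g\) is introduced here only for illustration). The definitions of \(\tilde{q}_1\) and \(Z\) then give:

\[\mathbb{E}_{x_1 \sim q_0}\left[\sigma(-\lambda F(x_1))\, g(x_1)\right] = \int q_0(x_1)\, \sigma(-\lambda F(x_1))\, g(x_1)\, dx_1 = Z \int \tilde{q}_1(x_1)\, g(x_1)\, dx_1 = Z\, \mathbb{E}_{x_1 \sim \tilde{q}_1}\left[g(x_1)\right]\]

Under these assumptions, the positive constant in Theorem 4.1 works out to the normalizer \(Z\), which does not depend on \(\theta\) and therefore carries through the gradient unchanged.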

3. Classifier as an Energy Function Proxy

The paper further proves (Proposition B.1) that a Bayes-optimal binary classifier \(C(x)\) can be naturally converted into an energy function:

\[F(x) = -\log\left(\frac{C(x)}{1-C(x)}\right)\]

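Here \(C(x)\) is read as the probability that \(x\) belongs to the retained (non-forget) class, the reading under which high energy marks forget regions. In this case, \(\sigma(-\lambda F(x)) = \frac{C(x)^\lambda}{C(x)^\lambda + (1-C(x))^\lambda}\). The more confident the classifier is that a sample belongs to the forget class (i.e., the smaller \(C(x)\)), the lower this weight becomes.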
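
The conversion can be checked numerically. The sketch below takes the energy formula \(F(x) = -\log\left(\frac{C(x)}{1-C(x)}\right)\) as given and assumes \(C(x)\) is the classifier's probability for the retained (non-forget) class. Under that assumption the weight \(\sigma(-\lambda F(x))\) simplifies algebraically to \(\frac{C^\lambda}{C^\lambda + (1-C)^\lambda}\). The function names and the clipping constant `eps` are hypothetical choices for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def energy_from_classifier(c: float, eps: float = 1e-12) -> float:
    """F = -log(c / (1 - c)), with c clipped away from 0 and 1 to avoid log(0)."""
    c = min(max(c, eps), 1.0 - eps)
    return -math.log(c / (1.0 - c))


def weight_via_energy(c: float, lam: float) -> float:
    """Weight computed through the energy: sigmoid(-lam * F)."""
    return sigmoid(-lam * energy_from_classifier(c))


def weight_closed_form(c: float, lam: float) -> float:
    """Algebraically equivalent form: c^lam / (c^lam + (1 - c)^lam)."""
    return c**lam / (c**lam + (1.0 - c) ** lam)


for c in (0.05, 0.5, 0.95):
    print(c, weight_via_energy(c, 2.0), weight_closed_form(c, 2.0))
```

A sample the classifier confidently assigns to the forget class (small `c`) receives a weight near zero, while an ambiguous sample (`c = 0.5`) receives weight 0.5 regardless of \(\lambda\).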

4. Invertibility of the Energy Function

A unique property of ContinualFlow is the composable and invertible nature of the energy function: by inverting the sign of the energy function, forgotten content can be recovered without requiring direct access to its samples. For example, in MNIST experiments, using \(F(x)\) to suppress odd digits and then using \(-F(x)\) can restore the generation of odd digits.
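
At the level of the weights, the effect of flipping the sign follows from a standard sigmoid identity:

\[\sigma(-\lambda(-F(x))) = \sigma(\lambda F(x)) = 1 - \sigma(-\lambda F(x))\]

Regions that are down-weighted under \(F\) are therefore up-weighted by the complementary amount under \(-F\). This identity describes only the reweighting; how well generation of the forgotten content is restored in practice is an empirical question that the paper addresses through its MNIST experiments.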

Loss & Training

Training Algorithm (Algorithm 1):

  1. Sample \(\{x_0^{(j)}\}\) and \(\{x_1^{(j)}\}\) from \(q_0\).
  2. Uniformly sample time \(t^{(j)} \sim \mathcal{U}(0,1)\).
  3. Compute the interpolation \(x_t^{(j)} = (1-t^{(j)})x_0^{(j)} + t^{(j)}x_1^{(j)}\).
  4. Compute the energy weight \(w^{(j)} = \sigma(-\lambda F(x_1^{(j)}))\).
  5. Compute the normalized weighted loss: \(\mathcal{L} = \frac{\sum_j w^{(j)} \|v_\theta(t^{(j)}, x_t^{(j)}) - (x_1^{(j)} - x_0^{(j)})\|^2}{\sum_j w^{(j)}}\).
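
The steps of Algorithm 1 can be sketched as a single batch-loss computation. This is a minimal 1-D illustration in plain Python, not the authors' implementation: `velocity` stands in for \(v_\theta\) as any callable `(t, x) -> float`, `energy` stands in for \(F\), and the small floor on the denominator is an added safeguard that the source does not mention. A real implementation would use an autodiff framework to take gradients of this loss.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def erfm_batch_loss(velocity, x0s, x1s, ts, energy, lam):
    """Self-normalized, energy-weighted flow matching loss on one 1-D batch."""
    num = 0.0
    den = 0.0
    for x0, x1, t in zip(x0s, x1s, ts):
        x_t = (1.0 - t) * x0 + t * x1      # step 3: linear interpolation
        target = x1 - x0                   # conditional target velocity
        w = sigmoid(-lam * energy(x1))     # step 4: weight from the endpoint's energy
        num += w * (velocity(t, x_t) - target) ** 2
        den += w
    return num / max(den, 1e-12)           # step 5: normalize by the total weight


# Toy batch: the second pair ends at x1 = -1, which the hypothetical energy marks as "forget".
def toy_energy(x):
    return 10.0 if x < 0 else -10.0


def constant_velocity(t, x):
    return 1.0


x0s, x1s, ts = [0.0, 0.0], [1.0, -1.0], [0.3, 0.7]
loss_weighted = erfm_batch_loss(constant_velocity, x0s, x1s, ts, toy_energy, lam=1.0)
loss_unweighted = erfm_batch_loss(constant_velocity, x0s, x1s, ts, toy_energy, lam=0.0)
print(loss_weighted, loss_unweighted)
```

With `lam=0.0` every weight equals 0.5 and the loss reduces to the plain batch average. With `lam=1.0` the pair ending in the high-energy region contributes almost nothing, so the velocity field is no longer penalized for failing to transport mass there.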

For the training strategy, mini-batch OT is used to approximate the optimal transport plan, which keeps training efficient. At inference time, samples are generated with a 10-step flow integration.
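
A 10-step integration could look like the following sketch. The source states only the number of steps; the fixed-step Euler scheme and the function name `integrate_flow` are assumptions made here for illustration.

```python
def integrate_flow(velocity, x_start, num_steps=10):
    """Fixed-step Euler integration of dx/dt = velocity(t, x) from t = 0 to t = 1."""
    x = x_start
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                      # left endpoint of the current step
        x = x + dt * velocity(t, x)     # Euler update
    return x


# Toy check: a constant velocity of 2.0 moves a point from 0.5 to about 2.5.
end_point = integrate_flow(lambda t, x: 2.0, 0.5)
print(end_point)
```

In practice `velocity` would be the trained network \(v_\theta\) and `x_start` a sample from the source distribution. Higher-order solvers trade more function evaluations for lower discretization error.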

Key Experimental Results

Main Results

The paper evaluates the method on 2D synthetic data (Circles, Moons, Gaussians, Checkerboard), MNIST, and CIFAR-10.

| Dataset | Method | MMD ↓ | Accuracy ↑ | Forget Rate ↓ | Leakage ↓ | Training Time (s) |
|---|---|---|---|---|---|---|
| MNIST | Retrain (GT) | 0.0004 | 0.9861 | 0.0050 | 0.0108 | 300.00 |
| MNIST | Fine-tuning | 0.0039 | 0.9551 | 0.0143 | 0.0214 | 92.86 |
| MNIST | CFlow (Ours) | 0.0020 | 0.9673 | 0.0005 | 0.0015 | 158.74 |
| CIFAR-10 | Retrain (GT) | 0.0056 | 0.8920 | 0.1127 | 0.1546 | 802.37 |
| CIFAR-10 | Fine-tuning | 0.0077 | 0.9005 | 0.2157 | 0.2401 | 252.89 |
| CIFAR-10 | CFlow (Ours) | 0.0064 | 0.8847 | 0.1704 | 0.1748 | 427.15 |

On MNIST, ContinualFlow's Forget Rate (0.0005) and Leakage (0.0015) are far superior to Fine-tuning (0.0143/0.0214), and even outperform Retrain (0.0050/0.0108).

Ablation Study

| Configuration (\(\lambda\) value) | Effect | Description |
|---|---|---|
| \(\lambda = 0.5\) | Slight suppression | Significant residual generation still exists in the forget set |
| \(\lambda = 2\) | Moderate suppression | The forget class is noticeably reduced but not completely eliminated |
| \(\lambda = 5\) | Strong suppression | The forget class almost completely disappears |
| \(\lambda = 1000\) | Near-total suppression | Almost exclusively generates content from the retain set |
| Inverse Energy (\(-F\)) | Restore forgotten content | Validates the invertibility of the energy function |
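
The trend across \(\lambda\) values can be illustrated by evaluating the weight \(\sigma(-\lambda F)\) at fixed energies. In the sketch below, the energy values \(+1\) (forget-like) and \(-1\) (retain-like) are hypothetical, chosen only to show how the weight changes with \(\lambda\). The sigmoid is written in a numerically stable form so that \(\lambda = 1000\) does not overflow.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (safe for very large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def suppression_weight(energy_value: float, lam: float) -> float:
    """Training weight sigmoid(-lam * F) for a sample with energy F."""
    return sigmoid(-lam * energy_value)


lams = (0.5, 2.0, 5.0, 1000.0)
forget_weights = [suppression_weight(1.0, lam) for lam in lams]   # hypothetical F = +1
retain_weights = [suppression_weight(-1.0, lam) for lam in lams]  # hypothetical F = -1
for lam, wf, wr in zip(lams, forget_weights, retain_weights):
    print(f"lambda={lam:g}: forget-like weight={wf:.4g}, retain-like weight={wr:.4g}")
```

As \(\lambda\) grows, the soft reweighting approaches a hard cut at \(F = 0\): forget-like samples receive weights near zero and retain-like samples receive weights near one.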

Key Findings

  1. Unlearning can outperform retraining: On MNIST, CFlow's Forget Rate and Leakage are even lower than those of the retrain-from-scratch baseline, indicating that energy guidance enables more precise targeted suppression.
  2. Intuitive validation in 2D experiments: On the Checkerboard dataset, CFlow's MMD (0.0063) is actually better than Retrain (0.0136), suggesting that OT-FM might learn a better approximation of the retained distribution.
  3. Energy invertibility: Reversing the energy function can restore the forgotten classes, which holds significant importance for privacy-controllable generative modeling.
  4. Classifiers as energy proxies: The paper proves that the log-odds of a Bayes-optimal classifier can be converted directly into an energy function, which makes the method easy to deploy in practice.

Highlights & Insights

  1. Solid Theoretical Foundation: The ERFM loss equivalence theorem (Theorem 4.1) provides a rigorous theoretical foundation that unifies energy reweighting with Flow Matching, rather than relying on heuristic designs.
  2. No Need for Forget Samples: The most prominent highlight is that it does not require access to the data to be forgotten itself. Only an energy proxy (e.g., a classifier) is needed, greatly enhancing practicality.
  3. Composable and Invertible: The modular design of the energy function supports composition (superimposing multiple forget targets) and invertibility (recovering forgotten content), offering a flexible mechanism for continual unlearning.
  4. Trajectory-Level Control: Unlike output suppression or model editing, this method directly modulates generative trajectories during the training phase, providing a more fundamental distribution-level unlearning.

Limitations & Future Work

  1. Dependency on Energy Function Quality: The unlearning efficacy strongly depends on the alignment between the energy function and the true forgotten distribution. If the classifier/scorer is inaccurate, unlearning may be incomplete or excessive.
  2. Validation Limited to Simple Scenarios: Experiments are limited to 2D synthetic data, MNIST, and the latent space of CIFAR-10, without validation on large-scale text-to-image models (e.g., Stable Diffusion).
  3. Semantic-Level Unlearning to Be Explored: Current energy functions are primarily based on class-level or binary proxies. Extending this to semantic-level or geometric-level formulations is a direction for future work.
  4. Limited Efficacy on CIFAR-10: The Forget Rate on CIFAR-10 (0.1704) is significantly higher than on MNIST (0.0005), suggesting that unlearning in high-dimensional, complex data remains challenging.
  5. Continual Multi-Round Unlearning: Although named ContinualFlow, the paper does not thoroughly evaluate performance degradation under multi-round iterative unlearning scenarios.

Related Work & Insights

  • Flow Matching (Lipman et al., 2023; Tong et al., 2023): The technical foundation of this paper, utilizing CFM and OT-CFM frameworks.
  • Selective Amnesia (Heng & Soh, 2023): Uses continual learning tools (e.g., EWC) for unlearning, but requires access to the forgotten data.
  • Erasing Concepts (Gandikota et al., 2023): A representative work on concept erasing in diffusion models, belonging to output suppression/model editing categories.
  • Insights: The approach of formalizing unlearning as a distribution transport problem is highly elegant and could inspire the application of similar frameworks to other generative paradigms (e.g., diffusion, VAEs).

Rating

  • Novelty: ⭐⭐⭐⭐ - Combining Flow Matching with energy functions for unlearning presents a fresh perspective, though its core relies on the clever application of importance weighting.
  • Experimental Thoroughness: ⭐⭐⭐ - Validation on 2D, MNIST, and CIFAR-10 demonstrates feasibility, but lacks large-scale experiments and broader baseline comparisons.
  • Writing Quality: ⭐⭐⭐⭐ - Clear theoretical derivations, intuitive visualizations, and well-structured overall presentation.
  • Value: ⭐⭐⭐⭐ - Establishes a theoretical foundation for unlearning in Flow Matching, providing valuable inspiration for future research.
