
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow Matching

Conference: NeurIPS 2025 | arXiv: 2510.14623 | Code: GitHub | Area: Explainable AI / Counterfactual Explanation | Keywords: Counterfactual Explanation, Conditional Flow Matching, Reliability, Model-Agnostic, Information Blending

TL;DR

This paper proposes LeapFactual, a counterfactual explanation algorithm based on Conditional Flow Matching (CFM), which bridges flattened and structured latent spaces via a "Lift-Land" (Leap) mechanism to generate reliable, in-distribution counterfactual samples that remain effective even when the learned decision boundary deviates from the true boundary.

Background & Motivation

Counterfactual explanation (CE) provides model interpretability by answering: "What changes to the input would alter the model's prediction?" Existing methods suffer from three core issues:

Optimization-based (Opt) methods: These methods optimize latent vectors in a generative model's latent space to cross the decision boundary. However, when the source and target classes are far apart (especially in multi-class settings), gradients must traverse multiple class regions, causing vanishing gradients. As a result, generated counterfactuals tend to cluster near the decision boundary rather than truly reflecting target-class characteristics. Additionally, these methods require a differentiable classifier.

Conditional Generative Model (CGM)-based methods: These methods use classifier outputs as conditions for a generative model, generating counterfactuals by substituting the condition. However, the latent space is discontinuous — different class latent spaces are disjoint, making meaningful interpolation or back-tracing to the decision boundary infeasible.

Reliability problem: Counterfactuals generated by existing methods typically lie near the learned decision boundary rather than the true decision boundary. When the learned boundary deviates from the true one, generated counterfactuals may be neither in-distribution nor representative of the target class.

The authors' core insight is that flow matching can establish a continuous and invertible mapping between flattened latent representations (where class and residual information are entangled) and structured latent representations (where class information is an external condition), thereby enjoying the advantages of both paradigms simultaneously.

Method

Overall Architecture

LeapFactual introduces a new axis of movement in latent space: integrating the learned flow backward strips class information away ("Lift"), while integrating it forward injects class information ("Land"). Specifically, Conditional Flow Matching is used to learn a continuous mapping from a structured representation \(Z_0\) (without class information) to a flattened representation \(Z_1\) (with class information entangled), together with its inverse.

Key Designs

  1. CE-CFM Training Objective: Standard I-CFM assumes independent coupling between source and target distributions, but in the counterfactual setting \(Z_0\) and \(Z_1\) are correlated through a shared parent — the residual information \(R\). The authors redefine the conditioning term as \(h := (z_0, z_{1,c})\), where \(z_{1,c} \sim q(z_1|c)\) is a latent vector from class \(c\). The training objective is:
\[\mathcal{L}_{\text{CE-CFM}}(\psi) := \mathbb{E}_{t, q(h), p_t(z|h)} \| v_\psi(t, z, c) - u_t(z|h) \|^2\]

Because the network is explicitly conditioned on the class label \(c\), the Gaussian \(Z_0\) acts as an information bottleneck that squeezes the class information out of \(Z_1\). Theorem 1 formally proves that \(Z_0\) is a compressed representation of \(Z_1\), and that the information lost in compression is exactly the class information \(C\) supplied as the condition. (A minimal sketch of this training step appears after this list.)

  2. Leap Mechanism (Lift-Land Transport): A single Leap consists of two steps: (a) Lift: integrate backward \(\int_1^t \gamma_{\text{lift}} v_\psi(\tau, z^{y_c}(\tau), y_c) d\tau\) from \(Z_1\) to \(Z_0\), removing current class information; (b) Land: integrate forward \(\int_0^t \gamma_{\text{land}} v_\psi(\tau, z^{\hat{y}_c}(\tau), \hat{y}_c) d\tau\) from \(Z_0\) to \(Z_1\), injecting target class information. Three operational modes are achieved by adjusting the step size \(\gamma\).

  3. Information Blending and Injection:

    • Blending: Setting \(\gamma_b = \gamma_{b,\text{lift}} = \gamma_{b,\text{land}} < 1\) produces counterfactuals that blend source and target class features, yielding local counterfactuals. Blending automatically stops once the target class is reached.
    • Injection: Setting \(\gamma_{i,\text{lift}} < \gamma_{i,\text{land}}\) continues injecting target class information after the target class has been reached, pushing counterfactuals deeper into the target class data distribution. This is the key to generating "reliable" counterfactuals — ensuring samples not only cross the learned boundary but also approach the true decision boundary.
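
To make the CE-CFM objective concrete, here is a minimal PyTorch sketch of one training step. It assumes a linear (I-CFM-style) probability path between a Gaussian \(z_0\) and a class-\(c\) latent \(z_{1,c}\); the names `VelocityNet` and `ce_cfm_loss` and the architectural details are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Lightweight MLP velocity field v_psi(t, z, c), conditioned on time and class."""
    def __init__(self, latent_dim: int, num_classes: int, hidden: int = 256):
        super().__init__()
        self.class_emb = nn.Embedding(num_classes, hidden)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 1 + hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, t: torch.Tensor, z: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        # t: (B, 1) time, z: (B, D) latent, c: (B,) integer class labels
        return self.net(torch.cat([z, t, self.class_emb(c)], dim=-1))

def ce_cfm_loss(v_psi: VelocityNet, z1: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
    """One CE-CFM training step: regress the velocity of a straight-line path
    between a Gaussian z0 (no class information) and a class-c latent z1,
    with the class label supplied to the network as an explicit condition."""
    z0 = torch.randn_like(z1)                        # structured latent Z_0
    t = torch.rand(z1.size(0), 1, device=z1.device)  # t ~ U[0, 1]
    zt = (1 - t) * z0 + t * z1                       # sample from p_t(z | h), h = (z0, z1_c)
    target = z1 - z0                                 # u_t(z | h) for the linear path
    return ((v_psi(t, zt, c) - target) ** 2).mean()
```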

Loss & Training

Only the CE-CFM objective is optimized during training, and the flow matching model can be lightweight (e.g., a 4-layer MLP). During inference, counterfactuals are generated by combining \(N_b\) blending Leaps and \(N_i\) injection Leaps; starting with small step sizes and a correspondingly larger number of Leaps is recommended.
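
Below is a rough sketch of how the Lift-Land Leap and the blending/injection schedules might be composed at inference time, using a plain Euler solver. The function names, the stand-in `classifier` and `decoder`, the stopping criterion based on the classifier's prediction, and the concrete \(\gamma\) values are illustrative assumptions rather than the authors' implementation.

```python
import torch

@torch.no_grad()
def leap(v_psi, z, y_lift, y_land, gamma_lift, gamma_land, steps=20):
    """One Lift-Land Leap with a simple Euler solver.
    Lift: integrate backward (t: 1 -> 0) under class y_lift, scaled by gamma_lift.
    Land: integrate forward (t: 0 -> 1) under class y_land, scaled by gamma_land."""
    dt = 1.0 / steps
    for i in reversed(range(steps)):                       # Lift: remove current class information
        t = torch.full((z.size(0), 1), (i + 1) * dt, device=z.device)
        z = z - gamma_lift * v_psi(t, z, y_lift) * dt
    for i in range(steps):                                 # Land: inject target class information
        t = torch.full((z.size(0), 1), i * dt, device=z.device)
        z = z + gamma_land * v_psi(t, z, y_land) * dt
    return z

@torch.no_grad()
def generate_counterfactual(v_psi, classifier, decoder, z1, y_tgt,
                            n_blend=10, n_inject=5,
                            gamma_b=0.2, gamma_i_lift=0.1, gamma_i_land=0.3):
    """Compose N_b blending Leaps (equal step sizes, stop once the target class is
    predicted) followed by N_i injection Leaps (lift < land) to push the sample
    deeper into the target class distribution."""
    z = z1
    for _ in range(n_blend):
        y_cur = classifier(decoder(z)).argmax(dim=-1)      # current predicted class
        if (y_cur == y_tgt).all():                         # blending stops at the target class
            break
        z = leap(v_psi, z, y_cur, y_tgt, gamma_b, gamma_b)
    for _ in range(n_inject):                              # injection: gamma_lift < gamma_land
        z = leap(v_psi, z, y_tgt, y_tgt, gamma_i_lift, gamma_i_land)
    return decoder(z)                                      # decode latent back to image space
```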

Key Experimental Results

Main Results

Morpho-MNIST Counterfactual Generation Quality

| Method | ACC↑ | AUC↑ | D(Area)↓ | D(Thickness)↓ | D(Height)↓ |
|---|---|---|---|---|---|
| Opt-based | 0.828 | 0.881 | 0.248 | 0.172 | 0.062 |
| CGM-based | 0.942 | 0.998 | 0.256 | 0.086 | 0.029 |
| LeapFactual | 0.987 | 0.999 | 0.167 | 0.081 | 0.027 |
| LeapFactual_R | 0.991 | 1.000 | 0.230 | 0.090 | 0.030 |

LeapFactual leads on both correctness and similarity metrics; LeapFactual_R (with information injection) further improves correctness.

Ablation Study

Galaxy10 Dataset — Reliable Counterfactuals for Model Improvement

| Training Configuration | CE Ratio | ACC↑ | AUC↑ | Note |
|---|---|---|---|---|
| Baseline (20% data) | - | 0.811 | 0.977 | - |
| Baseline (100% data) | - | 0.853 | 0.981 | - |
| + Standard CE | 100% | 0.797 | 0.974 | Performance drops! CEs near learned boundary |
| + Reliable CE | 10% | 0.816 | 0.978 | Performance improves |
| + Reliable CE | 100% | 0.824 | 0.979 | Approaches 100% data baseline |

Using standard counterfactuals as training data degrades performance, whereas reliable counterfactuals consistently improve the model — validating the importance of reliability.

FFHQ High-Resolution Experiment (1024×1024, CLIP as a Non-differentiable Classifier)

| \(N_b\) | ACC↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| 5 | 0.706 | 0.564 | 0.149 |
| 10 | 0.957 | 0.538 | 0.171 |
| 20 | 0.993 | 0.525 | 0.180 |
| Random Pairing | - | 0.070 | 0.555 |

Key Findings

  • LeapFactual is model-agnostic: it does not require a differentiable classifier and can use CLIP as a proxy for human annotations, extending CE to citizen science domains that rely on manual labeling (a minimal CLIP usage sketch follows this list).
  • Reliable counterfactuals are not only more interpretable, but can also serve as data augmentation to improve model performance — something standard counterfactuals cannot achieve.
  • The classification prediction trajectory can be tracked during information blending, revealing the class transition process (e.g., in the toy experiment, predictions move from the blue class through the yellow class before settling on the red class).
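
As a concrete illustration of the CLIP-as-proxy idea (not the authors' evaluation code), a zero-shot CLIP classifier from the `transformers` library could be used to check whether a generated counterfactual has reached the target attribute; the checkpoint and text prompts below are placeholder assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative zero-shot "annotator": checkpoint and prompts are assumptions.
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
prompts = ["a photo of a person who is smiling",
           "a photo of a person who is not smiling"]

def clip_predict(image: Image.Image) -> int:
    """Return the index of the prompt CLIP scores highest for the image."""
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image   # shape: (1, num_prompts)
    return int(logits.argmax(dim=-1))
```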

Highlights & Insights

  • Strong theoretical grounding: The validity of CE-CFM is established through d-separation arguments and information-theoretic theorems, rather than purely empirical justification.
  • Unified framework: The method simultaneously possesses the continuous latent space of Opt-based methods and the structured representation of CGM-based methods, addressing the limitations of both paradigms.
  • Novel reliability concept: The distinction between "crossing the learned boundary" and "approaching the true boundary" is clearly articulated, with information injection achieving the latter.
  • The toy experiment visualizations are highly intuitive, clearly illustrating the differences among information replacement, blending, and injection modes.

Limitations & Future Work

  • Training costs for the flow matching model increase substantially when the underlying generative model has a high-dimensional latent space (e.g., diffusion models or normalizing flows).
  • The method focuses solely on visual data; although theoretically generalizable to other modalities, this has not been empirically validated.
  • More efficient flow matching variants such as OT-CFM are not explored, since an optimal-transport coupling would have to be modified to account for the classifier's predictions.
  • Scenarios involving entangled latent spaces and imbalanced datasets remain to be investigated.
  • The finding that reliable counterfactuals serve as effective data augmentation warrants further investigation, potentially in combination with active learning to identify regions of highest model uncertainty.
  • Flow matching may have analogous applications in other interpretability tasks, such as concept editing and attribute manipulation.
  • The model-agnostic property makes LeapFactual applicable to explaining black-box API services.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ First to introduce conditional flow matching into counterfactual explanation; the reliability concept and Leap mechanism are elegantly designed.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Covers benchmark datasets, real astronomical data, and high-resolution face images, though the number of baseline comparisons is somewhat limited.
  • Writing Quality: ⭐⭐⭐⭐⭐ Problem motivation is clearly articulated, theoretical derivations are complete, and visualizations are excellent.
  • Value: ⭐⭐⭐⭐ Addresses the core reliability problem in counterfactual explanation; the use of reliable CEs for data augmentation has practical significance.