No Caption, No Problem: Caption-Free Membership Inference via Model-Fitted Embeddings

Conference: ICLR 2026 arXiv: 2602.22689 Code: GitHub Area: AI Security / Privacy Attacks Keywords: Membership Inference Attack, Diffusion Models, Caption-Free Setting, Model-Fitted Embeddings, Privacy Auditing

TL;DR

This paper proposes MoFit, the first membership inference attack (MIA) framework for diffusion models under a caption-free setting. By constructing surrogate images and conditional embeddings that overfit to the target model, MoFit exploits the asymmetric sensitivity of member samples to conditioning mismatch to enable effective inference.

Background & Motivation

  • The memorization tendency of diffusion models in high-fidelity generation raises concerns about privacy and intellectual property.
  • Membership inference attacks (MIA) are the standard method for auditing memorization.
  • Critical assumption flaw in existing MIAs: prior work assumes the attacker has access to ground-truth captions, which is unrealistic because:
    • Artists suspecting their work was copied typically cannot obtain the training captions.
    • Public generative AI platforms do not disclose training data sources.
  • When ground-truth captions are replaced with VLM-generated alternatives, the performance of SOTA methods degrades significantly.

Method

Core Observation

Member and non-member samples exhibit systematically different sensitivity to conditioning mismatch:

  • Member samples show a significant increase in \(\mathcal{L}_{\text{cond}}\) under surrogate conditions.
  • Non-member samples show little change.
  • \(\mathcal{L}_{\text{uncond}}\) remains stable for both groups.

MoFit Framework: Two Optimization Stages Plus Inference

Stage 1: Model-Fitted Surrogate Optimization

A surrogate image \(x_0^* = x_0 + \delta^*\) is constructed to overfit the unconditional prior of the target model:

\[\delta^* = \arg\min_\delta \mathbb{E}_{z_0', t, \hat{\epsilon}} [\|\hat{\epsilon} - \epsilon_\theta(z_t', t, \phi_{\text{null}})\|^2]\]

Here \(z_t'\) is the noised latent of the perturbed image \(x_0 + \delta\). The noise target \(\hat{\epsilon}\) and timestep \(t\) are fixed to stabilize the perturbation direction, and \(\delta\) is updated iteratively along the sign of the gradient.
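The Stage 1 update can be sketched with a toy numpy example. The linear map `W` below is a hypothetical stand-in for the frozen denoiser \(\epsilon_\theta(\cdot, t, \phi_{\text{null}})\) (the real model is a U-Net, and gradients would come from autodiff); the step size `alpha` and step count are illustrative, not the paper's values.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Hypothetical linear stand-in for the frozen unconditional denoiser,
# so the gradient is available in closed form.
W = rng.standard_normal((d, d)) / d

x0 = rng.standard_normal(d)        # query image (flattened)
eps_hat = rng.standard_normal(d)   # fixed noise target (fixed t implied)

def loss(delta):
    r = eps_hat - W @ (x0 + delta)
    return float(r @ r)

def grad(delta):
    # d/d(delta) ||eps_hat - W(x0 + delta)||^2 = -2 W^T (eps_hat - W(x0 + delta))
    return -2.0 * W.T @ (eps_hat - W @ (x0 + delta))

alpha, steps = 0.05, 200
delta = np.zeros(d)
l0 = loss(delta)
for _ in range(steps):
    delta -= alpha * np.sign(grad(delta))  # sign-gradient update (Stage 1)
l1 = loss(delta)
# The surrogate x0 + delta now fits the "unconditional prior" better: l1 < l0.
```

The sign update (rather than the raw gradient) keeps the perturbation magnitude uniform across pixels, the same trick used in FGSM-style adversarial optimization.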

Stage 2: Surrogate-Driven Embedding Extraction

An embedding \(\phi^*\) is optimized from the surrogate image \(x_0^*\):

\[\phi^* = \arg\min_\phi \mathbb{E}_{z_0^*, t, \hat{\epsilon}} [\|\hat{\epsilon} - \epsilon_\theta(z_t^*, t, \phi)\|^2]\]

The VLM-generated caption embedding is used as initialization.
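Stage 2 can likewise be sketched with a toy conditional denoiser. Here `W z + U phi` is a hypothetical linear stand-in for \(\epsilon_\theta(z_t^*, t, \phi)\), `phi_vlm` plays the role of the VLM caption embedding used as initialization, and plain gradient descent replaces whatever optimizer the paper actually uses.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8

# Hypothetical linear stand-in for the conditional denoiser:
# eps_theta(z, t, phi) ~ W @ z + U @ phi.
W = rng.standard_normal((d, d)) / d
U = rng.standard_normal((d, d)) / d

z_star = rng.standard_normal(d)    # noised latent of the surrogate x0*
eps_hat = rng.standard_normal(d)   # fixed noise target
phi_vlm = rng.standard_normal(d)   # VLM caption embedding (initialization)

def loss(phi):
    r = eps_hat - (W @ z_star + U @ phi)
    return float(r @ r)

def grad(phi):
    return -2.0 * U.T @ (eps_hat - (W @ z_star + U @ phi))

phi = phi_vlm.copy()
l0 = loss(phi)
for _ in range(500):
    phi -= 0.05 * grad(phi)        # gradient descent on the embedding only
l1 = loss(phi)
# phi has moved from the generic caption embedding toward a model-fitted phi*.
```

Only the embedding is updated; the denoiser stays frozen, which is why the procedure needs gradient (gray-box) access but no retraining.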

Stage 3: Membership Inference

The model-fitted embedding \(\phi^*\) is used to condition the original query \(x_0\):

\[\mathcal{L}_{\text{MoFit}} = \mathbb{E}[\|\hat{\epsilon} - \epsilon_\theta(z_t, t, \phi^*)\|^2] - \mathbb{E}[\|\hat{\epsilon} - \epsilon_\theta(z_t, t, \phi_{\text{null}})\|^2]\]

The final decision fuses the MoFit score with auxiliary losses (\(\mathcal{L}_{\text{uncond}}\) or \(\mathcal{L}_{\text{VLM}}\)).
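The decision rule itself reduces to thresholding a per-sample loss difference. The snippet below illustrates it on synthetic loss distributions that merely mimic the paper's core observation (members shift strongly under \(\phi^*\), non-members barely move); the distributions and threshold `tau` are made up for illustration, not measured values.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Synthetic per-sample denoising losses mimicking the observed asymmetry:
# members react strongly to the conditioning mismatch induced by phi*,
# non-members barely do, and the unconditional loss is stable for both.
l_uncond_mem = rng.normal(1.0, 0.05, n)
l_cond_mem   = l_uncond_mem + rng.normal(0.5, 0.1, n)    # large shift
l_uncond_non = rng.normal(1.0, 0.05, n)
l_cond_non   = l_uncond_non + rng.normal(0.05, 0.1, n)   # small shift

score_mem = l_cond_mem - l_uncond_mem    # per-sample L_MoFit
score_non = l_cond_non - l_uncond_non

tau = 0.25                               # decision threshold (hypothetical)
tpr = float(np.mean(score_mem > tau))    # members correctly flagged
fpr = float(np.mean(score_non > tau))    # non-members wrongly flagged
```

Subtracting the unconditional loss normalizes away per-sample difficulty, so the score isolates the conditioning-mismatch signal that the fusion step then combines with the auxiliary losses.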

Key Experimental Results

MIA Performance Comparison in the Caption-Free Setting

| Method | Condition | Pokemon ASR | Pokemon TPR@1%FPR | MS-COCO ASR | MS-COCO TPR@1%FPR |
|---|---|---|---|---|---|
| CLiD | GT | 96.52 | 90.14 | 86.50 | 68.80 |
| CLiD | VLM | 77.55 | 19.23 | 80.90 | 50.80 |
| PFAMI | VLM | 74.43 | 6.01 | 80.40 | 29.40 |
| SecMI | VLM | 78.51 | 6.97 | 57.30 | 4.20 |
| MoFit | \(\phi^*\) | 94.48 | 50.48 | 88.00 | 47.00 |

Ablation Study: Surrogate Image Variants

| Input | Condition | Pokemon ASR | MS-COCO ASR | MS-COCO TPR@1%FPR |
|---|---|---|---|---|
| \(x_0\) (original) | \(\phi\) | 75.63 | 78.00 | 31.00 |
| \(x_0 + \delta\) (random noise) | \(\phi\) | 93.99 | 81.70 | 29.20 |
| \(x_0 + \delta_{\text{MAX}}\) (reverse optimization) | \(\phi\) | 75.87 | 78.00 | 34.00 |
| MoFit (\(x_0 + \delta^*\)) | \(\phi^*\) | 94.48 | 88.00 | 47.00 |

Key Findings

  1. MoFit substantially outperforms VLM-conditioned baselines in the caption-free setting (up to +25 percentage points in ASR and +30–47 points in TPR@1%FPR).
  2. On MS-COCO, MoFit even surpasses CLiD with ground-truth captions (ASR: 88.00 vs. 86.50).
  3. Surrogate optimization is critical: using the original image alone or random noise for embedding optimization yields significantly worse results.
  4. MoFit remains effective on SD v1.5 pretrained models (ASR: 77.61), demonstrating generalizability.

Highlights & Insights

  1. Practical relevance of problem formulation: The caption-free MIA setting more closely reflects real-world auditing needs.
  2. Deep theoretical insight: The asymmetric sensitivity of member samples to conditioning mismatch provides a novel and exploitable signal.
  3. Elegant two-stage design: Constructing an overfitted surrogate before extracting embeddings yields a tightly coupled model-fitted pair.
  4. No additional data or models required: only gray-box access to the target model's denoising network is needed, with no shadow models or auxiliary datasets.

Limitations & Future Work

  • Requires access to the target model's denoising network parameters (gray-box assumption).
  • Surrogate optimization and embedding extraction introduce additional computational overhead.
  • The fixed timestep \(t=140\) is a hyperparameter that may require tuning for different models.
  • Effectiveness is relatively reduced on LAION-scale pretrained models, a setting where all methods perform poorly.
Related Work

  • Diffusion model MIA: SecMI, PIA, PFAMI, CLiD
  • Foundational MIA on classification models: Shokri et al. (2017)
  • Classifier-free guidance (source of the unconditional embedding \(\phi_{\text{null}}\)): Ho & Salimans (2022)

Rating

  • Novelty: ⭐⭐⭐⭐⭐ — First MIA framework for diffusion models targeting the caption-free setting.
  • Technical Depth: ⭐⭐⭐⭐ — Core observation is insightful; two-stage optimization design is well-motivated.
  • Experimental Thoroughness: ⭐⭐⭐⭐ — Multiple datasets, multiple models, and thorough ablations.
  • Value: ⭐⭐⭐⭐ — Provides a practical tool for data privacy auditing.