The Canary's Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text

Conference: ICML 2025
arXiv: 2502.14921
Code: No public code
Area: AI Safety / Privacy and Security
Keywords: Membership Inference Attack, Synthetic Data Privacy, LLM Privacy Auditing, Canary Attack, Differential Privacy

TL;DR

This paper designs Membership Inference Attacks (MIAs) that target synthetic data generated by fine-tuned LLMs and shows that such synthetic data leaks information about the training data. It further finds that canaries designed for model-level MIAs perform poorly when only synthetic data is released. To address this, it proposes a canary design that exploits the properties of autoregressive models: an in-distribution prefix followed by a high-perplexity suffix, which leaves detectable traces in the synthetic data and significantly strengthens privacy auditing.

Background & Motivation

Background: Synthetic data is widely regarded as a privacy-preserving countermeasure: instead of the real data, one releases synthetic data generated by an LLM, preserving the data's utility while protecting individual privacy. Many organizations and enterprises have begun adopting LLM-generated synthetic text as a "privacy-safe" data-sharing solution.

Limitations of Prior Work:
- There is a widespread assumption that synthetic data is inherently privacy-preserving (i.e., that "LLM-generated text does not contain the original training data").
- This assumption overlooks how information flows through the model: during training, LLMs learn statistical patterns from the training data and can even memorize specific details.
- Existing privacy auditing methods, such as Membership Inference Attacks (MIAs), predominantly assume direct access to the model, leaving the scenario in which only synthetic data is released understudied.

Key Challenge: Because synthetic data is produced indirectly, it appears to have been "filtered" by the model. How much training data information actually seeps through the model into the synthetic output? Treating synthetic data as safe without rigorous auditing can create a false sense of security.

Goal: Quantify and audit the privacy leakage risks of training data via LLM-generated synthetic texts, specifically:
- Does synthetic data leak membership information of the training set?
- How can this leakage be audited more effectively?
- Are existing privacy auditing tools effective in synthetic data scenarios?

Key Insight:
- First, construct a data-based MIA that targets the synthetic data directly, showing that synthetic data does leak training information.
- Next, examine traditional canaries, revealing that canaries designed for model-level MIAs fail badly when only synthetic data is released.
- Finally, exploit the generation mechanism of autoregressive models to design a new canary that strengthens auditing.

Core Idea: Traditional canaries are highly out-of-distribution (OOD). Although the model memorizes them, they barely affect how it generates in-distribution text. The proposed canary pairs an in-distribution prefix with a high-perplexity suffix. Because autoregressive generation conditions on the prefix, whenever the LLM generates text resembling the canary prefix, it becomes more likely to continue with, and thereby "leak", the memorized suffix.

Method

Overall Architecture

Training Data D (with/without canary c)
    ↓
LLM Fine-tuning
    ↓
Generate Synthetic Data S
    ↓
Attacker can only access S → Infer membership information of D

                    Attack Route 1: Data-based MIA
                    Attack Route 2: Canary-based MIA (Traditional Canary → Fails)
                    Attack Route 3: Canary-based MIA (Novel Canary → Effective)

Key Designs

Data-based Membership Inference Attack:
- Function: Given the synthetic dataset \(S\) and a target sample \(z\), determine whether \(z\) is a member of the training data \(D\).
- Mechanism: Measure how statistically similar the synthetic data is to the target sample. If \(z \in D\), the LLM has learned the patterns of \(z\) during fine-tuning, so the synthetic data \(S\) it generates tends to be statistically closer to \(z\). Specifically:
  - Use a model fitted to \(S\), or a similarity measure, to compute the probability or similarity score of \(z\) with respect to \(S\).
  - Turn this score into a binary member/non-member decision (e.g., by thresholding it).
- Design Motivation: This directly tests the core question of whether synthetic data leaks. It is the first step: establishing that the threat exists.
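One simple way to realize the data-based MIA above is sketched below. It assumes whitespace tokenization and an add-one-smoothed bigram language model fitted on \(S\); the helper names `fit_bigram` and `membership_score` are hypothetical, and the paper's actual scoring function may differ. The score is the mean log-probability of \(z\) under the model fitted to \(S\), so higher values mean \(S\) is statistically closer to \(z\).

```python
import math
from collections import Counter


def fit_bigram(synthetic_texts):
    """Fit a bigram model on the synthetic corpus S (hypothetical helper).

    Returns (context counts, bigram counts, vocabulary) over whitespace tokens,
    with "<s>" marking the start of each record.
    """
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for text in synthetic_texts:
        tokens = ["<s>"] + text.lower().split()
        vocab.update(tokens)
        for prev, nxt in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, nxt)] += 1
    return context_counts, bigram_counts, vocab


def membership_score(target, model):
    """Mean log P(token | previous token) of the target z under the bigram model,
    using add-one (Laplace) smoothing. Higher means z looks more like S."""
    context_counts, bigram_counts, vocab = model
    tokens = ["<s>"] + target.lower().split()
    if len(tokens) < 2:
        raise ValueError("target must contain at least one token")
    v = len(vocab) + 1  # one extra slot for unseen tokens
    log_probs = [
        math.log((bigram_counts[(p, n)] + 1) / (context_counts[p] + v))
        for p, n in zip(tokens, tokens[1:])
    ]
    return sum(log_probs) / len(log_probs)


synthetic = ["the movie was great fun", "the movie was dull", "a great cast"]
model = fit_bigram(synthetic)
close = membership_score("the movie was great", model)
far = membership_score("stock prices fell sharply", model)
print(close > far)  # True
```

In an audit, both members and held-out non-members would be scored this way, and a threshold on the score acts as the binary classifier described above.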
Failure Analysis of Traditional Canaries:
- Function: Analyze why canaries designed for model-level MIAs (where the model can be queried directly) fail when only synthetic data is available.
- Key Findings: Traditional canaries (e.g., random strings or highly out-of-distribution samples) have extremely high perplexity. Once the model memorizes them, its perplexity on these canaries drops, which is the signal model-level MIAs exploit. However, this memorization barely changes how the model generates useful, in-distribution synthetic data under normal prompts.
- Reason Analysis: Autoregressive generation is conditional, sampling each token from \(P(y_t \mid y_{<t}, \text{prompt})\). Because an OOD canary's prefix looks nothing like the normal prompts or the text the model typically generates, the model rarely enters the context that would "trigger" its memory of the canary during synthetic data generation.
- Design Motivation: To expose a blind spot of existing auditing tools: relying solely on traditional canaries severely underestimates the leakage risk of synthetic data.
Novel Canary Design (In-Distribution Prefix + High-Perplexity Suffix):
- Function: Design a canary \(c = [c_{\text{prefix}}, c_{\text{suffix}}]\) that leaves detectable traces in the synthetic data.
- Mechanism:
  - Prefix \(c_{\text{prefix}}\): Ordinary text sampled from the training data distribution. This gives the model a reasonable probability of generating text that begins with a similar prefix during synthetic data generation.
  - Suffix \(c_{\text{suffix}}\): Carefully constructed high-perplexity text (e.g., random token sequences or adversarially constructed strings). Once the model has memorized the canary, its next-token distribution shifts detectably toward the suffix whenever the generation context matches the prefix.
  - Detection Mechanism: Search the synthetic data for texts similar to the canary prefix, and check whether the tokens that follow are anomalously distributed (i.e., shifted toward the canary suffix).
- Formalization: An autoregressive model factorizes the probability of the canary as \(P_\theta(c) = P_\theta(c_{\text{prefix}})\,P_\theta(c_{\text{suffix}} \mid c_{\text{prefix}})\), with \(P_\theta(c_{\text{suffix}} \mid c_{\text{prefix}}) = \prod_{j} P_\theta(s_j \mid c_{\text{prefix}}, s_{<j})\), where \(s_j\) is the \(j\)-th token of \(c_{\text{suffix}}\). If the generation context \(y_{\leq t} \approx c_{\text{prefix}}\) and the model has memorized \(c = [c_{\text{prefix}}, c_{\text{suffix}}]\), then \(P_\theta(c_{\text{suffix}} \mid c_{\text{prefix}})\) is significantly higher than under a model not trained on \(c\). This increase leaves a subtle but statistically detectable "echo" in the synthetic data.
- Design Motivation: To exploit the core property of autoregressive models: context conditioning. The in-distribution prefix ensures the canary can be "triggered" during normal generation; the high-perplexity suffix ensures that only a model trained on the canary produces the signal, since a model not trained on it is highly unlikely to generate such a suffix.
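The canary construction and the detection step can be sketched as follows, under simplifying assumptions: whitespace tokens, a suffix drawn uniformly at random from a small word list (one way to make it high-perplexity), and detection by exact match on the last `k` prefix tokens rather than a softer similarity search. `make_canary` and `echo_score` are hypothetical names, not the paper's code.

```python
import random


def make_canary(prefix, vocab, suffix_len, seed=0):
    """Hypothetical helper: keep an in-distribution prefix and append a suffix
    of tokens drawn uniformly at random from `vocab`, which makes the suffix
    high-perplexity for any model that has not been trained on it."""
    rng = random.Random(seed)
    prefix_tokens = prefix.split()
    suffix_tokens = [rng.choice(vocab) for _ in range(suffix_len)]
    return prefix_tokens, suffix_tokens


def echo_score(synthetic_texts, prefix_tokens, suffix_tokens, k=3):
    """Hypothetical detector: wherever a synthetic record repeats the last k
    prefix tokens, compare the following tokens with the canary suffix
    position by position (missing positions count as mismatches).
    Returns the match rate, or 0.0 if the prefix anchor never occurs."""
    anchor = prefix_tokens[-k:]
    if not anchor:
        raise ValueError("prefix must be non-empty")
    w = len(anchor)
    matched = compared = 0
    for text in synthetic_texts:
        toks = text.split()
        for i in range(len(toks) - w + 1):
            if toks[i:i + w] == anchor:
                continuation = toks[i + w:i + w + len(suffix_tokens)]
                matched += sum(a == b for a, b in zip(continuation, suffix_tokens))
                compared += len(suffix_tokens)
    return matched / compared if compared else 0.0


vocab = ["zephyr", "quartz", "lumen", "basalt", "fjord", "onyx"]
prefix, suffix = make_canary("the acting in this film was", vocab, suffix_len=4)
with_echo = ["a fine story overall", " ".join(prefix + suffix)]
without_echo = ["a fine story overall", "the acting in this film was superb"]
print(echo_score(with_echo, prefix, suffix))     # 1.0
print(echo_score(without_echo, prefix, suffix))  # 0.0
```

Because the prefix is ordinary text, the anchor has a real chance of appearing in synthetic records; the random suffix then makes any match after it very unlikely unless the canary was memorized.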

Loss & Training

This is an auditing framework rather than a method that requires training an attack model.
LLM fine-tuning employs the standard autoregressive loss: \(\mathcal{L} = -\sum_t \log P_\theta(x_t | x_{<t})\)
The canary participates in fine-tuning as a part of the training data.
Auditing is performed via statistical tests to determine membership.
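The fine-tuning objective above, and the insertion of a canary into the training set, can be sketched in NumPy. The `logits` array stands in for a real model's outputs, and `inject_canary` with its `n_rep` repetition count is a hypothetical helper; repeating a canary several times is a common auditing knob, assumed here rather than taken from the text.

```python
import random

import numpy as np


def autoregressive_nll(logits, tokens):
    """Compute L = -sum_t log P_theta(x_t | x_<t) for one sequence.

    tokens: int array of length T.
    logits: array of shape (T - 1, V); row t scores the next token tokens[t + 1]
            given the prefix tokens[: t + 1].
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    targets = np.asarray(tokens)[1:]
    return -log_probs[np.arange(len(targets)), targets].sum()


def inject_canary(train_records, canary, n_rep, seed=0):
    """Hypothetical helper: add n_rep copies of the canary and shuffle,
    so the canary is trained on like any other record."""
    records = list(train_records) + [canary] * n_rep
    random.Random(seed).shuffle(records)
    return records


# Uniform logits over V = 5 tokens give a loss of (T - 1) * log(5).
loss = autoregressive_nll(np.zeros((3, 5)), np.array([0, 1, 2, 3]))
print(round(float(loss), 4))  # 4.8283
```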

Key Experimental Results

Main Results: MIA Performance (TPR @ low FPR)

Attack Method	TPR@1%FPR	TPR@5%FPR	Scenario
Random Guess	1.0%	5.0%	Baseline
Data-based MIA (No Canary)	~8%	~18%	Synthetic data access only
Traditional Canary (OOD) + Data-based MIA	~3%	~9%	Canary is barely effective
Novel Canary (Ours) + Data-based MIA	~25%	~45%	Significant improvement
Model-level MIA (With Model Access)	~35%	~55%	Upper bound reference
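The TPR@low-FPR metric in the table follows the standard recipe: fix a threshold so that at most the target fraction of non-members is flagged, then report the fraction of members scoring above it. A minimal NumPy sketch, assuming higher scores mean "member" (`tpr_at_fpr` is a hypothetical helper name):

```python
import numpy as np


def tpr_at_fpr(member_scores, nonmember_scores, target_fpr):
    """Fraction of members flagged at the threshold that flags at most
    `target_fpr` of non-members (higher score = predicted member)."""
    nonmembers = np.sort(np.asarray(nonmember_scores, dtype=float))[::-1]
    members = np.asarray(member_scores, dtype=float)
    # Number of non-members allowed above the threshold (epsilon guards float error).
    k = int(np.floor(target_fpr * len(nonmembers) + 1e-9))
    if k >= len(nonmembers):
        return 1.0  # every score may be flagged
    threshold = nonmembers[k]  # at most k non-members score strictly above this
    return float(np.mean(members > threshold))


nonmember = list(range(100))  # scores 0..99
member = [99.5, 98.5, 50.0, 10.0]
print(tpr_at_fpr(member, nonmember, 0.01))  # 0.5
```

With scores that carry no membership signal, this TPR matches the FPR on average, which is what the "Random Guess" row reflects.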

Ablation Study

Canary Configuration	TPR@1%FPR	Description
Full OOD Canary (Traditional)	~3%	Does not affect in-distribution generation
Full In-Distribution Canary	~5%	No detectable anomalous signals
Random Prefix + High-Perplexity Suffix	~12%	Low prefix matching probability
In-Distribution Prefix + High-Perplexity Suffix (Ours)	~25%	Optimal design
Varying Suffix Perplexity (Low)	~10%	Suffix is too common, weak signal
Varying Suffix Perplexity (High)	~25%	High perplexity = Strong signal
Different Prefix Lengths (Short)	~15%	Insufficient conditioning context
Different Prefix Lengths (Long)	~27%	Longer prefix provides more precise triggering

Key Findings

Synthetic data does leak training data information: even without canaries, the data-based MIA achieves a TPR well above random guessing (~8% vs. 1% at 1% FPR).
Traditional canaries largely fail when only synthetic data is released: OOD canaries are memorized by the model but do not affect its in-distribution generation behavior, so privacy audits that rely on traditional canaries severely underestimate the risk.
The new canary design substantially improves auditing: TPR@1%FPR rises from ~3% to ~25%, supporting the intuition behind the in-distribution prefix + high-perplexity suffix design.
How in-distribution the prefix is and how high the suffix perplexity is are the two critical dimensions: together they determine the product "trigger probability \(\times\) signal strength".
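One way to make this product concrete, assuming each synthetic record is sampled independently from the start of generation, so that a record reproduces the canary with probability \(P_\theta(c_{\text{prefix}})\,P_\theta(c_{\text{suffix}} \mid c_{\text{prefix}})\). All probabilities below are made-up illustrative values, not results from the paper, and `expected_echoes` is a hypothetical helper.

```python
def expected_echoes(n_records, p_prefix, p_suffix_given_prefix):
    """Expected number of synthetic records that reproduce the full canary:
    trigger probability P(prefix) times signal strength P(suffix | prefix),
    summed over n_records independent generations."""
    return n_records * p_prefix * p_suffix_given_prefix


N = 100_000
# Made-up values: a memorized suffix with P(suffix | prefix) = 0.5.
ood = expected_echoes(N, p_prefix=1e-15, p_suffix_given_prefix=0.5)
in_dist = expected_echoes(N, p_prefix=1e-4, p_suffix_given_prefix=0.5)
# Same in-distribution prefix, but a model never trained on the random suffix.
untrained = expected_echoes(N, p_prefix=1e-4, p_suffix_given_prefix=1e-20)
print(f"OOD prefix: {ood:.1e}, in-distribution prefix: {in_dist:.1f}, untrained: {untrained:.1e}")
```

Under these assumptions, an OOD canary is essentially never triggered, while the in-distribution prefix makes a handful of echoes expected; the untrained case shows why the high-perplexity suffix is needed for the echo to be attributable to memorization.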

Highlights & Insights

Debunking the myth of "synthetic data is safe by default": Quantitative evidence is provided to demonstrate that synthetic data generated by LLMs leaks training membership information.
Revealing blind spots in privacy auditing tools: Traditional canary methods fail in synthetic data scenarios. This represents an important negative result, warning researchers against directly reusing model-level auditing tools.
Elegant attack design leveraging autoregressive model properties: The in-distribution prefix ensures "triggering" while the high-perplexity suffix ensures "detection", making them highly complementary.
The elegant metaphor of "The Canary's Echo": The canary is "ingested" by the model, and its information appears as an "echo" in the synthetic data.
Implications for Differential Privacy: Relying on synthetic data alone as a privacy measure is insufficient without formal privacy guarantees like DP.

Limitations & Future Work

Canaries must be injected before training: this is an auditing method rather than an adversarial attack, and it requires coordination with the data owner.
Imprecise quantification of leakage: While MIA success rates provide qualitative evidence, it remains difficult to precisely quantify "how many bits of information were leaked."
Exclusively validated on textual data: The applicability to other synthetic data modalities, such as code and tabular data, has not yet been explored.
Integration with DP: Further experiments are needed to test whether leakage can be effectively suppressed under differentially private training.
Scalability to large-scale LLMs: Performance on larger models (e.g., 70B+) remains to be verified.

Related Work & Insights

Membership Inference Attacks (MIA): Shokri et al. (2017) pioneered the field of MIA. This paper extends MIA to the "indirect access" scenario, where the attacker can only observe the synthetic data.
Canary Methods: Carlini et al. (2019) proposed using canaries to audit model memorization. This paper reveals their limitations in synthetic data scenarios and proposes improvements.
Synthetic Data Privacy: Jordon et al. (2022) studied privacy in tabular synthetic data. This paper focuses on the more recent paradigm of LLM-based text synthesis.
Insights:
- Privacy guarantees cannot be obtained implicitly through "indirect release"; formal privacy mechanisms (such as DP) are required.
- The conditioning property of autoregressive models is simultaneously a source of their strength and a channel for privacy leakage.
- Privacy auditing tools need to be custom-designed for different release scenarios (direct model release vs. synthetic data release).

Rating

Novelty: ⭐⭐⭐⭐⭐ First systematic study of privacy leakage through LLM-generated synthetic data, with a highly ingenious canary design.
Experimental Thoroughness: ⭐⭐⭐⭐ Multiple MIA variants, thorough ablations, and comparisons against traditional methods.
Writing Quality: ⭐⭐⭐⭐⭐ Vivid metaphor of "The Canary's Echo", with clear problem motivation and logical flow of findings.
Value: ⭐⭐⭐⭐⭐ Highly significant warning for the synthetic data privacy field, offering a practical auditing tool.