Scalable Fingerprinting of Large Language Models¶
Conference: NeurIPS 2025 arXiv: 2502.07760 Code: GitHub Area: LLM Pretraining Keywords: model fingerprinting, LLM ownership, Perinucleus sampling, collusion attack, model security
TL;DR¶
This paper proposes Perinucleus sampling to generate scalable LLM fingerprints, enabling the embedding of 24,576 fingerprints in Llama-3.1-8B—two orders of magnitude more than existing methods—without degrading model capability. Theoretical and empirical analyses demonstrate that large-scale fingerprinting is essential for defending against collusion attacks.
Background & Motivation¶
Root Cause¶
Need for Model Fingerprinting: Model fingerprinting enables owners to identify unauthorized use of their models through API access alone.
Why Scalability Matters: Reduces false positive rates, mitigates fingerprint leakage (one fingerprint is exposed per verification query), and defends against collusion attacks (multiple users jointly circumventing fingerprints).
Limitations of Prior Work: RANDOM (random token keys) is scalable but insecure; ENGLISH-RANDOM (natural language keys with random responses) suffers severe performance degradation beyond 256 fingerprints.
Method¶
Overall Architecture¶
The fingerprinting system consists of two components: fingerprint generation and fingerprint training.
Fingerprint Generation — Perinucleus Sampling¶
Key Generation: Natural language questions are sampled at low temperature, making them indistinguishable from normal queries (in-distribution).
Response Generation: "Plausible but uncommon" responses are sampled just outside the boundary of the base model's nucleus distribution:
1. Compute the next-token probability distribution.
2. Identify the nucleus boundary at cumulative probability \(t\).
3. Uniformly sample from the \(k\) tokens immediately outside the nucleus.
Parameter settings: \(t=0.8\) (the sampled tokens' average probability ≈ 0.014), \(k=3\).
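The steps above can be sketched as follows. This is our reconstruction of the sampling rule, not the authors' code; the function name and NumPy-based interface are assumptions.

```python
import numpy as np

def perinucleus_sample(logits, t=0.8, k=3, rng=None):
    """Sketch of Perinucleus response sampling: pick uniformly among the k
    tokens that sit immediately outside the top-t nucleus of the
    next-token distribution. (Hypothetical helper, not the paper's code.)"""
    rng = rng if rng is not None else np.random.default_rng()
    # Softmax over the vocabulary (numerically stabilized).
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    # Sort token ids by descending probability; the nucleus is the smallest
    # prefix whose cumulative mass reaches t.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    boundary = int(np.searchsorted(cum, t)) + 1
    # The k tokens just past the boundary are plausible (high-ranked)
    # yet uncommon (outside the nucleus); sample one uniformly.
    candidates = order[boundary:boundary + k]
    return int(rng.choice(candidates))
```

Because the sampled token is always ranked just below the nucleus, it remains a response the base model considers plausible, which is what keeps the fingerprint in-distribution.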
Theoretical FPR Guarantee: \(\text{FPR} \leq \exp(-2M(1-1/k)^2)\), decreasing exponentially with the number of fingerprints.
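The bound is easy to evaluate numerically. A minimal sketch (function name is ours) showing that with \(k=3\), even a handful of verification queries drives the false-positive bound to around one percent:

```python
import math

def fpr_bound(m, k):
    """Upper bound exp(-2m(1 - 1/k)^2) on the false-positive rate when
    checking m fingerprints, each of which an unrelated model matches
    with probability at most 1/k."""
    return math.exp(-2 * m * (1 - 1 / k) ** 2)

# With k = 3, five fingerprint queries already push the bound near 1%:
print(fpr_bound(5, 3))  # ≈ 0.0117
```

This is consistent with the experimental finding that inspecting only a few fingerprints suffices for reliable detection.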
Fingerprint Training¶
- Weight Averaging: After each update step, the model's weights are interpolated with the original model's weights via a weighted average (\(\lambda_{WA}=0.75\)).
- Data Mixing: Fingerprint data is mixed with data generated by the base model (\(\beta_{DM}=0.25\)).
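The two regularizers above might be sketched as follows. This is a hypothetical illustration: the function names are ours, weights are represented as NumPy dicts, and we assume \(\beta_{DM}\) is the fraction of base-model-generated samples added to each fingerprint batch.

```python
import numpy as np

def weight_average_step(theta_ft, theta_base, lam_wa=0.75):
    """After each optimizer step, interpolate the fine-tuned weights with
    the frozen base weights; lam_wa is the fraction kept from fine-tuning."""
    return {name: lam_wa * theta_ft[name] + (1.0 - lam_wa) * theta_base[name]
            for name in theta_ft}

def mix_batches(fingerprint_batch, base_model_batch, beta_dm=0.25, rng=None):
    """Mix a beta_dm fraction of base-model-generated samples into the
    fingerprint training data (assumed interpretation of beta_DM)."""
    rng = rng if rng is not None else np.random.default_rng()
    n_base = int(round(beta_dm * len(fingerprint_batch)))
    keep = rng.choice(len(base_model_batch), size=n_base, replace=False)
    return fingerprint_batch + [base_model_batch[i] for i in keep]
```

Both mechanisms pull the fingerprinted model back toward the base model, in weight space and in data space respectively, which is what limits utility degradation at large fingerprint counts.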
Collusion Attack Defense¶
Each fingerprint is independently embedded in each distributed model copy with probability \(p\), and a score is tracked for each candidate model during detection. Theoretical guarantee: \(M = O(2^K K^{K+1} \log(N/\delta))\) fingerprints suffice to identify at least one of \(K\) colluding models (out of \(N\)) with probability \(1-\delta\).
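A toy simulation of this defense, under the common marking assumption (our assumption, stated for illustration) that a fingerprint survives collusion only if every colluding copy contains it, since colluders can strip any fingerprint on which their copies disagree:

```python
import numpy as np

rng = np.random.default_rng(42)

n_models, n_fp, p = 20, 500, 0.5
colluders = [2, 5, 7]  # hypothetical colluding copies

# Each fingerprint is embedded in each distributed copy independently w.p. p.
assignment = rng.random((n_models, n_fp)) < p

# Marking assumption: only fingerprints shared by ALL colluders survive,
# because the colluders can detect and remove the rest by comparison.
surviving = np.flatnonzero(assignment[colluders].all(axis=0))

# Detection: score each candidate model by how many surviving fingerprints
# it was assigned; colluders hold all of them by construction, so one of
# them attains the maximum score.
scores = assignment[:, surviving].sum(axis=1)
traced = int(scores.argmax())
```

Every colluder scores exactly `len(surviving)`, while an innocent model matches each surviving fingerprint only with probability \(p\), so the top scorer is a colluder with overwhelming probability.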
Key Experimental Results¶
Scalability¶
Main Results¶
| # Fingerprints | Perinucleus (OpenLLM) | ENGLISH-RANDOM | Utility Retention |
|---|---|---|---|
| 256 | ~63% | ~61% | >99% |
| 1024 | ~62.5% | ~57% | ~98% |
| 8192 | ~61.5% | Collapse | ~96% |
| 24576 | ~61% | N/A | ~95% |
Persistence (Post-SFT)¶
Ablation Study¶
| Method | Persistence (1,024 FPs) | Persistence (8,192 FPs) |
|---|---|---|
| RANDOM | ~85% | ~65% |
| Perinucleus | ~80% | ~60% |
| ENGLISH-RANDOM | ~40% | <20% |
Cross-Model Generalizability¶
Relative performance exceeds 95% across 10 models with 8,192 fingerprints.
Key Findings¶
- Fingerprint persistence degrades approximately log-linearly as the number of SFT samples increases.
- Mathematical data causes less fingerprint forgetting than conversational data.
- DPO training does not significantly exacerbate fingerprint forgetting.
- Querying just 5 fingerprints during verification already achieves satisfactory false positive/negative rates.
Highlights & Insights¶
- Perinucleus Sampling: An elegant design that samples near the boundary of the nucleus distribution, reducing model distortion during training.
- Scalability as a Security Property: This work is the first to elevate scalability as a core criterion and theoretically demonstrate its necessity.
- Regularization and Fingerprint Design Are Orthogonal: Ablations confirm that both contribute independently.
- Simple and Effective Collusion Defense: Random assignment combined with \(O(\log N)\) fingerprints suffices.
Limitations & Future Work¶
- Primary experiments use single-token responses; multi-token scenarios warrant further investigation.
- Combined attacks involving fine-tuning and collusion are not thoroughly evaluated.
- The impact of model merging attacks requires deeper study.
- Different inference sampling strategies may affect detection reliability.
Related Work & Insights¶
- Xu et al. / Russinovich & Salem: Focus on harmlessness and persistence while neglecting scalability.
- Model Watermarking: Detects whether text is LLM-generated, whereas fingerprinting verifies ownership of a specific model.
- Insights: The Perinucleus idea is generalizable to any scenario requiring the embedding of covert information.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ Perinucleus sampling is original and elegant; the scalability perspective is entirely new.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Covers 10 models, multiple attack types, theoretical guarantees, and comprehensive parameter analysis.
- Writing Quality: ⭐⭐⭐⭐⭐ Problem formulation is clear with multi-dimensional analysis.
- Value: ⭐⭐⭐⭐ Valuable for model security and intellectual property protection, though the application scope is relatively narrow.