Scalable Fingerprinting of Large Language Models¶
Conference: NeurIPS 2025 arXiv: 2502.07760 Code: GitHub Area: LLM Pretraining Keywords: model fingerprinting, LLM ownership, Perinucleus sampling, collusion attack, model security
TL;DR¶
This paper proposes Perinucleus sampling to generate scalable LLM fingerprints, enabling the embedding of 24,576 fingerprints in Llama-3.1-8B—two orders of magnitude more than existing methods—without degrading model capability. Theoretical and empirical analyses demonstrate that large-scale fingerprinting is essential for defending against collusion attacks.
Background & Motivation¶
Root Cause¶
Need for Model Fingerprinting: Model fingerprinting enables owners to identify unauthorized use of their models through API access alone.
Why Scalability Matters: Reduces false positive rates, mitigates fingerprint leakage (one fingerprint is exposed per verification query), and defends against collusion attacks (multiple users jointly circumventing fingerprints).
Limitations of Prior Work: RANDOM (random token keys) is scalable but insecure; ENGLISH-RANDOM (natural language keys with random responses) suffers severe performance degradation beyond 256 fingerprints.
Method¶
Overall Architecture¶
The fingerprinting system consists of two components: fingerprint generation and fingerprint training.
Fingerprint Generation — Perinucleus Sampling¶
Key Generation: Natural language questions are sampled at low temperature, making them indistinguishable from normal queries (in-distribution).
Response Generation: "Plausible but uncommon" responses are sampled just outside the boundary of the base model's nucleus distribution:
1. Compute the next-token probability distribution.
2. Identify the nucleus boundary at cumulative probability \(t\).
3. Uniformly sample from the \(k\) tokens immediately outside the nucleus.
Parameter settings: \(t=0.8\) (the sampled tokens' average probability ≈ 0.014), \(k=3\).
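The steps above can be sketched as follows. This is our reconstruction of the sampling rule, not the authors' code; the function name and NumPy-based interface are assumptions.

```python
import numpy as np

def perinucleus_sample(logits, t=0.8, k=3, rng=None):
    """Sketch of Perinucleus response sampling: pick uniformly among the k
    tokens that sit immediately outside the top-t nucleus of the
    next-token distribution. (Hypothetical helper, not the paper's code.)"""
    rng = rng if rng is not None else np.random.default_rng()
    # Softmax over the vocabulary (numerically stabilized).
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    # Sort token ids by descending probability; the nucleus is the smallest
    # prefix whose cumulative mass reaches t.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    boundary = int(np.searchsorted(cum, t)) + 1
    # The k tokens just past the boundary are plausible (high-ranked)
    # yet uncommon (outside the nucleus); sample one uniformly.
    candidates = order[boundary:boundary + k]
    return int(rng.choice(candidates))
```

Because the sampled token is always ranked just below the nucleus, it remains a response the base model considers plausible, which is what keeps the fingerprint in-distribution.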
Theoretical FPR Guarantee: \(\text{FPR} \leq \exp(-2M(1-1/k)^2)\), decreasing exponentially with the number of fingerprints.
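The bound is easy to evaluate numerically. A minimal sketch (function name is ours) showing that with \(k=3\), even a handful of verification queries drives the false-positive bound to around one percent:

```python
import math

def fpr_bound(m, k):
    """Upper bound exp(-2m(1 - 1/k)^2) on the false-positive rate when
    checking m fingerprints, each of which an unrelated model matches
    with probability at most 1/k."""
    return math.exp(-2 * m * (1 - 1 / k) ** 2)

# With k = 3, five fingerprint queries already push the bound near 1%:
print(fpr_bound(5, 3))  # ≈ 0.0117
```

This is consistent with the experimental finding that inspecting only a few fingerprints suffices for reliable detection.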
Fingerprint Training¶
- Weight Averaging: After each update step, the model's weights are interpolated with the original model's weights via a weighted average (\(\lambda_{WA}=0.75\)).
- Data Mixing: Fingerprint data is mixed with data generated by the base model (\(\beta_{DM}=0.25\)).
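The two regularizers above might be sketched as follows. This is a hypothetical illustration: the function names are ours, weights are represented as NumPy dicts, and we assume \(\beta_{DM}\) is the fraction of base-model-generated samples added to each fingerprint batch.

```python
import numpy as np

def weight_average_step(theta_ft, theta_base, lam_wa=0.75):
    """After each optimizer step, interpolate the fine-tuned weights with
    the frozen base weights; lam_wa is the fraction kept from fine-tuning."""
    return {name: lam_wa * theta_ft[name] + (1.0 - lam_wa) * theta_base[name]
            for name in theta_ft}

def mix_batches(fingerprint_batch, base_model_batch, beta_dm=0.25, rng=None):
    """Mix a beta_dm fraction of base-model-generated samples into the
    fingerprint training data (assumed interpretation of beta_DM)."""
    rng = rng if rng is not None else np.random.default_rng()
    n_base = int(round(beta_dm * len(fingerprint_batch)))
    keep = rng.choice(len(base_model_batch), size=n_base, replace=False)
    return fingerprint_batch + [base_model_batch[i] for i in keep]
```

Both mechanisms pull the fingerprinted model back toward the base model, in weight space and in data space respectively, which is what limits utility degradation at large fingerprint counts.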
Collusion Attack Defense¶
Each fingerprint is independently embedded in each distributed model copy with probability \(p\), and a score is tracked for each candidate model during detection. Theoretical guarantee: \(M = O(2^K K^{K+1} \log(N/\delta))\) fingerprints suffice to identify at least one of \(K\) colluding models (out of \(N\)) with probability \(1-\delta\).
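A toy simulation of this defense, under the common marking assumption (our assumption, stated for illustration) that a fingerprint survives collusion only if every colluding copy contains it, since colluders can strip any fingerprint on which their copies disagree:

```python
import numpy as np

rng = np.random.default_rng(42)

n_models, n_fp, p = 20, 500, 0.5
colluders = [2, 5, 7]  # hypothetical colluding copies

# Each fingerprint is embedded in each distributed copy independently w.p. p.
assignment = rng.random((n_models, n_fp)) < p

# Marking assumption: only fingerprints shared by ALL colluders survive,
# because the colluders can detect and remove the rest by comparison.
surviving = np.flatnonzero(assignment[colluders].all(axis=0))

# Detection: score each candidate model by how many surviving fingerprints
# it was assigned; colluders hold all of them by construction, so one of
# them attains the maximum score.
scores = assignment[:, surviving].sum(axis=1)
traced = int(scores.argmax())
```

Every colluder scores exactly `len(surviving)`, while an innocent model matches each surviving fingerprint only with probability \(p\), so the top scorer is a colluder with overwhelming probability.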
Key Experimental Results¶
Scalability¶
Main Results¶
| # Fingerprints | Perinucleus (OpenLLM) | ENGLISH-RANDOM | Utility Retention |
|---|---|---|---|
| 256 | ~63% | ~61% | >99% |
| 1024 | ~62.5% | ~57% | ~98% |
| 8192 | ~61.5% | Collapse | ~96% |
| 24576 | ~61% | N/A | ~95% |
Persistence (Post-SFT)¶
Ablation Study¶
| Method | Persistence (1,024 FPs) | Persistence (8,192 FPs) |
|---|---|---|
| RANDOM | ~85% | ~65% |
| Perinucleus | ~80% | ~60% |
| ENGLISH-RANDOM | ~40% | <20% |
Cross-Model Generalizability¶
Relative performance exceeds 95% across 10 models with 8,192 fingerprints.
Key Findings¶
- Fingerprint persistence degrades approximately log-linearly as the number of SFT samples increases.
- Mathematical data causes less fingerprint forgetting than conversational data.
- DPO training does not significantly exacerbate fingerprint forgetting.
- Querying just 5 fingerprints during verification already achieves satisfactory false positive/negative rates.
Highlights & Insights¶
- Perinucleus Sampling: An elegant design that samples near the boundary of the nucleus distribution, reducing model distortion during training.
- Scalability as a Security Property: This work is the first to elevate scalability as a core criterion and theoretically demonstrate its necessity.
- Regularization and Fingerprint Design Are Orthogonal: Ablations confirm that both contribute independently.
- Simple and Effective Collusion Defense: Random assignment combined with \(O(\log N)\) fingerprints suffices.
Limitations & Future Work¶
- Primary experiments use single-token responses; multi-token scenarios warrant further investigation.
- Combined attacks involving fine-tuning and collusion are not thoroughly evaluated.
- The impact of model merging attacks requires deeper study.
- Different inference sampling strategies may affect detection reliability.
Related Work & Insights¶
- Xu et al. / Russinovich & Salem: Focus on harmlessness and persistence while neglecting scalability.
- Model Watermarking: Detects whether text is LLM-generated, whereas fingerprinting verifies ownership of a specific model.
- Insights: The Perinucleus idea is generalizable to any scenario requiring the embedding of covert information.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ Perinucleus sampling is original and elegant; the scalability perspective is entirely new.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Covers 10 models, multiple attack types, theoretical guarantees, and comprehensive parameter analysis.
- Writing Quality: ⭐⭐⭐⭐⭐ Problem formulation is clear with multi-dimensional analysis.
- Value: ⭐⭐⭐⭐ Valuable for model security and intellectual property protection, though the application scope is relatively narrow.