
Scalable Fingerprinting of Large Language Models

Conference: NeurIPS 2025 arXiv: 2502.07760 Code: GitHub Area: LLM Pretraining Keywords: model fingerprinting, LLM ownership, Perinucleus sampling, collusion attack, model security

TL;DR

This paper proposes Perinucleus sampling to generate scalable LLM fingerprints, enabling the embedding of 24,576 fingerprints in Llama-3.1-8B—two orders of magnitude more than existing methods—without degrading model capability. Theoretical and empirical analyses demonstrate that large-scale fingerprinting is essential for defending against collusion attacks.

Background & Motivation

Why Fingerprinting: Model fingerprinting enables owners to identify unauthorized use of their models via API access.

Why Scalability Matters: Reduces false positive rates, mitigates fingerprint leakage (one fingerprint is exposed per verification query), and defends against collusion attacks (multiple users jointly circumventing fingerprints).

Limitations of Prior Work: RANDOM (random token keys) is scalable but insecure; ENGLISH-RANDOM (natural language keys with random responses) suffers severe performance degradation beyond 256 fingerprints.

Method

Overall Architecture

The fingerprinting system consists of two components: fingerprint generation and fingerprint training.

Fingerprint Generation — Perinucleus Sampling

Key Generation: Natural language questions are sampled at low temperature, making them indistinguishable from normal queries (in-distribution).

Response Generation: "Plausible but uncommon" responses are sampled near the boundary of the base model's nucleus distribution:

  1. Compute the next-token probability distribution.
  2. Identify the top-\(t\) nucleus boundary.
  3. Uniformly sample from the \(k\) tokens immediately outside the nucleus.

Parameter settings: \(t=0.8\) (actual average probability ≈ 0.014), \(k=3\).
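The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration under my own assumptions (function name and the exact boundary convention are mine, not the paper's reference implementation):

```python
import numpy as np

def perinucleus_sample(probs, t=0.8, k=3, rng=None):
    """Sample a 'plausible but uncommon' token just outside the top-t nucleus.

    probs: next-token probability distribution (1-D array summing to 1).
    t:     nucleus mass threshold (paper uses t = 0.8).
    k:     number of candidate tokens past the boundary (paper uses k = 3).
    """
    if rng is None:
        rng = np.random.default_rng()
    order = np.argsort(probs)[::-1]              # tokens by descending probability
    cum = np.cumsum(probs[order])
    boundary = int(np.searchsorted(cum, t)) + 1  # first index outside the nucleus
    window = order[boundary:boundary + k]        # the k tokens just past the boundary
    return int(rng.choice(window))               # uniform choice among them
```

Because the response token sits just outside what the base model would sample under nucleus decoding, it is plausible enough to train in cheaply but rare enough to serve as a detectable key.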

Theoretical FPR Guarantee: \(\text{FPR} \leq \exp(-2M(1-1/k)^2)\), decreasing exponentially with the number of fingerprints.
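The bound is easy to evaluate directly; a small helper (name mine) shows how quickly it shrinks with the number of fingerprints checked, assuming \(k=3\):

```python
import math

def fpr_bound(M, k=3):
    """Hoeffding-style bound from the paper: FPR <= exp(-2 M (1 - 1/k)^2)."""
    return math.exp(-2 * M * (1 - 1 / k) ** 2)
```

With \(k=3\), checking just five fingerprints already drives the bound below 2%.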

Fingerprint Training

  1. Weight Averaging: After each update step, the model is interpolated with the original model via a weighted average (\(\lambda_{WA}=0.75\)).
  2. Data Mixing: Fingerprint data is mixed with data generated by the base model (\(\beta_{DM}=0.25\)).
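Both regularizers are straightforward to sketch. The snippet below is an illustration under my assumptions (that \(\lambda_{WA}\) weights the fine-tuned parameters and that \(\beta_{DM}\) is the fraction of each batch drawn from base-model data); it is not the paper's training code:

```python
import numpy as np

def weight_average(theta_updated, theta_base, lam=0.75):
    """theta <- lam * theta_updated + (1 - lam) * theta_base (lambda_WA)."""
    return {name: lam * w + (1 - lam) * theta_base[name]
            for name, w in theta_updated.items()}

def mix_batch(fingerprint_data, base_model_data, beta=0.25, seed=0):
    """Replace a beta fraction of fingerprint examples with base-model data."""
    rng = np.random.default_rng(seed)
    n_base = int(beta * len(fingerprint_data))
    mixed = list(fingerprint_data)
    idx = rng.choice(len(mixed), size=n_base, replace=False)
    for i, example in zip(idx, base_model_data):
        mixed[i] = example
    return mixed
```

Both mechanisms pull the fine-tuned model back toward the base model's behavior, which is what limits capability loss as the fingerprint count grows.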

Collusion Attack Defense

Each fingerprint is independently assigned to each distributed model copy with probability \(p\), and a per-model candidate score is tracked during detection. Theoretical guarantee: \(M = O(2^K K^{K+1} \log(N/\delta))\) fingerprints suffice to identify at least one of \(K\) colluding models among \(N\) fingerprinted copies with probability \(1-\delta\).
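The random-assignment-plus-scoring idea can be illustrated as follows; this is a simplified sketch of the mechanism (function names and the plain counting score are my assumptions, not the paper's exact detector):

```python
import random

def assign_fingerprints(num_fingerprints, num_models, p=0.5, seed=0):
    """Give each fingerprint to each model copy independently with probability p."""
    rng = random.Random(seed)
    return [{m for m in range(num_models) if rng.random() < p}
            for _ in range(num_fingerprints)]

def score_models(assignments, triggered, num_models):
    """Score each model by how many triggered fingerprints it was assigned.

    Colluders hold the fingerprints that fired, so their scores concentrate
    well above those of uninvolved models.
    """
    scores = [0] * num_models
    for fp in triggered:
        for m in assignments[fp]:
            scores[m] += 1
    return scores
```

Because assignments are random, a merged or colluded model still responds to fingerprints that only its constituent copies hold, which is what the scoring exploits.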

Key Experimental Results

Scalability

Main Results

| # Fingerprints | Perinucleus (OpenLLM) | ENGLISH-RANDOM (OpenLLM) | Retention |
|---|---|---|---|
| 256 | ~63% | ~61% | >99% |
| 1,024 | ~62.5% | ~57% | ~98% |
| 8,192 | ~61.5% | Collapse | ~96% |
| 24,576 | ~61% | N/A | ~95% |

Persistence (Post-SFT)

Ablation Study

| Method | Persistence @ 1,024 | Persistence @ 8,192 |
|---|---|---|
| RANDOM | ~85% | ~65% |
| Perinucleus | ~80% | ~60% |
| ENGLISH-RANDOM | ~40% | <20% |

Cross-Model Generalizability

Relative performance exceeds 95% across 10 models with 8,192 fingerprints.

Key Findings

  1. The impact of increasing SFT sample count on persistence is approximately log-linear.
  2. Mathematical data causes less fingerprint forgetting than conversational data.
  3. DPO training does not significantly exacerbate fingerprint forgetting.
  4. Inspecting 5 fingerprints yields satisfactory false positive/negative rates.

Highlights & Insights

  1. Perinucleus Sampling: An elegant design that samples near the boundary of the nucleus distribution, reducing model distortion during training.
  2. Scalability as a Security Property: This work is the first to elevate scalability as a core criterion and theoretically demonstrate its necessity.
  3. Regularization and Fingerprint Design Are Orthogonal: Ablations confirm that both contribute independently.
  4. Simple and Effective Collusion Defense: Random assignment combined with \(O(\log N)\) fingerprints suffices.

Limitations & Future Work

  1. Primary experiments use single-token responses; multi-token scenarios warrant further investigation.
  2. Combined attacks involving fine-tuning and collusion are not thoroughly evaluated.
  3. The impact of model merging attacks requires deeper study.
  4. Different inference sampling strategies may affect detection reliability.
Related Work & Insights

  • Xu et al. / Russinovich & Salem: Focus on harmlessness and persistence while neglecting scalability.
  • Model Watermarking: Detects whether text is LLM-generated, whereas fingerprinting verifies ownership of a specific model.
  • Insights: The Perinucleus idea generalizes to any scenario requiring covert information to be embedded in a model.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ Perinucleus sampling is original and elegant; the scalability perspective is entirely new.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Covers 10 models, multiple attack types, theoretical guarantees, and comprehensive parameter analysis.
  • Writing Quality: ⭐⭐⭐⭐⭐ Problem formulation is clear with multi-dimensional analysis.
  • Value: ⭐⭐⭐⭐ Valuable for model security and intellectual property protection, though the application scope is relatively narrow.