Few-Shot Learner Generalizes Across AI-Generated Image Detection

Conference: ICML 2025
arXiv: 2501.08763
Code: GitHub
Area: Object Detection
Keywords: AI-generated image detection, few-shot learning, prototypical network, domain generalization, deepfake detection

TL;DR

This paper is the first to redefine AI-generated image detection as a few-shot classification task. It proposes FSD (Few-Shot Detector), which uses a prototypical network to learn a metric space. With only 10 samples from an unseen generative model, it achieves an average accuracy of 84.1% on the GenImage dataset, outperforming the previous SOTA (LARE2, 72.5%) by 11.6 percentage points.

Background & Motivation

Background: With the rapid development of diffusion models such as Stable Diffusion and Midjourney, generating highly realistic images has become extremely easy. Existing AI-generated image detection methods can be categorized into two paradigms: (1) model-agnostic methods, which detect general artifacts (frequency features, texture patterns, etc.) in synthetic images; (2) diffusion model reconstruction methods, which utilize the reconstruction errors of diffusion models to distinguish real from fake images.

Limitations of Prior Work: Both paradigms face common challenges. General artifacts identified during the CNN era (such as spectral anomalies) are no longer present in images generated by modern diffusion models, leading to a sharp decline in cross-model generalization for model-agnostic methods. Diffusion reconstruction methods (such as DIRE and LARE2) rely on the reconstruction capability of specific diffusion models, showing poor detection performance on models unseen during training (especially those using different architectures like VQDM). More fundamentally, collecting large amounts of training data from closed-source models (such as DALL-E and Midjourney) is expensive or impractical.

Key Challenge: Pursuing a "one-size-fits-all detection metric" (a single classifier to detect all generative models) ignores the fact that artifact features generated by different models vary significantly. However, retraining a classifier for each new model requires vast amounts of data, creating a dilemma of "insufficient generalization vs. excessive data requirements."

Goal: To rapidly adapt the detector using a few (e.g., 10) samples from an unseen generative model to identify its generated images, transforming the impossible "universal detection" problem into a feasible "few-shot adaptation" task.

Key Insight: The authors observe that in practical scenarios, obtaining a small number of samples from unseen models is usually feasible (generating a few images via APIs incurs extremely low cost). The key insight is to treat different generative models as distinct classes rather than grouping them into a generalized "fake" class. This allows the model to learn the specific artifact features associated with each generator, avoiding the pursuit of non-existent "universal artifacts."

Core Idea: To redefine AI-generated image detection as an N-way K-shot classification task, leveraging prototypical networks to achieve training-free rapid adaptation in a metric space.

Method

Overall Architecture

FSD is based on Prototypical Networks. In the training phase, an episode training strategy is utilized on datasets of known generative models to learn a metric space where images of the same class cluster together and different classes are separated. In the testing phase, given a few (K) samples from unseen models as a support set, the prototype representation (mean of embedding vectors) for each class is calculated, and test images are classified based on the nearest neighbor principle.
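
The test-time procedure described above (average the support embeddings per class, then assign each query to the nearest prototype) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the 2-D vectors stand in for the outputs of the embedding network, and the function names `build_prototypes` and `classify` are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_euclidean(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_prototypes(support):
    """support maps a class label to its K support embeddings."""
    return {label: mean_vector(embs) for label, embs in support.items()}


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_euclidean(query, prototypes[label]))


# Toy 2-D embeddings (hypothetical values standing in for backbone outputs).
support = {
    "real": [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]],
    "unseen_generator": [[2.0, 2.1], [1.9, 1.8], [2.2, 2.0]],
}
prototypes = build_prototypes(support)
prediction = classify([1.7, 1.9], prototypes)
print(prediction)
```

No gradient step is involved at test time: adapting to a new generator only requires embedding its K support images and averaging them.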

Key Designs

  1. Multi-class instead of Binary Detection Paradigm:

    • Function: Classifies images from different generative models into separate categories, with real images as a distinct class.
    • Mechanism: Traditional methods formulate detection as a binary classification by grouping all generated images into a single "fake" class. FSD treats each generative model (Midjourney, GLIDE, ADM, Stable Diffusion, etc.) as an independent class. During training, the 8 subsets of the GenImage dataset are divided into 7 classes based on their source (where SD v1.4, v1.5, and Wukong are merged into a single SD class as they share the same architecture). Each episode randomly samples \(N_c\) classes, with \(N_s\) support samples and \(N_q\) query samples per class.
    • Design Motivation: t-SNE visualization experiments clearly demonstrate that multi-class training in FSD forces images from different sources to form distinct clusters in the feature space (including unseen classes), whereas unseen samples are highly scattered in the feature space of binary classifiers. This suggests that multi-class training forces the model to learn intra-class commonalities, which facilitates generalization to unseen classes.
  2. Metric Learning with Prototypical Networks:

    • Function: Learns a metric space in which classification can be completed via simple distance computations.
    • Mechanism: A ResNet-50 (pretrained on ImageNet) is used as the backbone \(f_\phi: \mathbb{R}^D \to \mathbb{R}^{1024}\) to map images into a 1024-dimensional feature space. The prototype of class \(i\) is the mean embedding of its support set \(S_i\): \(\mathbf{c}_i = \frac{1}{|S_i|} \sum_{\mathbf{x}_j \in S_i} f_\phi(\mathbf{x}_j)\). Classification probabilities are obtained by applying a softmax over the negative squared Euclidean distances to all \(N\) prototypes: \(p(y=i \mid \mathbf{x}_q) = \frac{\exp(-d(f_\phi(\mathbf{x}_q), \mathbf{c}_i))}{\sum_{i'=1}^{N} \exp(-d(f_\phi(\mathbf{x}_q), \mathbf{c}_{i'}))}\), where \(d(\mathbf{a}, \mathbf{b}) = \|\mathbf{a} - \mathbf{b}\|_2^2\) and \(N = N_c\) during training. The training objective is to minimize the mean negative log-probability of the correct class over the query sets \(Q_k\): \(J(\phi) = -\frac{1}{N_c N_q} \sum_{k=1}^{N_c} \sum_{\mathbf{x}_j \in Q_k} \log p(y=k \mid \mathbf{x}_j)\).
    • Design Motivation: The mean embedding of the prototypical network naturally suppresses noise from individual samples. As long as the artifacts of the generative model are systematic (e.g., frequency characteristics or texture patterns), the mean of a small number of samples can effectively capture intra-class commonalities.
  3. Metadata Vector for Zero-Shot Detection:

    • Function: Enables detection even when no samples of the test classes are available.
    • Mechanism: In the training set, 1024 samples are randomly selected from each class to compute their prototype representations as "metadata vectors". During testing, if the nearest prototype belongs to any generative model class, the image is identified as fake. This zero-shot approach does not require any samples from unseen classes.
    • Design Motivation: Provides a baseline capability without any adaptation overhead, allowing FSD to function even when samples of new models are completely unavailable (achieving a zero-shot average accuracy of 77.1%).
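
The zero-shot rule in design 3 reduces to a nearest-prototype lookup followed by a real-vs-generator check. The sketch below assumes the metadata vectors have already been computed from training data; the 2-D values, the class labels, and the function name `zero_shot_is_fake` are illustrative assumptions, not taken from the paper's code.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def zero_shot_is_fake(query, metadata_vectors, real_label="real"):
    """Flag the query as fake if its nearest metadata vector is any generator class.

    Returns (is_fake, nearest_label). The nearest label may be a wrong
    attribution for an unseen generator; only the real/fake decision is used.
    """
    nearest = min(
        metadata_vectors,
        key=lambda label: squared_euclidean(query, metadata_vectors[label]),
    )
    return nearest != real_label, nearest


# Hypothetical metadata vectors: one prototype per training class.
metadata_vectors = {
    "real": [0.0, 0.0],
    "SD": [3.0, 0.0],
    "GLIDE": [0.0, 3.0],
}

# An embedding from a generator that has no prototype of its own.
is_fake, nearest = zero_shot_is_fake([2.0, 2.5], metadata_vectors)
print(is_fake, nearest)
```

The design choice here is that misattributing an unseen generator to a known one is harmless for detection, as long as the query does not land closest to the real prototype.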

Loss & Training

Training employs the standard prototypical loss, which is the negative log-softmax of the distance from query samples to their correct prototypes. The optimizer is Adam with a learning rate of \(10^{-4}\) and a StepLR scheduler (\(\gamma=0.5\), step=80000) for 200K steps with a batch size of 16. Each step samples a 3-way 5-shot 5-query episode.
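
As a rough sketch of the quantities named above, the following computes the prototypical loss for one toy episode and the step-decayed learning rate. It assumes embeddings are given (no backbone is trained here) and that the scheduler halves the rate every 80,000 steps, as the stated hyperparameters suggest; `prototypical_loss` and `step_lr` are hypothetical helper names.

```python
import math


def squared_euclidean(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def prototypical_loss(prototypes, queries):
    """Mean negative log-probability of the correct class.

    prototypes: list of class prototypes (index = class id).
    queries: list of (embedding, class id) pairs.
    """
    total = 0.0
    for embedding, k in queries:
        scores = [-squared_euclidean(embedding, c) for c in prototypes]
        total -= log_softmax(scores)[k]
    return total / len(queries)


def step_lr(step, base_lr=1e-4, gamma=0.5, step_size=80_000):
    """Learning rate after `step` updates under a step-decay schedule."""
    return base_lr * gamma ** (step // step_size)


# Toy 3-way episode with one query per class (hypothetical embeddings).
prototypes = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
queries = [([0.2, 0.1], 0), ([3.8, 0.3], 1), ([0.1, 4.2], 2)]
loss = prototypical_loss(prototypes, queries)
print(loss)

for step in (0, 80_000, 160_000):
    print(step, step_lr(step))
```

A query equidistant from all N prototypes contributes exactly log N to the loss, which is a convenient sanity check for any implementation.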

Key Experimental Results

GenImage Benchmark 6-Class Detection Accuracy (%)

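| Method | Midjourney | GLIDE | ADM | SD | VQDM | BigGAN | Average |
|---|---|---|---|---|---|---|---|
| Spec | 50.0 | 64.7 | 52.8 | 56.1 | 56.5 | 63.0 | 57.2 |
| CNNSpot | 52.8 | 73.3 | 55.0 | 55.9 | 54.4 | 66.2 | 59.6 |
| DIRE | 57.9 | 68.2 | 57.3 | 58.2 | 59.6 | 50.8 | 58.7 |
| LARE2 | 62.7 | 80.2 | 63.5 | 79.6 | 76.9 | 72.0 | 72.5 |
| FSD (zero-shot) | 75.1 | 93.9 | 74.1 | 88.0 | 69.1 | 62.1 | 77.1 |
| FSD (10-shot) | 80.9 | 97.1 | 79.2 | 88.8 | 76.2 | 82.2 | 84.1 |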

Cross-generator 10-shot Classification (Accuracy/AP, %)

| Excluded Subset | Midjourney | GLIDE | ADM | SD | VQDM | BigGAN |
|---|---|---|---|---|---|---|
| Midjourney | 80.9/84.6 | 99.9/99.9 | 98.5/99.3 | 97.1/98.7 | 99.5/99.9 | 88.0/92.9 |
| GLIDE | 86.8/89.9 | 97.1/98.0 | 97.9/98.9 | 97.1/98.8 | 99.2/99.7 | 91.9/97.1 |
| ADM | 87.6/91.8 | 99.8/99.9 | 79.2/83.8 | 94.8/97.2 | 98.8/99.4 | 91.0/96.1 |
| SD | 86.1/89.7 | 99.9/99.9 | 97.4/98.8 | 88.8/92.5 | 96.6/98.5 | 89.5/95.4 |
| VQDM | 82.4/85.9 | 99.9/99.9 | 97.3/98.6 | 95.6/98.0 | 76.2/79.4 | 83.5/89.1 |
| BigGAN | 88.9/91.6 | 99.9/99.9 | 98.3/99.3 | 98.1/99.3 | 96.4/98.3 | 82.2/86.8 |

Key Findings

  1. The average 10-shot accuracy of FSD is 84.1%, which substantially outperforms LARE2's 72.5% (+11.6 percentage points) while using only 10 samples from the unseen generator rather than a full training set for it.
  2. Significant improvements are observed when scaling from 1-shot to 10-shot (e.g., ADM improves from 62.6% to 79.2%, +16.6 points), whereas scaling from 10-shot to 200-shot yields only a further 2.5-point gain, which makes 10-shot a good performance-to-cost trade-off.
  3. Zero-shot FSD (77.1%) already outperforms all traditional methods requiring a full training set (LARE2 72.5%), demonstrating that multi-class metric learning itself has learned powerful feature representations.
  4. Diagonal (unseen class) accuracy is at least 76.2%, while accuracy for classes seen during training is above 94% for GLIDE, ADM, SD, and VQDM and between 82.4% and 91.9% for Midjourney and BigGAN, indicating that the generalization gap is present but manageable.
  5. The VQDM class is the most challenging to detect when held out (76.2%), as its image quantization mechanism differs most significantly from other diffusion models.
  6. t-SNE visualization confirms that unseen classes still form tight clusters in the feature space of FSD, whereas they are highly scattered in binary classifiers.
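
The seen-versus-unseen comparison in findings 4 and 5 can be recomputed directly from the cross-generator table. The snippet below copies the accuracy values (not AP) from that table and assumes, as the diagonal wording suggests, that rows are the excluded subset and columns are the test subset.

```python
names = ["Midjourney", "GLIDE", "ADM", "SD", "VQDM", "BigGAN"]

# Accuracy (%) from the cross-generator 10-shot table.
# Row = subset excluded from training, column = test subset (assumed layout).
acc = [
    [80.9, 99.9, 98.5, 97.1, 99.5, 88.0],
    [86.8, 97.1, 97.9, 97.1, 99.2, 91.9],
    [87.6, 99.8, 79.2, 94.8, 98.8, 91.0],
    [86.1, 99.9, 97.4, 88.8, 96.6, 89.5],
    [82.4, 99.9, 97.3, 95.6, 76.2, 83.5],
    [88.9, 99.9, 98.3, 98.1, 96.4, 82.2],
]

# Diagonal entries: the test subset was held out of training (unseen).
unseen = [acc[i][i] for i in range(len(names))]
mean_unseen = sum(unseen) / len(unseen)
hardest = names[unseen.index(min(unseen))]

# Off-diagonal entries per column: the test subset was seen during training.
seen_range = {}
for j, name in enumerate(names):
    seen = [acc[i][j] for i in range(len(names)) if i != j]
    seen_range[name] = (min(seen), max(seen))
    print(f"{name:10s} unseen={acc[j][j]:.1f} seen={min(seen):.1f}-{max(seen):.1f}")

print(f"mean unseen accuracy: {mean_unseen:.1f}, hardest held-out class: {hardest}")
```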

Highlights & Insights

  1. Deep Insight from Problem Redefinition: Shifting the perspective from an "impossible universal detection" to a "feasible few-shot adaptation" is a major contribution in the field.
  2. Simple Method with Remarkable Results: Prototypical network is one of the most basic metric learning approaches, yet it yields outstanding results on this task, showing that problem definition can be more critical than method complexity.
  3. Highly Feasible 10-shot Deployment: Generating 10 images using APIs from Midjourney, DALL-E, etc., is virtually cost-free in practice.
  4. Insights on Multi-class vs. Binary Classification: Multi-class classification forces the model to focus on intra-class commonalities rather than simple inter-class differences, learning more structured feature representations.

Limitations & Future Work

  • Building an accurate support set requires knowing or assuming which generative model a test image comes from; in fully open-world, unlabeled scenarios the method falls back to the zero-shot baseline.
  • Generalization is weakest for models with a fundamentally different technical path such as VQDM (76.2% held-out accuracy vs. 79.2% to 97.1% for the other classes), indicating that few-shot adaptation remains challenging when model differences are too vast.
  • Prototypical networks compute prototype representations through averaging, which makes them sensitive to outlier samples in the support set.
  • Robustness under adversarial perturbations (deliberate evasion of detection) has not been evaluated.
  • Evaluations are restricted to the GenImage benchmark, lacking cross-validation on additional test sets.
  • FSD is complementary to methods that use CLIP semantic features, such as UnivFD (Ojha et al., 2023): FSD instead extracts low-level artifact features with a ResNet backbone.
  • Differing from the reconstruction-based direction of LARE2 (Luo et al., 2024), FSD is fully data-driven and does not rely on any specific diffusion model.
  • Inspiration: The meta-learning framework can be enhanced by substituting the backbone with advanced feature extractors (such as CLIP or DINOv2) to further boost performance.

Rating

⭐⭐⭐⭐⭐ Precise problem redefinition (first to frame AIGI detection as few-shot classification), elegant methodology (prototypical network + multi-class training), extensive experimentation (+11.6% SOTA), and strong practical utility (near-zero cost for 10-shot). The absence of more benchmarks and adversarial robustness evaluation is a minor drawback, but does not detract from the overall high quality of the work.