Rethinking Data Bias: Dataset Copyright Protection via Embedding Class-Wise Hidden Bias

Conference: ECCV 2024
Code: https://github.com/jjh6297/UndercoverBias
Area: Others
Keywords: Dataset Watermarking, Copyright Protection, Undercover Bias, Model Forensics, Intellectual Property

TL;DR

This paper proposes "Undercover Bias", a dataset watermarking method. It embeds hidden watermark patterns into the training data that are irrelevant to the target task but matched one-to-one with the class labels, so models trained on the data by unauthorized users inadvertently learn to classify these watermarks. This ability to classify the watermarks serves as hard-to-refute evidence of unauthorized use, providing dataset copyright protection that is covert, model-agnostic, and non-destructive to the target task.

Background & Motivation

Background: Public datasets are crucial assets for data-driven AI development, but they are at serious risk of unauthorized use: for example, a company might train a commercial model on a public dataset released for research purposes only. Dataset copyright protection aims to reliably identify and verify such unauthorized use.

Limitations of Prior Work: Existing dataset protection methods have several limitations: (1) visible watermarks are easy to detect and remove; (2) backdoor-based methods can be detected by backdoor detection tools; (3) some methods require knowledge of the suspect model's architecture or parameters, so they are not model-agnostic; (4) watermarks may significantly degrade target-task performance, reducing the dataset's utility; (5) the verification evidence is weak, because legitimately trained models might exhibit similar behavior by accident, so the evidence can be repudiated.

Key Challenge: Dataset protection must simultaneously satisfy four seemingly conflicting requirements: the watermark must be covert enough to escape detection, preserve original task performance, be learnable enough by the model to serve as evidence, and yield evidence that legitimate training cannot produce by chance. Traditional methods struggle to meet all four at once.

Goal: (1) How to embed imperceptible watermarks in datasets; (2) how to ensure that watermarks do not degrade target task performance; (3) how to generate non-repudiable evidence of unauthorized use; (4) how to adapt to arbitrary model architectures.

Key Insight: Deep learning models often unintentionally learn biases present in the data, and sometimes base their classification decisions solely on bias features even when those features are irrelevant to the task. The authors exploit this tendency by deliberately injecting "undercover biases" that are correlated with the labels but irrelevant to the target task. During training, the model picks up these biases on its own, which establishes solid evidence of unauthorized use.

Core Idea: Turn the tendency of models to learn data biases, usually regarded as a defect, into a tool for dataset protection by deliberately embedding class-wise undercover biases as watermarks.

Method

Overall Architecture

The workflow of Undercover Bias has three phases: (1) Watermark embedding: design a unique watermark pattern for each class in the dataset and embed it imperceptibly into that class's training images; (2) Dataset distribution: release the watermarked dataset publicly; it is visually indistinguishable from the original and supports nearly the same task performance; (3) Copyright verification: when a suspect model is identified, test it on a watermark verification set (images that contain only watermark patterns and no target-task information). If the model classifies the watermarks correctly at well above chance, this is evidence that it was trained on the watermarked dataset.

Key Designs

Class-wise Hidden Watermarks:
- Function: Creating unique, imperceptible watermark signals for each class.
- Mechanism: Each class \(c\) corresponds to a unique watermark pattern \(w_c\). The watermark is superimposed onto the training images of this class with an extremely low intensity \(\alpha\) (close to the human perception threshold): \(x'= x + \alpha \cdot w_c\). The watermark pattern itself is entirely irrelevant to the target task (e.g., specific frequency patterns or random textures), but because it corresponds one-to-one with the class labels, the model automatically associates the watermark features with the classes during training. The watermark design must satisfy high discriminability among different classes, low visibility, and minimal impact on image quality.
- Design Motivation: Deep models are sensitive to data biases: even when task-relevant features are more salient, the model still captures and memorizes the secondary watermark features, because they are perfectly correlated with the labels.
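
A minimal NumPy sketch of the embedding rule \(x' = x + \alpha \cdot w_c\) follows. It assumes images are float arrays in [0, 1] and uses seeded Gaussian random textures, normalized to zero mean and unit RMS, as the class patterns; the helper names `make_class_watermarks` and `embed_watermarks` are hypothetical, and the paper's actual pattern family and normalization may differ.

```python
import numpy as np


def make_class_watermarks(num_classes, shape, seed=0):
    """Return one zero-mean, unit-RMS random texture per class (hypothetical pattern family)."""
    rng = np.random.default_rng(seed)
    patterns = rng.standard_normal((num_classes, *shape))
    axes = tuple(range(1, patterns.ndim))  # every axis except the class axis
    patterns -= patterns.mean(axis=axes, keepdims=True)
    patterns /= np.sqrt((patterns ** 2).mean(axis=axes, keepdims=True))
    return patterns


def embed_watermarks(images, labels, patterns, alpha=0.05):
    """Apply x' = x + alpha * w_c, where c is each image's label, then clip to [0, 1]."""
    marked = images + alpha * patterns[labels]
    return np.clip(marked, 0.0, 1.0)


# Toy batch: eight 32x32 RGB images with random labels from 10 classes.
rng = np.random.default_rng(1)
images = rng.uniform(0.2, 0.8, size=(8, 32, 32, 3))
labels = rng.integers(0, 10, size=8)
patterns = make_class_watermarks(10, (32, 32, 3))
marked = embed_watermarks(images, labels, patterns, alpha=0.05)

# Patterns of different classes should be nearly orthogonal (high discriminability).
flat = patterns.reshape(10, -1)
norms = np.linalg.norm(flat, axis=1)
cosine = flat @ flat.T / np.outer(norms, norms)
print("max |perturbation|:", np.abs(marked - images).max())
print("max off-diagonal cosine:", np.abs(cosine - np.eye(10)).max())
```

Random high-dimensional textures are used here only because they are almost orthogonal to each other by construction, which gives the between-class discriminability the text asks for; the per-pixel change is bounded by \(\alpha\) times the pattern's peak magnitude.
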
Bias-Only Verification:
- Function: Providing non-repudiable evidence of unauthorized usage.
- Mechanism: Build a verification set whose images contain only the watermark patterns, with no visual content related to the target task, and test the suspect model on it. A model that has never seen the watermarked dataset has no learned link between watermarks and classes, so its accuracy on this bias-only set stays at chance level. A model trained on the watermarked dataset has learned the watermark-to-class mapping and scores well above chance. Because such above-chance accuracy is extremely unlikely to arise from legitimate training, it provides statistically strong, non-repudiable evidence.
- Design Motivation: The "trigger-behavior" relationship exploited by backdoor-based methods can be detected and eliminated by backdoor detection tools. In contrast, the "watermark-classification" relationship in Undercover Bias is a natural by-product of ordinary learning, which the authors argue backdoor defenses cannot remove.
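
The verification logic can be illustrated with a toy experiment. It assumes a nearest-centroid classifier as a stand-in for a deep network and fully synthetic 32×32 data: a class-specific bright blob plays the role of the task signal, and per-class random textures play the role of the undercover bias. All names, sizes, and the bias-only construction (a blank canvas carrying only \(\alpha \cdot w_c\)) are hypothetical choices, not the paper's protocol. The model fitted on watermarked data should classify the bias-only set far above the 10% chance level, while the model fitted on clean data should not; both should solve the toy task.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 10, 32, 32                  # toy sizes: 10 classes, 32x32 grayscale
N_PER_CLASS, ALPHA, SIGMA = 50, 0.1, 0.3


def task_image(c):
    """Hypothetical task signal: a bright 4x4 blob whose position encodes class c."""
    img = np.zeros((H, W))
    r, col = divmod(int(c), 5)
    img[4 + 8 * r:8 + 8 * r, 2 + 6 * col:6 + 6 * col] = 1.0
    return img


# Class-wise undercover bias: one zero-mean, unit-RMS random texture per class.
marks = rng.standard_normal((C, H, W))
marks -= marks.mean(axis=(1, 2), keepdims=True)
marks /= marks.std(axis=(1, 2), keepdims=True)


def make_split(n_per_class, watermark):
    labels = np.repeat(np.arange(C), n_per_class)
    x = np.stack([task_image(c) for c in labels])
    x += SIGMA * rng.standard_normal(x.shape)   # pixel noise
    if watermark:
        x += ALPHA * marks[labels]              # x' = x + alpha * w_c
    return x, labels


def fit_centroids(x, y):
    """Toy stand-in for a suspect model: the mean training image of each class."""
    flat = x.reshape(len(x), -1)
    return np.stack([flat[y == c].mean(axis=0) for c in range(C)])


def predict(centroids, x):
    """Assign each input to the class with the nearest centroid (squared L2 distance)."""
    flat = x.reshape(len(x), -1)
    dists = ((flat[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


# Bias-only verification set: a blank canvas carrying only the class watermark.
verify_labels = np.repeat(np.arange(C), 10)
verify_x = ALPHA * marks[verify_labels]

x_clean, y_clean = make_split(N_PER_CLASS, watermark=False)
x_marked, y_marked = make_split(N_PER_CLASS, watermark=True)
x_test, y_test = make_split(10, watermark=False)

results = {}
for name, (x, y) in {"clean": (x_clean, y_clean), "watermarked": (x_marked, y_marked)}.items():
    model = fit_centroids(x, y)
    results[name] = {
        "task": float((predict(model, x_test) == y_test).mean()),
        "bias_only": float((predict(model, verify_x) == verify_labels).mean()),
    }
    print(f"{name:12s} task={results[name]['task']:.2f} "
          f"bias-only={results[name]['bias_only']:.2f} chance={1 / C:.2f}")
```

A nearest-centroid model is used only because its behavior is easy to reason about: each class mean absorbs \(\alpha \cdot w_c\), so bias-only inputs land closest to their own class. Whether a deep network picks up the bias as reliably is the empirical question the paper's experiments address.
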
Robustness Enhancement:
- Function: Enhancing the robustness of the watermark under various potential attacks.
- Mechanism: (1) Adaptive watermark intensity: \(\alpha\) is adjusted according to the image's texture complexity, so textured regions, where changes are harder to notice, can carry a stronger watermark without it becoming perceptible; (2) Augmentation robustness: the watermark patterns must remain learnable after common data augmentations (e.g., cropping, flipping, color jittering); (3) Multi-band embedding: the watermark is spread across multiple frequency bands of the image to improve robustness against compression and filtering attacks.
- Design Motivation: In practical scenarios, training suspect models could involve various data preprocessing and augmentation strategies; hence, the watermark must remain effective under these transformations.
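
Mechanisms (1) and (3) can be sketched together, under stated assumptions. The texture measure (local standard deviation in a 5×5 window), the linear mapping from texture to \(\alpha\) between hypothetical bounds `base_alpha` and `max_alpha`, and the multi-band pattern (a sum of oriented sinusoids at a few spatial frequencies with random orientation and phase) are all illustrative choices; the paper's actual rules are not specified here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_std(gray, k=5):
    """Standard deviation over a k x k neighborhood (edge-padded): a simple texture measure."""
    pad = k // 2
    windows = sliding_window_view(np.pad(gray, pad, mode="edge"), (k, k))
    return windows.std(axis=(-2, -1))


def adaptive_alpha(gray, base_alpha=0.03, max_alpha=0.08, k=5):
    """Hypothetical rule: interpolate alpha between base and max by normalized local texture."""
    texture = local_std(gray, k)
    texture = texture / (texture.max() + 1e-8)   # 0 in flat regions, close to 1 in the busiest
    return base_alpha + (max_alpha - base_alpha) * texture


def multiband_pattern(shape, freqs=(2, 6, 12), seed=0):
    """Hypothetical multi-band watermark: oriented sinusoids at several spatial frequencies
    (cycles per image), each with a random orientation and phase, normalized to unit RMS."""
    rng = np.random.default_rng(seed)
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    pattern = np.zeros(shape)
    for f in freqs:
        theta, phase = rng.uniform(0, np.pi), rng.uniform(0, 2 * np.pi)
        pattern += np.sin(2 * np.pi * f * (xx * np.cos(theta) / w + yy * np.sin(theta) / h) + phase)
    pattern -= pattern.mean()
    return pattern / pattern.std()


# Toy grayscale image: flat left half, noisy (textured) right half.
rng = np.random.default_rng(3)
gray = np.full((32, 32), 0.5)
gray[:, 16:] = np.clip(0.5 + 0.2 * rng.standard_normal((32, 16)), 0.0, 1.0)

alpha_map = adaptive_alpha(gray)
pattern = multiband_pattern(gray.shape, seed=7)
marked = np.clip(gray + alpha_map * pattern, 0.0, 1.0)
print("mean alpha, flat region:", alpha_map[:, :12].mean())
print("mean alpha, textured region:", alpha_map[:, 20:].mean())
```

Spreading energy over low, mid, and high frequencies is meant to ensure that a filter or codec that suppresses one band still leaves some of the watermark intact; the adaptive map keeps the flat half at the minimum strength.
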

Loss & Training

The watermark embedding phase involves no model training; it is a purely data-level operation. In the verification phase, the suspect model's watermark classification accuracy is measured, and a statistical hypothesis test (e.g., a p-value computed against the chance-level null hypothesis) determines whether that accuracy is significantly above the random baseline; a significant result serves as evidence of use. The method is compatible with arbitrary model architectures (CNN, ViT, etc.) and training strategies.
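One standard choice for this test, assumed here rather than taken from the paper, is an exact one-sided binomial test: under the null hypothesis that the suspect model never learned the watermarks, each bias-only image is classified correctly with probability \(p_0 = 1/C\) for \(C\) classes, and the p-value is \(P(X \ge k)\) for \(X \sim \mathrm{Binomial}(n, p_0)\). The function name `binomial_p_value` is hypothetical.

```python
from math import comb


def binomial_p_value(correct, total, num_classes):
    """One-sided exact binomial test on bias-only verification results.

    H0: the suspect model never learned the watermarks, so each verification image is
    classified correctly with chance probability p0 = 1 / num_classes. Returns
    P(X >= correct) for X ~ Binomial(total, p0).
    """
    p0 = 1.0 / num_classes
    return sum(comb(total, k) * p0 ** k * (1 - p0) ** (total - k)
               for k in range(correct, total + 1))


# Example: 20 of 50 bias-only images classified correctly on a 10-class dataset.
print(f"p-value = {binomial_p_value(20, 50, 10):.2e}")
```

With large verification sets the individual terms can underflow to zero in floating point; `scipy.stats.binomtest(k, n, p, alternative="greater").pvalue` computes the same one-sided p-value and is a safer choice there.
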

Key Experimental Results

Main Results

| Dataset | Target Task Accuracy (Original) | Target Task Accuracy (Watermarked) | Watermark Verification Accuracy | Random Baseline |
|---|---|---|---|---|
| CIFAR-10 | 94.2% | 93.8% (-0.4%) | 87.5% | 10% |
| CIFAR-100 | 76.5% | 75.9% (-0.6%) | 62.3% | 1% |
| ImageNet subset | 78.3% | 77.8% (-0.5%) | 71.2% | 0.1% |
| Segmentation (mIoU) | Original mIoU | ~Original (-0.5%) | Above random | Random |

Ablation Study

| Configuration | Target Task Accuracy | Watermark Verification Accuracy | Description |
|---|---|---|---|
| No-watermark Baseline | 94.2% | 10.1% | Close to random |
| \(\alpha\) = 0.01 | 94.1% | 52.3% | Weak watermark, moderate verification accuracy |
| \(\alpha\) = 0.05 | 93.8% | 87.5% | Balanced point |
| \(\alpha\) = 0.10 | 93.5% | 93.2% | Strong watermark, minor impact on task performance |
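To relate the ablation's \(\alpha\) values to image distortion, assume pixels in [0, 1], a unit-RMS pattern \(w_c\), and no clipping; none of these normalizations is stated in the summary, so the numbers below are only indicative. Then \(\mathrm{MSE} = \alpha^2\) and \(\mathrm{PSNR} = 10\log_{10}(1/\mathrm{MSE}) = -20\log_{10}\alpha\), i.e. 40 dB at \(\alpha = 0.01\), about 26 dB at \(\alpha = 0.05\), and 20 dB at \(\alpha = 0.10\). The hypothetical helper `watermark_psnr` checks this numerically with a random ±1 pattern.

```python
import numpy as np


def watermark_psnr(alpha, size=(32, 32, 3), seed=0):
    """PSNR in dB between x and x + alpha * w, for a random +/-1 pattern w (unit RMS)
    on a mid-gray image with pixel range [0, 1]."""
    rng = np.random.default_rng(seed)
    w = rng.choice([-1.0, 1.0], size=size)
    x = np.full(size, 0.5)
    x_marked = np.clip(x + alpha * w, 0.0, 1.0)   # clipping never triggers for alpha <= 0.5
    mse = np.mean((x_marked - x) ** 2)
    return 10 * np.log10(1.0 / mse)


for alpha in (0.01, 0.05, 0.10):
    print(f"alpha={alpha:.2f}: PSNR={watermark_psnr(alpha):.1f} dB, "
          f"closed form={-20 * np.log10(alpha):.1f} dB")
```

Every doubling of \(\alpha\) costs about 6 dB of PSNR, which is one way to read the strength versus invisibility trade-off in the table.
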

Key Findings

The watermark has minimal impact on target task performance (a drop of at most 0.6 percentage points in the main results), while watermark verification accuracy far exceeds the random baseline (roughly 9× on CIFAR-10, 62× on CIFAR-100, and over 700× on the ImageNet subset).
The method is effective across different model architectures (ResNet, VGG, DenseNet, ViT), confirming model agnosticism.
The watermark signals remain learnable after standard data augmentations, although extreme compression (e.g., JPEG quality below 20) may degrade watermark readability.
The effectiveness is preserved when extended to fine-grained classification and image segmentation tasks, demonstrating the generalization capability of the method.

Highlights & Insights

Turning a Defect into a Feature: Deep models' strong tendency to learn data biases is usually viewed as a robustness weakness, but this paper turns it into a dataset protection tool. This "fight fire with fire" paradigm is inspiring and suggests that other model "defects" could also be put to positive use.
Design of Non-Repudiability: The bias-only verification set contains only watermark patterns and no target-task information. Legitimately trained models cannot score well on it, so a high score is strong statistical evidence of use. This is cleaner and harder to dispute than the "trigger + target class" verification used in backdoor-based approaches.
Transferability to Data Provenance: The method applies not only to copyright protection but also to data lineage tracking: by using different watermark patterns for different distribution channels, one can trace which channel a leak came from.

Limitations & Future Work

There exists a trade-off between watermark strength and invisibility; excessively low strength might lead to unreliable verification.
If adversaries are aware of the specifics of the watermarking method, they might design targeted pre-processing to remove watermarks.
Embedding invisible watermarks is easier in high-resolution datasets, since more pixels are available to carry the signal, whereas watermark capacity is limited in low-resolution datasets (e.g., 32×32 CIFAR).
The robustness of the watermark in indirect usage scenarios such as model distillation and transfer learning has not been investigated.
Future research can explore embedding watermarks in the feature space rather than the pixel space to enhance robustness against image-level attacks.

Comparison with Related Work

vs Backdoor Attacks (BadNets): Backdoor attacks insert trigger patterns into the inputs to coerce the model into outputting a specific target class, which can be detected and eliminated by backdoor detection tools (e.g., Neural Cleanse, STRIP). The watermark of Undercover Bias does not alter the normal behavior of the model, adding only harmless bias learning that backdoor detection tools cannot recognize.
vs Radioactive Data: Radioactive Data marks images so that their features shift along a secret direction, which leaves a statistical trace in the trained model's parameters; its verification relies on white-box access to those parameters. Undercover Bias needs only black-box inference for verification, making it more practical.
vs Dataset Fingerprints: Fingerprinting methods typically modify the labels of a small subset of samples, which may compromise data quality. Undercover Bias, in contrast, leaves labels unchanged and makes only imperceptible pixel-level modifications.

Rating

Novelty: ⭐⭐⭐⭐⭐ "The idea of reversing the defect of learning data biases into a data protection tool is highly novel, and the class-wise undercover bias design is ingenious."
Experimental Thoroughness: ⭐⭐⭐⭐ "Sufficient evaluations on multiple datasets and model architectures, extension to segmentation tasks, and watermark intensity analysis."
Writing Quality: ⭐⭐⭐⭐ "Clear motivation and explicit comparisons demonstrating multiple advantages."
Value: ⭐⭐⭐⭐ "Dataset copyright protection is a practical yet overlooked issue, and this paper provides a highly viable solution."