Mitigating Instance Entanglement in Instance-Dependent Partial Label Learning¶
- Conference: CVPR 2026
- arXiv: 2603.04825
- Code: RyanZhaoIc/CAD
- Area: Others (Weakly Supervised Learning / Partial Label Learning)
- Keywords: Partial Label Learning, Instance Entanglement, Class-specific Augmentation, Contrastive Learning, Weakly Supervised Classification
TL;DR¶
To address the "instance entanglement" problem in instance-dependent partial label learning (ID-PLL)—where instances from visually similar classes share overlapping features and candidate label sets—this paper proposes the CAD framework, which mitigates class confusion through two complementary mechanisms: intra-class alignment via class-specific augmentation and inter-class separation via a weighted penalty loss.
Background & Motivation¶
Practical demand for partial label learning: Acquiring precise labels in real-world scenarios is costly. Partial Label Learning (PLL) allows each sample to be associated with a set of candidate labels that includes the ground-truth label, enabling low-cost annotation via crowdsourcing or web mining.
Instance-dependent assumption is more realistic: Conventional PLL assumes that candidate labels are generated independently of instance features (random or class-conditional noise). In practice, however, label ambiguity is often determined by instance characteristics—for example, a Silver Fox dog is more likely to be annotated as "fox," whereas a Corgi is not.
Instance entanglement has been overlooked: In ID-PLL, instances from similar categories share overlapping features and candidate labels (e.g., Silver Fox and Arctic Fox), leading to severe class confusion. Statistics show that on CIFAR-10, 96.62% of instances in the most confused class pair share candidate labels.
Limitations of contrastive learning: Existing state-of-the-art methods (e.g., ABLE, DIRK) rely on contrastive learning to bring same-class representations closer together. For entangled instances, however, this can erroneously align samples from different classes, further blurring class boundaries.
Insufficient inter-class separation: Optimizing only intra-class alignment without explicitly enlarging inter-class distances allows entangled instances to continuously provide incorrect disambiguation signals during iterative training, ultimately degrading classification performance.
Entangled instances are prevalent and impactful: Experiments show that even at a cosine similarity threshold above 0.90, Fashion-MNIST still contains 529,000 entangled instance pairs. As similarity increases, the accuracy of existing methods on these samples drops sharply.
Method¶
Overall Architecture (CAD)¶
CAD (Class-specific Augmentation based Disentanglement) consists of two core modules:

- Representation learning module (intra-class regulation): Generates class-specific augmented samples and aligns their representations within the same class.
- Confidence adjustment module (inter-class regulation): Suppresses high-confidence predictions on easily confused non-candidate labels via a weighted penalty loss.
The overall loss function is \(\mathcal{L}(\boldsymbol{x}, \mathcal{S}) = \mathcal{L}_{discls}(\boldsymbol{x}) + \frac{\beta}{|\mathcal{S}|}\sum_{s \in \mathcal{S}}\mathcal{L}_c(\boldsymbol{x}'_s)\), where \(\beta\) balances the two modules.
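The combination above is a simple weighted sum; a minimal sketch (the function name and argument layout are ours, not the paper's):

```python
def total_loss(l_discls, l_c_per_candidate, beta=1.0):
    """Overall CAD objective (sketch): L = L_discls + (beta / |S|) * sum_s L_c(x'_s).

    l_discls          : disambiguation/classification loss on the original x (scalar)
    l_c_per_candidate : list of alignment losses, one per candidate label s in S
    beta              : trade-off between the two modules
    """
    return l_discls + (beta / len(l_c_per_candidate)) * sum(l_c_per_candidate)

# e.g. a candidate set of size 3 with per-label alignment losses 0.3, 0.6, 0.9
loss = total_loss(0.5, [0.3, 0.6, 0.9], beta=2.0)  # 0.5 + (2/3) * 1.8 = 1.7
```

Normalizing by \(|\mathcal{S}|\) keeps the augmentation term comparable across instances with different candidate-set sizes.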
Key Design 1: Class-specific Augmentation Generation¶
For each instance \(\boldsymbol{x}\) and each candidate label \(s \in \mathcal{S}\), an augmented sample \(\boldsymbol{x}'_s\) is generated to emphasize class-discriminative features. Two instantiations are provided:
- CAM-based feature reweighting (CAD-CAM): Class Activation Mapping localizes class-relevant feature regions, and augmented samples are produced via \(\boldsymbol{x}'_s = \boldsymbol{a}_s \odot \boldsymbol{x} + \epsilon \cdot (\boldsymbol{1} - \boldsymbol{a}_s) \odot \boldsymbol{x}\), amplifying class-specific features while suppressing irrelevant regions. This approach is lightweight and requires no external models.
- Diffusion model editing (CAD): InstructPix2Pix is employed to perform image editing guided by class-name instructions, synthesizing semantically richer class-specific augmented samples. Augmentations are generated offline, adding approximately 24% to training time.
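The CAM-based reweighting formula can be illustrated with a short NumPy sketch; the function name and the assumption that the CAM is pre-normalized to [0, 1] are ours:

```python
import numpy as np

def cam_augment(x, cam, eps=0.3):
    """Class-specific augmentation via CAM reweighting (a sketch).

    x   : image array, shape (H, W, C), values in [0, 1]
    cam : class activation map a_s for candidate label s, shape (H, W),
          assumed normalized to [0, 1]
    eps : attenuation factor for class-irrelevant regions (hypothetical default)

    Implements x'_s = a_s * x + eps * (1 - a_s) * x: class-relevant regions
    pass through unchanged, irrelevant regions are dimmed by eps.
    """
    a = cam[..., None]  # broadcast the map over the channel axis
    return a * x + eps * (1.0 - a) * x

rng = np.random.default_rng(0)
x = rng.random((8, 8, 3))
cam = rng.random((8, 8))
x_aug = cam_augment(x, cam)
```

With a full-activation map the image is untouched, and with a zero map it is uniformly scaled by `eps`, matching the two extremes of the formula.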
Key Design 2: Class-specific Augmentation Alignment¶
Augmented samples generated under the same candidate label are treated as positive pairs and aligned via contrastive learning.
The key innovations are: (1) positive pairs are constructed from augmentations guided by the same label, circumventing the core challenge of positive sample identification in weakly supervised settings; (2) augmentations of the same instance under different class labels serve as semantically hard negative samples, forcing the model to refine decision boundaries; (3) a weighting mechanism \(w(\boldsymbol{x}', \boldsymbol{x}^+)\) based on prediction logit similarity down-weights the influence of noisy augmentations.
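The three design points can be sketched as a weighted InfoNCE-style objective. This is our illustrative stand-in, not the paper's exact loss: `w_pos` plays the role of the logit-similarity weight \(w(\boldsymbol{x}', \boldsymbol{x}^+)\), and the negatives are augmentations of the same instance under other candidate labels:

```python
import numpy as np

def alignment_loss(anchor, positive, negatives, w_pos=1.0, tau=0.5):
    """Class-specific augmentation alignment (illustrative sketch).

    anchor    : L2-normalized embedding of one augmentation x'_s, shape (d,)
    positive  : embedding of another augmentation guided by the SAME label s
    negatives : embeddings of the same instance augmented under OTHER
                candidate labels (semantically hard negatives), shape (k, d)
    w_pos     : down-weighting factor for noisy positives
    tau       : temperature (hypothetical default)
    """
    sims = np.concatenate(([anchor @ positive], negatives @ anchor)) / tau
    m = sims.max()  # numerically stable log-sum-exp
    log_prob_pos = sims[0] - (m + np.log(np.exp(sims - m).sum()))
    return -w_pos * log_prob_pos
```

When the positive is well aligned with the anchor the loss is small; a misaligned positive, or a confusable negative close to the anchor, drives the loss up, which is exactly the pressure that refines the decision boundary between entangled classes.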
Key Design 3: Weighted Penalty Loss¶
Candidate labels with high confidence receive larger positive weights (accelerating disambiguation), while high-confidence non-candidate labels are subject to stronger penalties (suppressing confusion).
Weights \(\omega_j\) are normalized separately within the candidate and non-candidate sets (\(\sum_{j \in \mathcal{S}} \omega_j = 1\), \(\sum_{j \in \bar{\mathcal{S}}} \omega_j = 1\)), ensuring robustness to candidate set size. A cross-entropy variant is used in practice for numerical stability. This loss belongs to the Leveraged Weighted Loss family and comes with Bayes consistency guarantees.
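The weighting scheme can be sketched as follows. This is one plausible loss in the leveraged-weighted spirit described above, with confidence-proportional weights normalized separately inside the candidate set and its complement; it is not the paper's exact formulation:

```python
import numpy as np

def weighted_penalty_loss(logits, candidate_mask, eps=1e-12):
    """Weighted penalty loss (illustrative sketch, not the paper's exact form).

    logits         : model outputs for one instance, shape (C,)
    candidate_mask : boolean array, True for labels in the candidate set S
                     (both S and its complement assumed non-empty)

    Candidate labels are rewarded in proportion to their confidence;
    confident non-candidate labels are penalized, with weights omega_j
    normalized to sum to 1 within S and within its complement.
    """
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()        # softmax confidences
    w_pos = p * candidate_mask
    w_pos = w_pos / w_pos.sum()            # sum_{j in S} omega_j = 1
    w_neg = p * ~candidate_mask
    w_neg = w_neg / w_neg.sum()            # sum_{j not in S} omega_j = 1
    return (-(w_pos * np.log(p + eps)).sum()
            - (w_neg * np.log(1.0 - p + eps)).sum())
```

An instance whose confidence concentrates on a candidate label incurs a small loss, while confidence leaking onto a non-candidate label is penalized on both sides of the objective.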
Key Experimental Results¶
Main Results: Classification Accuracy¶
| Method | Fashion-MNIST | CIFAR-10 | CIFAR-100 | Flower | Oxford-IIIT Pet |
|---|---|---|---|---|---|
| DIRK | 91.48 | 90.87 | 68.77 | 44.03 | 64.95 |
| ABLE | 89.81 | 83.92 | 63.92 | 43.51 | 54.19 |
| CEL | 87.78 | 89.18 | 68.73 | 38.51 | 68.19 |
| CAD-CAM | 91.64 | 92.69 | 69.08 | 49.67 | 74.56 |
| CAD | 92.14 | 93.57 | 72.03 | 47.88 | 69.46 |
The CAD framework achieves the best performance on all 5 benchmarks: CAD outperforms DIRK by 2.70% on CIFAR-10 and 3.26% on CIFAR-100, while CAD-CAM performs better on the fine-grained datasets (Flower, Oxford-IIIT Pet).
Accuracy on Entangled Instances¶
| Dataset | Top 0.1% | Top 0.01% | Top 0.001% |
|---|---|---|---|
| CIFAR-10 DIRK | 91.78 | 85.88 | 74.09 |
| CIFAR-10 CAD | 94.51 | 90.90 | 83.37 |
| CIFAR-100 DIRK | 70.42 | 66.61 | 62.59 |
| CIFAR-100 CAD | 72.43 | 68.80 | 67.78 |
On the most challenging top 0.001% entangled pairs, CAD surpasses DIRK by 9.28% on CIFAR-10.
Ablation Study¶
| Variant | Fashion-MNIST | CIFAR-10 |
|---|---|---|
| CAD (full) | 92.14 | 93.57 |
| w/o CA (remove confidence adjustment) | 91.19 | 93.32 |
| w/o RL (remove representation learning) | 91.48 | 91.21 |
| w/o Both | 85.30 | 87.81 |
Both modules contribute positively; the representation learning module yields a more substantial gain (+2.36% on CIFAR-10).
Key Findings¶
- Gains are not attributable to external models: CAD-CAM, which uses no external generative model, already outperforms all baselines, validating the effectiveness of the core framework design. Directly incorporating diffusion-edited samples into DIRK or ABLE actually reduces performance, indicating that the gains stem from the structured integration approach.
- Inter-class distance is significantly enlarged: Both t-SNE visualizations and quantitative metrics confirm that CAD achieves the largest inter-class distances (class center distance: 1.103 vs. 0.936 for DIRK).
- Confusion matrix improvements: Error rates for highly confused class pairs such as cat–dog and truck–automobile are substantially reduced.
- Fine-grained prompts are effective: Using detailed class description prompts on Oxford-IIIT Pet improves CAD accuracy from 69.46% to 76.23%, surpassing CAD-CAM.
Highlights & Insights¶
- Clear problem formulation: The paper is the first to systematically define and analyze the "instance entanglement" phenomenon in ID-PLL, providing quantitative statistics and visualizations.
- Elegant framework design: The dual intra-class and inter-class regulation framework is concise yet effective; the two augmentation instantiations (CAM / diffusion) demonstrate the generality of the approach.
- Clever positive sample construction: Class-specific augmentation naturally resolves the core challenge of positive pair identification in weakly supervised contrastive learning.
- Thorough and rigorous experiments: 14 baselines, 5 datasets, dedicated analysis of entangled instances, and multi-faceted evaluation via t-SNE, confusion matrices, and inter-class distance metrics.
Limitations & Future Work¶
- Fine-grained categories depend on prompt quality: Diffusion model editing on fine-grained datasets requires manually crafted detailed class descriptions, limiting automation.
- Applicability to specialized domains: In medical or industrial imaging, visual semantics are difficult to express textually, and general-purpose diffusion models lack sufficient domain prior knowledge.
- Offline augmentation cost: Diffusion-based augmentation must be generated and stored offline, introducing additional storage and preprocessing overhead for large-scale datasets.
- Entanglement definition is sensitive to pre-trained features: Identification of entangled pairs relies on feature similarity extracted by a pre-trained ResNet, making the definition sensitive to the choice of feature extractor.
Related Work & Insights¶
- ID-PLL methods: VALEN (Dirichlet posterior inference), ABLE (ambiguity-guided contrastive learning), and DIRK (negative label confidence regulation) all fail to explicitly address instance entanglement.
- Contrastive learning for PLL: Methods such as PiCO employ prototype-based contrastive learning for disambiguation, but entangled instances sharing candidate labels are incorrectly aligned.
- Maximum margin methods: Early SVM-based approaches can implicitly enlarge inter-class distances but do not scale well to high-dimensional data.
- Diffusion-based image editing: InstructPix2Pix is innovatively repurposed for class-specific augmentation generation, rather than conventional image editing tasks.
Rating¶
- Novelty: ⭐⭐⭐⭐ — The systematic definition and analysis of instance entanglement offers a new perspective on ID-PLL; the CAD framework is novel in design.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ — 14 baselines, 5 datasets, dedicated entanglement analysis, and multi-dimensional visualizations; very comprehensive.
- Writing Quality: ⭐⭐⭐⭐ — Problem exposition is clear and figures are rich, though the density of mathematical notation is high.
- Value: ⭐⭐⭐⭐ — Provides an effective solution to class confusion in weakly supervised learning; the CAD-CAM variant is particularly practical.