Mitigating Instance Entanglement in Instance-Dependent Partial Label Learning

Conference: CVPR 2026 | arXiv: 2603.04825 | Code: RyanZhaoIc/CAD | Area: Others (Weakly Supervised Learning / Partial Label Learning) | Keywords: Partial Label Learning, Instance Entanglement, Class-specific Augmentation, Contrastive Learning, Weakly Supervised Classification

TL;DR

To address the "instance entanglement" problem in instance-dependent partial label learning (ID-PLL)—where instances from visually similar classes share overlapping features and candidate label sets—this paper proposes the CAD framework, which mitigates class confusion through two complementary mechanisms: intra-class alignment via class-specific augmentation and inter-class separation via a weighted penalty loss.

Background & Motivation

Practical demand for partial label learning: Acquiring precise labels in real-world scenarios is costly. Partial Label Learning (PLL) allows each sample to be associated with a set of candidate labels that includes the ground-truth label, enabling low-cost annotation via crowdsourcing or web mining.

Instance-dependent assumption is more realistic: Conventional PLL assumes that candidate labels are generated independently of instance features (random or class-conditional noise). In practice, however, label ambiguity is often determined by instance characteristics—for example, a Silver Fox dog is more likely to be annotated as "fox," whereas a Corgi is not.

Instance entanglement has been overlooked: In ID-PLL, instances from similar categories share overlapping features and candidate labels (e.g., Silver Fox and Arctic Fox), leading to severe class confusion. Statistics show that on CIFAR-10, 96.62% of instances in the most confused class pair share candidate labels.

Limitations of contrastive learning: Existing state-of-the-art methods (e.g., ABLE, DIRK) rely on contrastive learning to bring same-class representations closer together. For entangled instances, however, this can erroneously align samples from different classes, further blurring class boundaries.

Insufficient inter-class separation: Optimizing only intra-class alignment without explicitly enlarging inter-class distances allows entangled instances to continuously provide incorrect disambiguation signals during iterative training, ultimately degrading classification performance.

Entangled instances are prevalent and impactful: Experiments show that even at a cosine similarity threshold above 0.90, Fashion-MNIST still contains 529,000 entangled instance pairs. As similarity increases, the accuracy of existing methods on these samples drops sharply.
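The entangled-pair statistics above can be reproduced in spirit with a few lines of NumPy: given features from some pre-trained backbone, count cross-class pairs whose cosine similarity exceeds a threshold. This is a minimal sketch (function name and details are my own, not the paper's code):

```python
import numpy as np

def count_entangled_pairs(feats, labels, threshold=0.90):
    """Count cross-class instance pairs whose cosine similarity exceeds `threshold`.

    `feats` is an (n, d) array of features (e.g. from a pre-trained backbone);
    `labels` are ground-truth class indices. High-similarity pairs from
    *different* classes are the "entangled" pairs discussed above.
    """
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # unit-normalize rows
    sim = f @ f.T                                             # pairwise cosine similarity
    diff_class = labels[:, None] != labels[None, :]           # mask: different classes
    upper = np.triu(np.ones_like(sim, dtype=bool), k=1)       # count each unordered pair once
    return int(np.sum((sim > threshold) & diff_class & upper))
```

For the quadratic memory cost on large datasets, the similarity matrix would be computed in chunks in practice.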

Method

Overall Architecture (CAD)

CAD (Class-specific Augmentation based Disentanglement) consists of two core modules:

  • Representation learning module (intra-class regulation): generates class-specific augmented samples and aligns their representations within the same class.
  • Confidence adjustment module (inter-class regulation): suppresses high-confidence predictions on easily confused non-candidate labels via a weighted penalty loss.

The overall loss function is \(\mathcal{L}(\boldsymbol{x}, \mathcal{S}) = \mathcal{L}_{discls}(\boldsymbol{x}) + \frac{\beta}{|\mathcal{S}|}\sum_{s \in \mathcal{S}}\mathcal{L}_c(\boldsymbol{x}'_s)\), where \(\beta\) balances the two modules.
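The combination of the two terms is a straightforward weighted sum; a minimal sketch (the tuned value of \(\beta\) is not reproduced here):

```python
def cad_total_loss(l_discls, l_c_per_candidate, beta=1.0):
    """Overall CAD objective: L = L_discls + (beta / |S|) * sum_s L_c(x'_s).

    `l_discls` is the weighted penalty loss for instance x, and
    `l_c_per_candidate` holds one alignment-loss value per candidate label
    s in S; `beta` balances the two modules.
    """
    return l_discls + beta * sum(l_c_per_candidate) / len(l_c_per_candidate)
```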

Key Design 1: Class-specific Augmentation Generation

For each instance \(\boldsymbol{x}\) and each candidate label \(s \in \mathcal{S}\), an augmented sample \(\boldsymbol{x}'_s\) is generated to emphasize class-discriminative features. Two instantiations are provided:

  • CAM-based feature reweighting (CAD-CAM): Class Activation Mapping localizes class-relevant feature regions, and augmented samples are produced via \(\boldsymbol{x}'_s = \boldsymbol{a}_s \odot \boldsymbol{x} + \epsilon \cdot (\boldsymbol{1} - \boldsymbol{a}_s) \odot \boldsymbol{x}\), amplifying class-specific features while suppressing irrelevant regions. This approach is lightweight and requires no external models.
  • Diffusion model editing (CAD): InstructPix2Pix is employed to perform image editing guided by class-name instructions, synthesizing semantically richer class-specific augmented samples. Augmentations are generated offline, adding approximately 24% to training time.
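The CAM-based reweighting formula above is simple to implement once the activation map is available. A sketch, assuming \(\boldsymbol{a}_s\) has already been upsampled to image resolution and normalized to [0, 1] (the paper's exact \(\epsilon\) value is not reproduced here):

```python
import numpy as np

def cam_augment(x, a_s, eps=0.1):
    """CAM-based class-specific augmentation: x'_s = a_s ⊙ x + eps * (1 - a_s) ⊙ x.

    `x` is an image array and `a_s` the class activation map for candidate
    label s, normalized to [0, 1]. Class-relevant regions (a_s → 1) are kept
    intact while irrelevant regions (a_s → 0) are attenuated toward eps * x.
    """
    return a_s * x + eps * (1.0 - a_s) * x
```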

Key Design 2: Class-specific Augmentation Alignment

Augmented samples generated under the same candidate label are treated as positive pairs and aligned via contrastive learning:

\[\mathcal{L}_c(\boldsymbol{x}') = -\sum_{\boldsymbol{x}^+ \in \mathcal{A}_{y'}} w(\boldsymbol{x}', \boldsymbol{x}^+) \log s_\tau(\boldsymbol{q}_{\boldsymbol{x}'}, \boldsymbol{k}_{\boldsymbol{x}^+}, \mathcal{K})\]

The key innovations are: (1) positive pairs are constructed from augmentations guided by the same label, circumventing the core challenge of positive sample identification in weakly supervised settings; (2) augmentations of the same instance under different class labels serve as semantically hard negative samples, forcing the model to refine decision boundaries; (3) a weighting mechanism \(w(\boldsymbol{x}', \boldsymbol{x}^+)\) based on prediction logit similarity down-weights the influence of noisy augmentations.
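How \(\mathcal{L}_c\) might be computed can be sketched in NumPy. This is one interpretation with assumed names, not the authors' implementation: \(s_\tau\) is read as a temperature-scaled softmax of the query–positive similarity against a MoCo-style key queue \(\mathcal{K}\):

```python
import numpy as np

def class_specific_alignment_loss(q, positives, queue, weights, tau=0.07):
    """Sketch of L_c for one augmented sample (interpretation, not official code).

    `q` is the query embedding of an augmented sample x'; `positives` are key
    embeddings of augmentations guided by the same candidate label (A_{y'});
    `queue` is the negative key queue K; `weights` plays the role of
    w(x', x+), down-weighting noisy augmentations.
    """
    def s_tau(k):
        logits = np.concatenate([[q @ k], q @ queue.T]) / tau  # positive vs. queue
        logits -= logits.max()                                 # numerical stability
        p = np.exp(logits)
        return p[0] / p.sum()                                  # softmax prob. of the positive
    return -sum(w * np.log(s_tau(k)) for w, k in zip(weights, positives))
```

With unit-normalized embeddings, a query close to its positives and far from the queue drives the loss toward zero.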

Key Design 3: Weighted Penalty Loss

Candidate labels with high confidence receive larger positive weights (accelerating disambiguation), while high-confidence non-candidate labels are subject to stronger penalties (suppressing confusion):

\[\mathcal{L}_{discls}(\boldsymbol{x}) = \sum_{j \in \mathcal{Y}} \omega_j \ell(\boldsymbol{s}_j, \boldsymbol{x})\]

Weights \(\omega_j\) are normalized separately within the candidate and non-candidate sets (\(\sum_{j \in \mathcal{S}} \omega_j = 1\), \(\sum_{j \in \bar{\mathcal{S}}} \omega_j = 1\)), ensuring robustness to candidate set size. A cross-entropy variant is used in practice for numerical stability. This loss belongs to the Leveraged Weighted Loss family and comes with Bayes consistency guarantees.
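The per-set normalization is the load-bearing detail: a sketch of one plausible weighting (weights proportional to model confidence, renormalized separately within \(\mathcal{S}\) and \(\bar{\mathcal{S}}\) as stated above; the paper's exact weighting function may differ):

```python
import numpy as np

def penalty_weights(confidences, candidate_mask):
    """Normalize label weights separately inside the candidate set S and the
    non-candidate set S̄, so each set's weights sum to 1 regardless of |S|.

    `confidences` are per-label model confidences; `candidate_mask` is a
    boolean array marking candidate labels.
    """
    conf = np.asarray(confidences, dtype=float)
    w = np.empty_like(conf)
    for mask in (candidate_mask, ~candidate_mask):  # normalize each set independently
        w[mask] = conf[mask] / conf[mask].sum()
    return w
```

Because each set is renormalized on its own, a large candidate set cannot dilute the penalty applied to confident non-candidate labels.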

Key Experimental Results

Main Results: Classification Accuracy

| Method  | Fashion-MNIST | CIFAR-10 | CIFAR-100 | Flower | Oxford-IIIT Pet |
|---------|---------------|----------|-----------|--------|-----------------|
| DIRK    | 91.48         | 90.87    | 68.77     | 44.03  | 64.95           |
| ABLE    | 89.81         | 83.92    | 63.92     | 43.51  | 54.19           |
| CEL     | 87.78         | 89.18    | 68.73     | 38.51  | 68.19           |
| CAD-CAM | 91.64         | 92.69    | 69.08     | 49.67  | 74.56           |
| CAD     | 92.14         | 93.57    | 72.03     | 47.88  | 69.46           |

A CAD variant achieves the best accuracy on all 5 benchmarks: CAD leads on Fashion-MNIST, CIFAR-10 (+2.70% over DIRK), and CIFAR-100 (+3.26% over DIRK), while CAD-CAM performs best on the fine-grained Flower and Oxford-IIIT Pet datasets.

Accuracy on Entangled Instances

| Dataset   | Method | Top 0.1% | Top 0.01% | Top 0.001% |
|-----------|--------|----------|-----------|------------|
| CIFAR-10  | DIRK   | 91.78    | 85.88     | 74.09      |
| CIFAR-10  | CAD    | 94.51    | 90.90     | 83.37      |
| CIFAR-100 | DIRK   | 70.42    | 66.61     | 62.59      |
| CIFAR-100 | CAD    | 72.43    | 68.80     | 67.78      |

On the most challenging top 0.001% entangled pairs, CAD surpasses DIRK by 9.28% on CIFAR-10.

Ablation Study

| Variant                                 | Fashion-MNIST | CIFAR-10 |
|-----------------------------------------|---------------|----------|
| CAD (full)                              | 92.14         | 93.57    |
| w/o CA (remove confidence adjustment)   | 91.19         | 93.32    |
| w/o RL (remove representation learning) | 91.48         | 91.21    |
| w/o Both                                | 85.30         | 87.81    |

Both modules contribute positively; the representation learning module yields a more substantial gain (+2.36% on CIFAR-10).

Key Findings

  • Gains are not attributable to external models: CAD-CAM, which uses no external generative model, already outperforms all baselines, validating the effectiveness of the core framework design. Directly incorporating diffusion-edited samples into DIRK or ABLE actually reduces performance, indicating that the gains stem from the structured integration approach.
  • Inter-class distance is significantly enlarged: Both t-SNE visualizations and quantitative metrics confirm that CAD achieves the largest inter-class distances (class center distance: 1.103 vs. 0.936 for DIRK).
  • Confusion matrix improvements: Error rates for highly confused class pairs such as cat–dog and truck–automobile are substantially reduced.
  • Fine-grained prompts are effective: Using detailed class description prompts on Oxford-IIIT Pet improves CAD accuracy from 69.46% to 76.23%, surpassing CAD-CAM.
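The "class center distance" figures quoted above can be computed in several ways; one plausible reading is the mean pairwise Euclidean distance between class centroids in feature space (my assumption, not the paper's stated definition):

```python
import numpy as np

def mean_class_center_distance(feats, labels):
    """Mean pairwise Euclidean distance between per-class feature centroids.

    `feats` is an (n, d) feature array and `labels` the class index of each
    row; larger values indicate better inter-class separation.
    """
    classes = np.unique(labels)
    centers = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)  # pairwise distances
    iu = np.triu_indices(len(classes), k=1)                           # each pair once
    return d[iu].mean()
```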

Highlights & Insights

  • Clear problem formulation: The paper is the first to systematically define and analyze the "instance entanglement" phenomenon in ID-PLL, providing quantitative statistics and visualizations.
  • Elegant framework design: The dual intra-class and inter-class regulation framework is concise yet effective; the two augmentation instantiations (CAM / diffusion) demonstrate the generality of the approach.
  • Clever positive sample construction: Class-specific augmentation naturally resolves the core challenge of positive pair identification in weakly supervised contrastive learning.
  • Thorough and rigorous experiments: 14 baselines, 5 datasets, dedicated analysis of entangled instances, and multi-faceted evaluation via t-SNE, confusion matrices, and inter-class distance metrics.

Limitations & Future Work

  • Fine-grained categories depend on prompt quality: Diffusion model editing on fine-grained datasets requires manually crafted detailed class descriptions, limiting automation.
  • Applicability to specialized domains: In medical or industrial imaging, visual semantics are difficult to express textually, and general-purpose diffusion models lack sufficient domain prior knowledge.
  • Offline augmentation cost: Diffusion-based augmentation must be generated and stored offline, introducing additional storage and preprocessing overhead for large-scale datasets.
  • Entanglement definition is sensitive to pre-trained features: Identification of entangled pairs relies on feature similarity extracted by a pre-trained ResNet, making the definition sensitive to the choice of feature extractor.

Related Work

  • ID-PLL methods: VALEN (Dirichlet posterior inference), ABLE (ambiguity-guided contrastive learning), and DIRK (negative label confidence regulation) all fail to explicitly address instance entanglement.
  • Contrastive learning for PLL: Methods such as PiCO employ prototype-based contrastive learning for disambiguation, but entangled instances sharing candidate labels are incorrectly aligned.
  • Maximum margin methods: Early SVM-based approaches can implicitly enlarge inter-class distances but do not scale well to high-dimensional data.
  • Diffusion-based image editing: InstructPix2Pix is innovatively repurposed for class-specific augmentation generation, rather than conventional image editing tasks.

Rating

  • Novelty: ⭐⭐⭐⭐ — The systematic definition and analysis of instance entanglement offers a new perspective on ID-PLL; the CAD framework is novel in design.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ — 14 baselines, 5 datasets, dedicated entanglement analysis, and multi-dimensional visualizations; very comprehensive.
  • Writing Quality: ⭐⭐⭐⭐ — Problem exposition is clear and figures are rich, though the density of mathematical notation is high.
  • Value: ⭐⭐⭐⭐ — Provides an effective solution to class confusion in weakly supervised learning; the CAD-CAM variant is particularly practical.