Mitigating Instance Entanglement in Instance-Dependent Partial Label Learning

Conference: CVPR 2026 | arXiv: 2603.04825 | Code: RyanZhaoIc/CAD | Area: Others (Weakly Supervised Learning / Partial Label Learning) | Keywords: Partial Label Learning, Instance Entanglement, Class-specific Augmentation, Contrastive Learning, Weakly Supervised Classification

TL;DR

To address the "instance entanglement" problem in instance-dependent partial label learning (ID-PLL)—where instances from visually similar classes share overlapping features and candidate label sets—this paper proposes the CAD framework, which mitigates class confusion through two complementary mechanisms: intra-class alignment via class-specific augmentation and inter-class separation via a weighted penalty loss.

Background & Motivation

Practical demand for partial label learning: Acquiring precise labels in real-world scenarios is costly. Partial Label Learning (PLL) allows each sample to be associated with a set of candidate labels that includes the ground-truth label, enabling low-cost annotation via crowdsourcing or web mining.

Instance-dependent assumption is more realistic: Conventional PLL assumes that candidate labels are generated independently of instance features (random or class-conditional noise). In practice, however, label ambiguity is often determined by instance characteristics—for example, a Silver Fox dog is more likely to be annotated as "fox," whereas a Corgi is not.

Instance entanglement has been overlooked: In ID-PLL, instances from similar categories share overlapping features and candidate labels (e.g., Silver Fox and Arctic Fox), leading to severe class confusion. Statistics show that on CIFAR-10, 96.62% of instances in the most confused class pair share candidate labels.

Limitations of contrastive learning: Existing state-of-the-art methods (e.g., ABLE, DIRK) rely on contrastive learning to bring same-class representations closer together. For entangled instances, however, this can erroneously align samples from different classes, further blurring class boundaries.

Insufficient inter-class separation: Optimizing only intra-class alignment without explicitly enlarging inter-class distances allows entangled instances to continuously provide incorrect disambiguation signals during iterative training, ultimately degrading classification performance.

Entangled instances are prevalent and impactful: Experiments show that even at a cosine similarity threshold above 0.90, Fashion-MNIST still contains 529,000 entangled instance pairs. As similarity increases, the accuracy of existing methods on these samples drops sharply.
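The entangled-pair statistics above can be reproduced in spirit with a few lines of NumPy: given features from some pre-trained backbone, count cross-class pairs whose cosine similarity exceeds a threshold. This is a minimal sketch (function name and details are my own, not the paper's code):

```python
import numpy as np

def count_entangled_pairs(feats, labels, threshold=0.90):
    """Count cross-class instance pairs whose cosine similarity exceeds `threshold`.

    `feats` is an (n, d) array of features (e.g. from a pre-trained backbone);
    `labels` are ground-truth class indices. High-similarity pairs from
    *different* classes are the "entangled" pairs discussed above.
    """
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # unit-normalize rows
    sim = f @ f.T                                             # pairwise cosine similarity
    diff_class = labels[:, None] != labels[None, :]           # mask: different classes
    upper = np.triu(np.ones_like(sim, dtype=bool), k=1)       # count each unordered pair once
    return int(np.sum((sim > threshold) & diff_class & upper))
```

For the quadratic memory cost on large datasets, the similarity matrix would be computed in chunks in practice.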

Method

Overall Architecture (CAD)

CAD (Class-specific Augmentation based Disentanglement) consists of two core modules:

  • Representation learning module (intra-class regulation): generates class-specific augmented samples and aligns their representations within the same class.
  • Confidence adjustment module (inter-class regulation): suppresses high-confidence predictions on easily confused non-candidate labels via a weighted penalty loss.

The overall loss function is \(\mathcal{L}(\boldsymbol{x}, \mathcal{S}) = \mathcal{L}_{discls}(\boldsymbol{x}) + \frac{\beta}{|\mathcal{S}|}\sum_{s \in \mathcal{S}}\mathcal{L}_c(\boldsymbol{x}'_s)\), where \(\beta\) balances the two modules.
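The combination of the two terms is a straightforward weighted sum; a minimal sketch (the tuned value of \(\beta\) is not reproduced here):

```python
def cad_total_loss(l_discls, l_c_per_candidate, beta=1.0):
    """Overall CAD objective: L = L_discls + (beta / |S|) * sum_s L_c(x'_s).

    `l_discls` is the weighted penalty loss for instance x, and
    `l_c_per_candidate` holds one alignment-loss value per candidate label
    s in S; `beta` balances the two modules.
    """
    return l_discls + beta * sum(l_c_per_candidate) / len(l_c_per_candidate)
```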

Key Design 1: Class-specific Augmentation Generation

For each instance \(\boldsymbol{x}\) and each candidate label \(s \in \mathcal{S}\), an augmented sample \(\boldsymbol{x}'_s\) is generated to emphasize class-discriminative features. Two instantiations are provided:

  • CAM-based feature reweighting (CAD-CAM): Class Activation Mapping localizes class-relevant feature regions, and augmented samples are produced via \(\boldsymbol{x}'_s = \boldsymbol{a}_s \odot \boldsymbol{x} + \epsilon \cdot (\boldsymbol{1} - \boldsymbol{a}_s) \odot \boldsymbol{x}\), amplifying class-specific features while suppressing irrelevant regions. This approach is lightweight and requires no external models.
  • Diffusion model editing (CAD): InstructPix2Pix is employed to perform image editing guided by class-name instructions, synthesizing semantically richer class-specific augmented samples. Augmentations are generated offline, adding approximately 24% to training time.
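The CAM-based reweighting formula above is simple to implement once the activation map is available. A sketch, assuming \(\boldsymbol{a}_s\) has already been upsampled to image resolution and normalized to [0, 1] (the paper's exact \(\epsilon\) value is not reproduced here):

```python
import numpy as np

def cam_augment(x, a_s, eps=0.1):
    """CAM-based class-specific augmentation: x'_s = a_s ⊙ x + eps * (1 - a_s) ⊙ x.

    `x` is an image array and `a_s` the class activation map for candidate
    label s, normalized to [0, 1]. Class-relevant regions (a_s → 1) are kept
    intact while irrelevant regions (a_s → 0) are attenuated toward eps * x.
    """
    return a_s * x + eps * (1.0 - a_s) * x
```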

Key Design 2: Class-specific Augmentation Alignment

Augmented samples generated under the same candidate label are treated as positive pairs and aligned via contrastive learning:

\[\mathcal{L}_c(\boldsymbol{x}') = -\sum_{\boldsymbol{x}^+ \in \mathcal{A}_{y'}} w(\boldsymbol{x}', \boldsymbol{x}^+) \log s_\tau(\boldsymbol{q}_{\boldsymbol{x}'}, \boldsymbol{k}_{\boldsymbol{x}^+}, \mathcal{K})\]

The key innovations are: (1) positive pairs are constructed from augmentations guided by the same label, circumventing the core challenge of positive sample identification in weakly supervised settings; (2) augmentations of the same instance under different class labels serve as semantically hard negative samples, forcing the model to refine decision boundaries; (3) a weighting mechanism \(w(\boldsymbol{x}', \boldsymbol{x}^+)\) based on prediction logit similarity down-weights the influence of noisy augmentations.
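How \(\mathcal{L}_c\) might be computed can be sketched in NumPy. This is one interpretation with assumed names, not the authors' implementation: \(s_\tau\) is read as a temperature-scaled softmax of the query–positive similarity against a MoCo-style key queue \(\mathcal{K}\):

```python
import numpy as np

def class_specific_alignment_loss(q, positives, queue, weights, tau=0.07):
    """Sketch of L_c for one augmented sample (interpretation, not official code).

    `q` is the query embedding of an augmented sample x'; `positives` are key
    embeddings of augmentations guided by the same candidate label (A_{y'});
    `queue` is the negative key queue K; `weights` plays the role of
    w(x', x+), down-weighting noisy augmentations.
    """
    def s_tau(k):
        logits = np.concatenate([[q @ k], q @ queue.T]) / tau  # positive vs. queue
        logits -= logits.max()                                 # numerical stability
        p = np.exp(logits)
        return p[0] / p.sum()                                  # softmax prob. of the positive
    return -sum(w * np.log(s_tau(k)) for w, k in zip(weights, positives))
```

With unit-normalized embeddings, a query close to its positives and far from the queue drives the loss toward zero.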

Key Design 3: Weighted Penalty Loss

Candidate labels with high confidence receive larger positive weights (accelerating disambiguation), while high-confidence non-candidate labels are subject to stronger penalties (suppressing confusion):

\[\mathcal{L}_{discls}(\boldsymbol{x}) = \sum_{j \in \mathcal{Y}} \omega_j \ell(\boldsymbol{s}_j, \boldsymbol{x})\]

Weights \(\omega_j\) are normalized separately within the candidate and non-candidate sets (\(\sum_{j \in \mathcal{S}} \omega_j = 1\), \(\sum_{j \in \bar{\mathcal{S}}} \omega_j = 1\)), ensuring robustness to candidate set size. A cross-entropy variant is used in practice for numerical stability. This loss belongs to the Leveraged Weighted Loss family and comes with Bayes consistency guarantees.
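The per-set normalization is the load-bearing detail: a sketch of one plausible weighting (weights proportional to model confidence, renormalized separately within \(\mathcal{S}\) and \(\bar{\mathcal{S}}\) as stated above; the paper's exact weighting function may differ):

```python
import numpy as np

def penalty_weights(confidences, candidate_mask):
    """Normalize label weights separately inside the candidate set S and the
    non-candidate set S̄, so each set's weights sum to 1 regardless of |S|.

    `confidences` are per-label model confidences; `candidate_mask` is a
    boolean array marking candidate labels.
    """
    conf = np.asarray(confidences, dtype=float)
    w = np.empty_like(conf)
    for mask in (candidate_mask, ~candidate_mask):  # normalize each set independently
        w[mask] = conf[mask] / conf[mask].sum()
    return w
```

Because each set is renormalized on its own, a large candidate set cannot dilute the penalty applied to confident non-candidate labels.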

Key Experimental Results

Main Results: Classification Accuracy

| Method  | Fashion-MNIST | CIFAR-10 | CIFAR-100 | Flower | Oxford-IIIT Pet |
|---------|---------------|----------|-----------|--------|-----------------|
| DIRK    | 91.48         | 90.87    | 68.77     | 44.03  | 64.95           |
| ABLE    | 89.81         | 83.92    | 63.92     | 43.51  | 54.19           |
| CEL     | 87.78         | 89.18    | 68.73     | 38.51  | 68.19           |
| CAD-CAM | 91.64         | 92.69    | 69.08     | 49.67  | 74.56           |
| CAD     | 92.14         | 93.57    | 72.03     | 47.88  | 69.46           |

A CAD variant achieves the best accuracy on all 5 benchmarks: CAD leads on Fashion-MNIST, CIFAR-10 (+2.70% over DIRK), and CIFAR-100 (+3.26% over DIRK), while CAD-CAM performs best on the fine-grained Flower and Oxford-IIIT Pet datasets.

Accuracy on Entangled Instances

| Dataset   | Method | Top 0.1% | Top 0.01% | Top 0.001% |
|-----------|--------|----------|-----------|------------|
| CIFAR-10  | DIRK   | 91.78    | 85.88     | 74.09      |
| CIFAR-10  | CAD    | 94.51    | 90.90     | 83.37      |
| CIFAR-100 | DIRK   | 70.42    | 66.61     | 62.59      |
| CIFAR-100 | CAD    | 72.43    | 68.80     | 67.78      |

On the most challenging top 0.001% entangled pairs, CAD surpasses DIRK by 9.28% on CIFAR-10.

Ablation Study

| Variant                                 | Fashion-MNIST | CIFAR-10 |
|-----------------------------------------|---------------|----------|
| CAD (full)                              | 92.14         | 93.57    |
| w/o CA (remove confidence adjustment)   | 91.19         | 93.32    |
| w/o RL (remove representation learning) | 91.48         | 91.21    |
| w/o Both                                | 85.30         | 87.81    |

Both modules contribute positively; the representation learning module yields a more substantial gain (+2.36% on CIFAR-10).

Key Findings

  • Gains are not attributable to external models: CAD-CAM, which uses no external generative model, already outperforms all baselines, validating the effectiveness of the core framework design. Directly incorporating diffusion-edited samples into DIRK or ABLE actually reduces performance, indicating that the gains stem from the structured integration approach.
  • Inter-class distance is significantly enlarged: Both t-SNE visualizations and quantitative metrics confirm that CAD achieves the largest inter-class distances (class center distance: 1.103 vs. 0.936 for DIRK).
  • Confusion matrix improvements: Error rates for highly confused class pairs such as cat–dog and truck–automobile are substantially reduced.
  • Fine-grained prompts are effective: Using detailed class description prompts on Oxford-IIIT Pet improves CAD accuracy from 69.46% to 76.23%, surpassing CAD-CAM.
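The "class center distance" figures quoted above can be computed in several ways; one plausible reading is the mean pairwise Euclidean distance between class centroids in feature space (my assumption, not the paper's stated definition):

```python
import numpy as np

def mean_class_center_distance(feats, labels):
    """Mean pairwise Euclidean distance between per-class feature centroids.

    `feats` is an (n, d) feature array and `labels` the class index of each
    row; larger values indicate better inter-class separation.
    """
    classes = np.unique(labels)
    centers = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)  # pairwise distances
    iu = np.triu_indices(len(classes), k=1)                           # each pair once
    return d[iu].mean()
```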

Highlights & Insights

  • Clear problem formulation: The paper is the first to systematically define and analyze the "instance entanglement" phenomenon in ID-PLL, providing quantitative statistics and visualizations.
  • Elegant framework design: The dual intra-class and inter-class regulation framework is concise yet effective; the two augmentation instantiations (CAM / diffusion) demonstrate the generality of the approach.
  • Clever positive sample construction: Class-specific augmentation naturally resolves the core challenge of positive pair identification in weakly supervised contrastive learning.
  • Thorough and rigorous experiments: 14 baselines, 5 datasets, dedicated analysis of entangled instances, and multi-faceted evaluation via t-SNE, confusion matrices, and inter-class distance metrics.

Limitations & Future Work

  • Fine-grained categories depend on prompt quality: Diffusion model editing on fine-grained datasets requires manually crafted detailed class descriptions, limiting automation.
  • Applicability to specialized domains: In medical or industrial imaging, visual semantics are difficult to express textually, and general-purpose diffusion models lack sufficient domain prior knowledge.
  • Offline augmentation cost: Diffusion-based augmentation must be generated and stored offline, introducing additional storage and preprocessing overhead for large-scale datasets.
  • Entanglement definition is sensitive to pre-trained features: Identification of entangled pairs relies on feature similarity extracted by a pre-trained ResNet, making the definition sensitive to the choice of feature extractor.

Related Work

  • ID-PLL methods: VALEN (Dirichlet posterior inference), ABLE (ambiguity-guided contrastive learning), and DIRK (negative label confidence regulation) all fail to explicitly address instance entanglement.
  • Contrastive learning for PLL: Methods such as PiCO employ prototype-based contrastive learning for disambiguation, but entangled instances sharing candidate labels are incorrectly aligned.
  • Maximum margin methods: Early SVM-based approaches can implicitly enlarge inter-class distances but do not scale well to high-dimensional data.
  • Diffusion-based image editing: InstructPix2Pix is innovatively repurposed for class-specific augmentation generation, rather than conventional image editing tasks.

Rating

  • Novelty: ⭐⭐⭐⭐ — The systematic definition and analysis of instance entanglement offers a new perspective on ID-PLL; the CAD framework is novel in design.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ — 14 baselines, 5 datasets, dedicated entanglement analysis, and multi-dimensional visualizations; very comprehensive.
  • Writing Quality: ⭐⭐⭐⭐ — Problem exposition is clear and figures are rich, though the density of mathematical notation is high.
  • Value: ⭐⭐⭐⭐ — Provides an effective solution to class confusion in weakly supervised learning; the CAD-CAM variant is particularly practical.