Unsupervised Learning for Class Distribution Mismatch (UCDM)¶

Conference: ICML 2025
arXiv: 2505.06948
Code: To be confirmed
Area: Distribution Shift
Keywords: Class Distribution Mismatch, Unsupervised Learning, Diffusion Models, Pseudo-labeling, Open-Set Recognition

TL;DR¶

UCDM trains classifiers by using diffusion models to synthesize positive and negative sample pairs from unlabeled data. It addresses the class distribution mismatch (CDM) between the training set and the target task without any labeled data. In the reported results it is competitive with semi-supervised baselines on closed-set tasks and clearly ahead of them on open-set tasks.

Background & Motivation¶

Class Distribution Mismatch (CDM) refers to the practical problem where the class distribution of training data is inconsistent with the requirements of the target task. Existing methods mainly rely on semi-supervised learning (SSL), requiring labeled data to define "known classes" and treating categories not appearing in the labeled data as "unknown classes". These methods suffer from two core limitations:

Strong reliance on labeled data: Manual annotation is required, making these methods inapplicable in fully unlabeled scenarios, and annotation costs are high.

Performance bottlenecks caused by limited annotations: Semi-supervised CDM (SCDM) methods based on one-vs-all classifiers (e.g., OpenMatch) perform acceptably on known classes but poorly when grouping unknown/new classes into a unified "other" class.

Core Problem proposed by the authors: Can a classifier capable of handling both closed-set and open-set tasks be trained under a completely unlabeled setting, given only the names of known classes?

Method¶

Overall Architecture¶

The UCDM framework consists of three core components: (1) positive instance generation pipeline; (2) negative instance generation pipeline; and (3) a confidence-based pseudo-labeling mechanism.

1. Positive Instance Generation¶

A text-to-image diffusion model generates positive samples for the known classes. These samples must satisfy three properties:

No domain shift: Seed samples are randomly drawn from the training set, and noise is added via forward diffusion (instead of starting from a random noise vector) to preserve seed sample information.
Diversity: Setting \(\sigma_t = 1\) during the reverse process to introduce random noise at each step.
Clear category: Leveraging the class prompt \(\mathcal{C}_y\) = "A photo of a [CLASS]" to guide conditional generation.

Forward diffusion process:

\[x_t = \sqrt{\alpha_t} \, x_0 + \sqrt{1 - \alpha_t} \, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)\]
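
The forward-noising step can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: it assumes \(\alpha_t\) is the cumulative noise-schedule product (DDIM notation), and the function name `forward_diffuse` and the array shapes are hypothetical.

```python
import numpy as np

def forward_diffuse(x0, alpha_t, rng):
    """Noise a seed image x0 to level alpha_t (assumed cumulative, in [0, 1]).

    Returns the noised sample x_t and the Gaussian noise that was used.
    """
    x0 = np.asarray(x0, dtype=np.float64)
    eps = rng.standard_normal(x0.shape)  # eps ~ N(0, I)
    x_t = np.sqrt(alpha_t) * x0 + np.sqrt(1.0 - alpha_t) * eps
    return x_t, eps

rng = np.random.default_rng(0)
seed_image = rng.uniform(-1.0, 1.0, size=(3, 8, 8))  # stand-in for a training image
x_t, eps = forward_diffuse(seed_image, alpha_t=0.5, rng=rng)
```

Because \(x_t\) keeps a \(\sqrt{\alpha_t}\)-scaled copy of the seed, a larger \(\alpha_t\) retains more of the seed image, which is what ties the generated positives to the training domain.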

2. Negative Instance Generation¶

Core Idea: Erasing specified semantic classes from images via Conditional DDIM Inversion.

Theorem 3.1 (Conditional DDIM Inversion) indicates that the conditional inversion process progressively moves the noise vector away from the semantic direction of class \(y\):

\[x_t = \sqrt{\alpha_t} \, x_0 - \sum_{i=0}^{t-1} \left[ \nabla_{x_i} \log p_\theta(x_i)^{s_i} + \nabla_{x_i} \log p_\theta(y | x_i)^{s_i} \right] + \text{residual terms}\]

where \(-\nabla_{x_i} \log p_\theta(y | x_i)\) represents the gradient direction that reduces the probability of the sample belonging to class \(y\).

Practical formulation of conditional inversion:

\[x_t = \sqrt{\frac{\alpha_t}{\alpha_{t-1}}} \, x_{t-1} + \sqrt{\alpha_t} \, \psi(\alpha_t, \alpha_{t-1}, 0) \, \epsilon_\theta(x_{t-1}, t, \mathcal{C}_y)\]
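
These notes do not define \(\psi\). As a hedged reading, if the formulation follows the standard deterministic DDIM update (third argument \(\sigma_t = 0\)), then solving that update for \(x_t\) and evaluating the noise prediction at \(x_{t-1}\) gives the coefficient below. This is an assumption about \(\psi\), not the paper's stated definition.

```latex
% Deterministic DDIM step (sigma_t = 0), with eps_theta the predicted noise:
x_{t-1} = \sqrt{\alpha_{t-1}} \, \frac{x_t - \sqrt{1 - \alpha_t} \, \epsilon_\theta}{\sqrt{\alpha_t}}
          + \sqrt{1 - \alpha_{t-1}} \, \epsilon_\theta

% Solving for x_t:
x_t = \sqrt{\frac{\alpha_t}{\alpha_{t-1}}} \, x_{t-1}
      + \sqrt{\alpha_t} \left( \sqrt{\frac{1}{\alpha_t} - 1} - \sqrt{\frac{1}{\alpha_{t-1}} - 1} \right) \epsilon_\theta

% Hence, under this assumption:
\psi(\alpha_t, \alpha_{t-1}, 0) = \sqrt{\frac{1}{\alpha_t} - 1} - \sqrt{\frac{1}{\alpha_{t-1}} - 1}
```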

Theorem 3.2 (Unconditional DDIM Reverse) shows that the image \(\tilde{x}_0\) produced by the unconditional reverse process preserves the visual features of the original image \(x_0\) and differs from it approximately by the removal of the class-\(y\) semantics:

\[\tilde{x}_0 \approx x_0 - \frac{1}{\sqrt{\alpha_t}} \sum_{i=0}^{t-1} \nabla_{x_i} \log p_\theta(y | x_i)^{s_i}\]
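
The two-stage idea (invert with the class condition, then reverse without it) can be illustrated with a toy model. This is a sketch under loud assumptions, not the paper's pipeline: `psi` uses the standard deterministic-DDIM coefficient, `toy_eps_model` is a hypothetical stand-in whose conditional prediction differs from the unconditional one by a constant offset, and `alphas[0] = 1` represents the clean image.

```python
import numpy as np

def psi(a_t, a_prev):
    # Assumed deterministic-DDIM coefficient (sigma_t = 0); not defined in the notes.
    return np.sqrt(1.0 / a_t - 1.0) - np.sqrt(1.0 / a_prev - 1.0)

def invert(x0, alphas, eps_model, cond):
    """Inversion: walk x_0 -> x_T using eps_model(x, t, cond)."""
    x = np.asarray(x0, dtype=np.float64)
    for t in range(1, len(alphas)):
        a_t, a_prev = alphas[t], alphas[t - 1]
        eps = eps_model(x, t, cond)
        x = np.sqrt(a_t / a_prev) * x + np.sqrt(a_t) * psi(a_t, a_prev) * eps
    return x

def reverse(x_T, alphas, eps_model, cond):
    """Deterministic reverse: walk x_T -> x_0, undoing invert() step by step."""
    x = np.asarray(x_T, dtype=np.float64)
    for t in range(len(alphas) - 1, 0, -1):
        a_t, a_prev = alphas[t], alphas[t - 1]
        eps = eps_model(x, t, cond)
        x = np.sqrt(a_prev / a_t) * (x - np.sqrt(a_t) * psi(a_t, a_prev) * eps)
    return x

# Hypothetical offset between conditional and unconditional noise predictions.
CLASS_OFFSET = np.array([1.0, 0.0])

def toy_eps_model(x, t, cond):
    base = np.zeros_like(x)  # unconditional prediction of the toy model
    return base + CLASS_OFFSET if cond is not None else base

alphas = np.linspace(1.0, 0.2, 11)  # alphas[0] = 1 is the clean image
x0 = np.array([0.5, -0.3])

x_T = invert(x0, alphas, toy_eps_model, cond="A photo of a cat")
x_rec = reverse(x_T, alphas, toy_eps_model, cond="A photo of a cat")  # same condition
x_neg = reverse(x_T, alphas, toy_eps_model, cond=None)                # condition dropped
```

In this toy setting, reversing with the same condition recovers `x0`, while dropping the condition leaves `x_neg` displaced from `x0` only along `CLASS_OFFSET`. That mirrors the structure of Theorem 3.2: everything except the class-conditional term cancels between inversion and reverse.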

3. Confidence-Based Pseudo-Labeling Mechanism¶

Two confidence perspectives are combined to assign pseudo-labels to real images:

Other-probability-driven: Computing the probability of belonging to the "other" class via the outputs of \(K\) binary classifiers:

\[p(y \in \mathcal{Y}_{\text{other}} | x) = \prod_{j=1}^{K} [1 - p(j | x)]\]

Known-probability-driven: Combining predictions from the closed-set classifier \(\hat{p}(j|x)\) and the open-set classifier \(p(j|x)\):

\[\tilde{q}_j = \hat{p}(j|x) \times p(j|x), \quad j = 1, \dots, K\]

A pseudo-label is assigned when the most confident classes from both perspectives are consistent and the score exceeds a threshold \(\delta\).
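
The two scores and the thresholding rule can be sketched as follows. The exact agreement test is not spelled out above, so this sketch makes assumptions: a known-class label requires the open-set and closed-set argmax to coincide and \(\tilde{q}_j > \delta\), an "other" label requires the other-probability to exceed \(\delta\), and the known-class check runs first. The function name `assign_pseudo_label` and the default `delta` are hypothetical.

```python
import numpy as np

def assign_pseudo_label(p_open, p_closed, delta=0.9):
    """Return a class index in [0, K), K for "other", or -1 for no pseudo-label.

    p_open[j]   : p(j|x) from the j-th binary (one-vs-all) classifier
    p_closed[j] : p_hat(j|x) from the closed-set classifier
    The agreement test and the order of the checks are assumptions of this sketch.
    """
    p_open = np.asarray(p_open, dtype=np.float64)
    p_closed = np.asarray(p_closed, dtype=np.float64)
    K = p_open.shape[0]

    p_other = float(np.prod(1.0 - p_open))  # other-probability-driven score
    q = p_closed * p_open                   # known-probability-driven scores

    j_open, j_closed = int(np.argmax(p_open)), int(np.argmax(p_closed))
    if j_open == j_closed and q[j_open] > delta:
        return j_open  # confident known class
    if p_other > delta:
        return K       # confident "other"
    return -1          # leave the image unlabeled

known = assign_pseudo_label([0.95, 0.02, 0.01], [0.97, 0.02, 0.01])
other = assign_pseudo_label([0.01, 0.02, 0.01], [0.50, 0.30, 0.20])
unsure = assign_pseudo_label([0.60, 0.50, 0.10], [0.50, 0.40, 0.10])
```

Images that return `-1` would simply stay out of the pseudo-labeled sets until the classifiers become more confident.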

4. Total Training Loss¶

\[\mathcal{L} = \mathcal{L}_{\text{gen}}^{(\mathcal{D}_P, \mathcal{D}_N)} + \mathcal{L}_{\text{gen}}^{(\mathcal{D}_{\text{known}}, \mathcal{D}_N')} + \mathcal{L}_{\text{gen}}^{(\mathcal{D}_P', \mathcal{D}_{\text{unknown}})}\]

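The three terms correspond, respectively, to training on generated pairs (\(\mathcal{D}_P, \mathcal{D}_N\)), on pseudo-labeled known-class real data paired with generated negatives (\(\mathcal{D}_{\text{known}}, \mathcal{D}_N'\)), and on generated positives paired with pseudo-labeled unknown-class real data (\(\mathcal{D}_P', \mathcal{D}_{\text{unknown}}\)).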

Key Experimental Results¶

Closed-set Tasks (Known Class Classification Accuracy)¶

Method	CIFAR-10 (60% mismatch)	CIFAR-100 (60% mismatch)	Tiny-ImageNet (60% mismatch)
DS³L	66.6	23.4	26.3
UASD	79.3	22.8	5.3
CCSSL	95.7	45.6	25.8
T2T	-	50.6	41.7
OpenMatch	68.5	10.3	10.9
IOMatch	89.8	31.1	32.8
UCDM (Ours)	95.6	50.9	32.3

Open-set Tasks (60% mismatch, Tiny-ImageNet)¶

Method	Known Acc	Unknown Acc	New Acc	Balance Score
OpenMatch	10.8	3.5	5.9	3.0
IOMatch	0.0	100.0	100.0	8.9
UCDM (Ours)	15.8	94.9	95.4	22.9

Core Metrics¶

On Tiny-ImageNet with 60% mismatch, UCDM (no labels) outperforms OpenMatch (40 labels per class) by +5.0 percentage points on known classes, +91.4 points on unknown classes, and +89.5 points on new classes.
On CIFAR-10 open-set tasks, UCDM achieves a balance score above 91 across all mismatch ratios, far exceeding all baselines.
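
The Tiny-ImageNet gains quoted above are plain differences between the UCDM and OpenMatch rows of the open-set table, which a few lines of Python can confirm (the variable names are illustrative):

```python
# Tiny-ImageNet, 60% mismatch: (known, unknown, new) accuracy from the open-set table.
openmatch = (10.8, 3.5, 5.9)
ucdm = (15.8, 94.9, 95.4)

# Gains in percentage points, rounded to one decimal place.
gains = [round(u - o, 1) for u, o in zip(ucdm, openmatch)]
```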

Highlights & Insights¶

First exploration of unsupervised settings: A purely unsupervised paradigm is proposed for the CDM problem for the first time. It requires only known class names for training, breaking the dependency of SSL methods on labeled data.
Innovative application of diffusion models: Semantic erasure is achieved using conditional DDIM inversion. Theoretical proofs (Theorems 3.1 & 3.2) demonstrate that this operation indeed moves latent variables in the direction of minimizing class likelihood.
Dual-perspective confidence labeling mechanism: It elegantly fuses other-probability-driven and known-probability-driven confidences, which is more robust than a single-perspective approach.
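Massive advantages on open-set tasks: In the Tiny-ImageNet results above, the SSL baselines fail on one side of the trade-off. OpenMatch reaches only 3.5% and 5.9% on unknown and new classes, while IOMatch scores 100% on both at the cost of 0.0% known-class accuracy. UCDM identifies unknown and new classes (94.9% and 95.4%) while keeping the highest known-class accuracy of the three (15.8%).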

Limitations & Future Work¶

Prior assumption of known class names: Although annotations are not required, a pre-defined list of known class names is still necessary, which is inapplicable to scenarios where class names are unknown.
Reliance on pre-trained diffusion models: The effectiveness of the method depends on high-quality text-to-image diffusion models, which may exhibit limited performance on fine-grained classes unseen by the diffusion model.
Low known-class accuracy on Tiny-ImageNet: The known-class accuracy in open-set tasks is only 15-22%, indicating that the quality of generated positive samples is limited in fine-grained classification scenarios.
Computational overhead: Each training sample requires forward/reverse processes of the diffusion model to generate positive-negative sample pairs, resulting in significantly higher training and generation costs compared to standard SSL methods.
Evaluation limited to small-scale datasets: Experiments were not conducted on large-scale datasets such as the full ImageNet.

Related Work¶

SSL under CDM: UASD, CCSSL, T2T (closed-set); OpenMatch, IOMatch (open-set). All of these require labeled data.
Diffusion-based generative augmentation: DPT and DWD require retraining the diffusion model and assume matched distributions.
Score-based generative models: DDIM inversion theory provides the theoretical foundation for semantic erasure.

Rating¶

Novelty: ⭐⭐⭐⭐ — First pure unsupervised paradigm proposed in CDM scenarios, with a theoretically supported approach for semantic erasure via conditional inversion.
Experimental Thoroughness: ⭐⭐⭐ — Three standard datasets, various mismatch ratios, and complete ablation studies, though lacking validation on large-scale datasets.
Writing Quality: ⭐⭐⭐⭐ — Clear theoretical derivations, rigorous theorem proofs, and intuitive illustrations.
Value: ⭐⭐⭐⭐ — Opens up a new direction for unsupervised learning in CDM, offering inspiring insights for open-set recognition and distribution shift fields.