Towards Robust Pseudo-Label Learning in Semantic Segmentation: An Encoding Perspective

Conference: NeurIPS 2025
arXiv: 2512.06870
Code: https://github.com/Woof6/ECOCSeg
Area: Segmentation
Keywords: Pseudo-label learning, semantic segmentation, error-correcting output codes, unsupervised domain adaptation, semi-supervised learning

TL;DR

This paper proposes ECOCSeg, which replaces one-hot encoding with Error-Correcting Output Codes (ECOC) to represent semantic categories. It decomposes an N-class classification problem into K binary sub-tasks, and couples bit-level pseudo-label denoising with customized optimization losses to substantially improve the robustness of pseudo-label learning in UDA and SSL semantic segmentation.

Background & Motivation

Background: Semantic segmentation in label-scarce settings — including unsupervised domain adaptation (UDA) and semi-supervised learning (SSL) — relies heavily on pseudo-label learning. Dominant approaches fall into two paradigms: self-training (generating pseudo-labels via an EMA teacher model) and consistency regularization (enforcing prediction consistency across differently perturbed views of the same sample), both of which are essentially pseudo-label learning frameworks.

Limitations of Prior Work: Pseudo-labels inevitably contain errors. Existing methods assign class labels via one-hot encoding and argmax hard assignment, which completely misleads training whenever the prediction is wrong. Threshold-based filtering discards low-confidence samples, biasing the model toward easy examples; weighting strategies require careful tuning and generalize poorly. Critically, existing methods focus almost exclusively on selection strategies for pseudo-labels, with virtually no attention paid to the influence of the encoding representation itself.

Key Challenge: Visually similar categories (e.g., sheep/cow/horse) share visual attributes and are frequently confused with one another. Under one-hot encoding, any such confusion results in a completely erroneous label (zero mutual information). Yet these categories do share attributes — if the encoding scheme could exploit such shared structure, even an incorrect classification could still provide partially correct supervisory signal.

Goal: To revisit the pseudo-label noise problem from the perspective of encoding representation, and to design a category encoding that tolerates partial bit errors so that incorrect pseudo-labels can still provide meaningful supervision.

Key Insight: The authors draw inspiration from error-correcting codes in communications, adopting Error-Correcting Output Codes (ECOC) to represent each class as a multi-bit binary codeword rather than a one-hot vector. Similar categories share certain bits of their codewords; even when the class assignment is wrong, the shared bits remain correct, realizing fault-tolerant supervision in which errors are partially recovered.

Core Idea: Replace one-hot encoding with ECOC to represent semantic categories, enabling pseudo-labels to provide partially correct supervisory signal at the bit level even when the class assignment is wrong.

Method

Overall Architecture

ECOCSeg can be integrated as a plug-in into existing pseudo-label learning frameworks. Input images are passed through an encoder to extract pixel-level features, which are then fed not into an N-class classifier but into K binary classifiers — each predicting one bit of the codeword. The resulting K-dimensional probability vector is used to identify the nearest-neighbor codeword in the codebook via soft Hamming distance, yielding the predicted class. A bit-level denoising mechanism is introduced during pseudo-label generation, combining the complementary strengths of bit-wise and code-wise pseudo-label forms. Three customized loss functions are jointly optimized during training.
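
To make the pipeline concrete, here is a minimal PyTorch sketch of the decoding step: K per-bit sigmoid probabilities are compared against every codeword by soft Hamming distance, and the nearest codeword gives the class. This is an illustration under assumed shapes and a toy codebook, not the official ECOCSeg implementation.

```python
import torch

def soft_hamming_decode(bit_probs: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """bit_probs: (P, K) per-bit sigmoid outputs for P pixels.
    codebook:  (N, K) binary codewords, one row per class.
    Returns:   (P,) predicted class index per pixel."""
    # d_SH(c_n, p^i) = (1/K) * sum_k |p(k|z_i) - c_nk|, computed for all classes at once
    dist = (bit_probs.unsqueeze(1) - codebook.unsqueeze(0)).abs().mean(dim=-1)  # (P, N)
    return dist.argmin(dim=1)

# Toy usage: 4 classes, K = 6 bits (codeword values here are made up)
codebook = torch.tensor([[0., 0., 0., 1., 1., 1.],
                         [0., 1., 1., 0., 0., 1.],
                         [1., 0., 1., 0., 1., 0.],
                         [1., 1., 0., 1., 0., 0.]])
bit_logits = torch.randn(5, 6)   # one logit per bit from the K binary classifiers
preds = soft_hamming_decode(torch.sigmoid(bit_logits), codebook)  # (5,) class ids
```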

Key Designs

  1. ECOC Dense Classification Paradigm:

    • Function: Transforms N-class semantic segmentation into K binary classification problems, endowing the framework with inherent error-correction capability.
    • Mechanism: A binary codebook matrix of size \(N \times K\) is constructed, where each class corresponds to a codeword of length K. The classifier is replaced by K independent sigmoid binary classifiers, each predicting one bit. During inference, the class is determined by computing the soft Hamming distance between the predicted vector and each codeword in the codebook, \(d_{SH}(\mathbf{c}_n, \mathbf{p}^i) = \frac{1}{K}\sum_{k=1}^K |p(k \mid \mathbf{z}_i) - c_{nk}|\), and selecting the nearest neighbor. Two codebook design strategies are provided: max-min distance encoding (maximizing the minimum inter-class Hamming distance to guarantee error-correction capacity; see the first sketch after this list) and text-based encoding (generating codes from class-name semantic relationships to ensure semantic consistency).
    • Design Motivation: Under one-hot encoding, the inter-class Hamming distance is always 2, yielding zero error-correction capacity. ECOC encoding enables inter-class distances far exceeding 2, theoretically tolerating up to \(\lfloor(d-1)/2\rfloor\) bit errors, where d is the minimum inter-class Hamming distance. The paper further proves via NTK theory that ECOC is performance-equivalent to one-hot encoding under full supervision (Theorem 4.1) and achieves a tighter misclassification upper bound under pseudo-label noise (Theorem 4.2).
  2. Reliable Bit Mining (RBM):

    • Function: Mines reliable supervisory signals from noisy pseudo-labels at the bit level.
    • Mechanism: ECOCSeg naturally yields two pseudo-label forms — bit-wise (directly quantizing the sigmoid output of each bit) and code-wise (querying the nearest-neighbor codeword). Each has distinct trade-offs: bit-wise labels are softer but carry independent per-bit noise; code-wise labels are perfectly accurate when classification is correct but introduce holistic noise upon misclassification. The algorithm queries the C nearest-neighbor codewords and identifies bit positions shared across all candidates in the set — these bits are correct as long as the true class is among the candidates — marking them as "reliable bits." The final mixed pseudo-label uses code-wise values at reliable positions and bit-wise values elsewhere. The candidate-set size C is determined adaptively via a confidence threshold T (see the RBM sketch after this list).
    • Design Motivation: No single pseudo-label form achieves both high precision and high coverage. By mining shared bits across candidate classes, the approach maximizes usable supervisory signal while preserving correctness whenever the true class falls within the candidate set.
  3. Customized Optimization Objectives:

    • Function: Introduces structured representation constraints on top of bit-level BCE to accelerate convergence and enhance discriminability.
    • Mechanism: Three losses are combined — (1) Binary cross-entropy \(\mathcal{L}_{bce}\) independently optimizes each bit classifier; (2) Pixel-codeword distance \(\mathcal{L}_{pcd} = 1 - \cos(\hat{\mathbf{p}}^i, \hat{\mathbf{c}}^i)\) encourages intra-class compactness by pulling logits toward the corresponding codeword direction; (3) Pixel-codeword contrast \(\mathcal{L}_{pcc}\) applies contrastive learning to push predictions away from non-target codewords, computed only on discriminative bit positions (\(P_d\)) while ignoring shared bits (the last two losses are sketched after this list).
    • Design Motivation: BCE alone ignores structural relationships among bits and lacks intra-class compactness and inter-class separation constraints. PCD provides intra-class constraints, PCC provides inter-class constraints, and the three losses are mutually complementary.
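
For Key Design 1, a hedged sketch of max-min distance encoding. The paper's actual construction is not reproduced here; this greedy random search only illustrates the stated objective of maximizing the minimum pairwise Hamming distance, and the function name and defaults are assumptions.

```python
import torch

def maxmin_codebook(n_classes: int, k_bits: int, trials: int = 10000, seed: int = 0) -> torch.Tensor:
    """Random search for an (N, K) binary codebook with maximal minimum
    pairwise Hamming distance (a stand-in for the paper's search procedure)."""
    gen = torch.Generator().manual_seed(seed)
    best, best_dmin = None, -1.0
    for _ in range(trials):
        cand = torch.randint(0, 2, (n_classes, k_bits), generator=gen).float()
        d = (cand.unsqueeze(1) - cand.unsqueeze(0)).abs().sum(-1)  # pairwise Hamming
        d.fill_diagonal_(float("inf"))                             # ignore self-distance
        dmin = d.min().item()
        if dmin > best_dmin:
            best, best_dmin = cand, dmin
    return best  # a minimum distance d tolerates floor((d-1)/2) bit errors
```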
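
For Key Design 2, a minimal sketch of Reliable Bit Mining under simplifying assumptions: the candidate-set size C is fixed here rather than chosen adaptively via the confidence threshold T, and all names are illustrative. For each pixel, the C nearest codewords are retrieved, the bit positions on which all candidates agree are marked reliable, and the mixed label takes code-wise values there and quantized bit-wise values elsewhere.

```python
import torch

def reliable_bit_mining(bit_probs: torch.Tensor, codebook: torch.Tensor, C: int = 3) -> torch.Tensor:
    """bit_probs: (P, K) sigmoid outputs; codebook: (N, K) binary.
    Returns mixed bit-level pseudo-labels of shape (P, K)."""
    dist = (bit_probs.unsqueeze(1) - codebook.unsqueeze(0)).abs().mean(-1)  # (P, N)
    idx = dist.topk(C, dim=1, largest=False).indices                        # C nearest classes
    candidates = codebook[idx]                                              # (P, C, K)
    reliable = (candidates == candidates[:, :1]).all(dim=1)                 # (P, K) shared bits
    code_wise = candidates[:, 0]                  # nearest codeword (code-wise label)
    bit_wise = (bit_probs > 0.5).float()          # per-bit quantization (bit-wise label)
    return torch.where(reliable, code_wise, bit_wise)
```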
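
For Key Design 3, a hedged sketch of the two auxiliary losses; the paper's exact formulations (e.g., temperature, normalization, and the per-pair definition of \(P_d\)) may differ. \(\mathcal{L}_{pcd}\) pulls each pixel's bit logits toward its target codeword direction, and \(\mathcal{L}_{pcc}\) is written here as an InfoNCE-style contrast against non-target codewords, restricted to a global mask of discriminative bit positions as a simplification.

```python
import torch
import torch.nn.functional as F

def pcd_loss(logits: torch.Tensor, target_codes: torch.Tensor) -> torch.Tensor:
    """L_pcd = 1 - cos(p, c): intra-class compactness. Shapes: (P, K) each."""
    return (1 - F.cosine_similarity(logits, target_codes, dim=1)).mean()

def pcc_loss(logits: torch.Tensor, codebook: torch.Tensor,
             target_idx: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Contrast each pixel against all codewords on discriminative bits only."""
    P_d = codebook.std(dim=0) > 0                  # bits that differ across classes
    z = F.normalize(logits[:, P_d], dim=1)         # (P, |P_d|) pixel logits
    c = F.normalize(codebook[:, P_d], dim=1)       # (N, |P_d|) codewords
    sim = z @ c.t() / tau                          # (P, N) scaled similarities
    return F.cross_entropy(sim, target_idx)        # pull to target, push from the rest
```

In training, these would be combined with the bit-level BCE term as \(\mathcal{L}_{bce} + \lambda_1 \mathcal{L}_{pcd} + \lambda_2 \mathcal{L}_{pcc}\), matching the total loss below.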

Loss & Training

The total loss is \(\mathcal{L}_{total} = \mathcal{L}_{bce} + \lambda_1 \mathcal{L}_{pcd} + \lambda_2 \mathcal{L}_{pcc}\), applied to both the supervised loss on labeled data and the pseudo-label loss on unlabeled data. The overall training pipeline preserves the standard self-training/consistency-regularization paradigm, replacing only the encoding form, the pseudo-label generation, and the loss functions.

Key Experimental Results

Main Results

| Baseline | Architecture | Original mIoU | +ECOCSeg mIoU | Gain |
|---|---|---|---|---|
| DACS (GTA→CS) | CNN | 52.1 | 54.5 | +2.4 |
| DAFormer (GTA→CS) | Transformer | 68.3 | 70.5 | +2.2 |
| MIC (GTA→CS) | Transformer | 75.9 | 76.9 | +1.0 |
| DACS (SYN→CS) | CNN | 48.3 | 52.1 | +3.8 |
| DAFormer (SYN→CS) | Transformer | 60.9 | 63.3 | +2.4 |
| MIC (SYN→CS) | Transformer | 68.7 | 69.8 | +1.1 |

Ablation Study

| Component | GTA→CS mIoU | Note |
|---|---|---|
| Baseline (DAFormer) | 68.3 | one-hot + CE |
| + ECOC encoding | 69.0 | encoding form only |
| + bit-wise PL | 69.5 | bit-level pseudo-labels |
| + code-wise PL | 69.3 | codeword-level pseudo-labels |
| + mixed PL (RBM) | 70.0 | reliable bit mining |
| + PCD + PCC losses | 70.5 | full ECOCSeg |

Key Findings

  • ECOCSeg consistently improves across different baselines (DACS/DAFormer/MIC) and architectures (CNN/Transformer), indicating orthogonality to existing improvements.
  • Gains are larger on SYNTHIA→Cityscapes (+3.8), as greater domain gaps produce noisier pseudo-labels, amplifying the error-correction advantage of ECOC.
  • Mixed pseudo-labels (RBM) outperform either bit-wise or code-wise labels alone, validating the complementarity assumption.
  • ECOC encoding incurs no performance degradation under full supervision, confirming the theoretical prediction of Theorem 4.1.
  • The paper further demonstrates that ECOC improves model calibration, indirectly enhancing pseudo-label quality in subsequent training iterations.

Highlights & Insights

  • Encoding perspective on pseudo-label noise: This is an entirely new and orthogonal direction, fully compatible and stackable with existing filtering and weighting strategies. The conceptual shift — optimizing the representation of labels rather than their selection — is particularly elegant.
  • Exploiting inter-class shared attributes: Sheep and cows both have horns and hooves; even when misclassified, bits corresponding to shared attributes remain correct. This insight is simple yet profound, naturally transferring error-correcting ideas from communications to semantic segmentation.
  • Theoretical guarantees: By leveraging the NTK framework, the paper proves both the equivalence of ECOC under full supervision and its superiority under noisy pseudo-labels, grounding the approach in rigorous theory rather than heuristics alone.

Limitations & Future Work

  • The optimal codebook configuration (choice of K and encoding strategy) may vary across datasets, and no adaptive selection mechanism is provided.
  • When the code length K exceeds the class count N, the K binary classifiers carry more output parameters than a single N-class classifier, which may raise efficiency concerns when the number of classes (and hence the required code length) is very large.
  • Validation is currently limited to semantic segmentation; the ECOC encoding idea is in principle transferable to any pseudo-label learning scenario (e.g., object detection, instance segmentation), which merits future exploration.
  • The threshold T in reliable bit mining is a fixed hyperparameter; an adaptive thresholding strategy may yield further improvements.

Comparison with Related Strategies

  • vs. threshold-based filtering (e.g., FixMatch): Threshold filtering discards uncertain samples, leaving hard examples without supervision. ECOCSeg retains partial supervision for all samples at the bit level, achieving broader coverage.
  • vs. weighting strategies (e.g., FlexMatch): Weighting strategies require carefully designed weight functions, whereas ECOCSeg achieves noise tolerance through the encoding form itself, requiring no additional weight design.
  • vs. negative learning: Negative learning avoids noise by telling the model "what something is not"; ECOCSeg exploits noise by telling the model "what something partially is." The two approaches may be complementary.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ Examining pseudo-label noise from an encoding perspective is a genuinely novel direction; bringing error-correcting codes from communications into segmentation is a first.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Covers multiple UDA/SSL baselines and benchmarks with detailed ablations, though validation beyond segmentation tasks is absent.
  • Writing Quality: ⭐⭐⭐⭐⭐ The problem formalization is clear, and the three-component analytical framework (encoding / pseudo-label strategy / optimization objective) is elegantly structured.
  • Value: ⭐⭐⭐⭐⭐ Orthogonal to existing methods, plug-and-play, and theoretically grounded — high in both practical utility and conceptual inspiration.