Towards Robust Pseudo-Label Learning in Semantic Segmentation: An Encoding Perspective

Conference: NeurIPS 2025
arXiv: 2512.06870
Code: https://github.com/Woof6/ECOCSeg
Area: Segmentation
Keywords: Pseudo-label learning, semantic segmentation, error-correcting output codes, unsupervised domain adaptation, semi-supervised learning

TL;DR

This paper proposes ECOCSeg, which replaces one-hot encoding with Error-Correcting Output Codes (ECOC) to represent semantic categories. It decomposes an N-class classification problem into K binary sub-tasks, and couples bit-level pseudo-label denoising with customized optimization losses to substantially improve the robustness of pseudo-label learning in UDA and SSL semantic segmentation.

Background & Motivation

Background: Semantic segmentation in label-scarce settings — including unsupervised domain adaptation (UDA) and semi-supervised learning (SSL) — relies heavily on pseudo-label learning. Dominant approaches fall into two paradigms: self-training (generating pseudo-labels via an EMA teacher model) and consistency regularization (enforcing prediction consistency across differently perturbed views of the same sample), both of which are essentially pseudo-label learning frameworks.

Limitations of Prior Work: Pseudo-labels inevitably contain errors. Existing methods assign class labels via one-hot encoding and argmax hard assignment, which completely misleads training whenever the prediction is wrong. Threshold-based filtering discards low-confidence samples, biasing the model toward easy examples; weighting strategies require careful tuning and generalize poorly. Critically, existing methods focus almost exclusively on selection strategies for pseudo-labels, with virtually no attention paid to the influence of the encoding representation itself.

Key Challenge: Visually similar categories (e.g., sheep/cow/horse) share visual attributes and are frequently confused with one another. Under one-hot encoding, any such confusion results in a completely erroneous label (zero mutual information). Yet these categories do share attributes — if the encoding scheme could exploit such shared structure, even an incorrect classification could still provide partially correct supervisory signal.

Goal: To revisit the pseudo-label noise problem from the perspective of encoding representation, and to design a category encoding that tolerates partial bit errors so that incorrect pseudo-labels can still provide meaningful supervision.

Key Insight: The authors draw inspiration from error-correcting codes in communications, adopting Error-Correcting Output Codes (ECOC) to represent each class as a multi-bit binary codeword rather than a one-hot vector. Similar categories share certain bits of their codewords; even when the class assignment is wrong, the shared bits remain correct, realizing fault-tolerant supervision in which errors are partially recovered.

Core Idea: Replace one-hot encoding with ECOC to represent semantic categories, enabling pseudo-labels to provide partially correct supervisory signal at the bit level even when the class assignment is wrong.

Method

Overall Architecture

ECOCSeg can be integrated as a plug-in into existing pseudo-label learning frameworks. Input images are passed through an encoder to extract pixel-level features, which are then fed not into an N-class classifier but into K binary classifiers — each predicting one bit of the codeword. The resulting K-dimensional probability vector is used to identify the nearest-neighbor codeword in the codebook via soft Hamming distance, yielding the predicted class. A bit-level denoising mechanism is introduced during pseudo-label generation, combining the complementary strengths of bit-wise and code-wise pseudo-label forms. Three customized loss functions are jointly optimized during training.
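
To make the pipeline concrete, here is a minimal PyTorch sketch of the decoding step: K per-bit sigmoid probabilities are compared against every codeword by soft Hamming distance, and the nearest codeword gives the class. This is an illustration under assumed shapes and a toy codebook, not the official ECOCSeg implementation.

```python
import torch

def soft_hamming_decode(bit_probs: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """bit_probs: (P, K) per-bit sigmoid outputs for P pixels.
    codebook:  (N, K) binary codewords, one row per class.
    Returns:   (P,) predicted class index per pixel."""
    # d_SH(c_n, p^i) = (1/K) * sum_k |p(k|z_i) - c_nk|, computed for all classes at once
    dist = (bit_probs.unsqueeze(1) - codebook.unsqueeze(0)).abs().mean(dim=-1)  # (P, N)
    return dist.argmin(dim=1)

# Toy usage: 4 classes, K = 6 bits (codeword values here are made up)
codebook = torch.tensor([[0., 0., 0., 1., 1., 1.],
                         [0., 1., 1., 0., 0., 1.],
                         [1., 0., 1., 0., 1., 0.],
                         [1., 1., 0., 1., 0., 0.]])
bit_logits = torch.randn(5, 6)   # one logit per bit from the K binary classifiers
preds = soft_hamming_decode(torch.sigmoid(bit_logits), codebook)  # (5,) class ids
```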

Key Designs

  1. ECOC Dense Classification Paradigm:

    • Function: Transforms N-class semantic segmentation into K binary classification problems, endowing the framework with inherent error-correction capability.
    • Mechanism: A binary codebook matrix of size \(N \times K\) is constructed, where each class corresponds to a codeword of length K. The classifier is replaced by K independent sigmoid binary classifiers, each predicting one bit. During inference, the class is determined by computing the soft Hamming distance between the predicted vector and each codeword in the codebook, \(d_{SH}(\mathbf{c}_n, \mathbf{p}^i) = \frac{1}{K}\sum_{k=1}^K |p(k \mid \mathbf{z}_i) - c_{nk}|\), and selecting the nearest neighbor. Two codebook design strategies are provided: max-min distance encoding (maximizing the minimum inter-class Hamming distance to guarantee error-correction capacity; see the first sketch after this list) and text-based encoding (generating codes from class-name semantic relationships to ensure semantic consistency).
    • Design Motivation: Under one-hot encoding, the inter-class Hamming distance is always 2, yielding zero error-correction capacity. ECOC encoding enables inter-class distances far exceeding 2, theoretically tolerating up to \(\lfloor(d-1)/2\rfloor\) bit errors, where d is the minimum inter-class Hamming distance. The paper further proves via NTK theory that ECOC is performance-equivalent to one-hot encoding under full supervision (Theorem 4.1) and achieves a tighter misclassification upper bound under pseudo-label noise (Theorem 4.2).
  2. Reliable Bit Mining (RBM):

    • Function: Mines reliable supervisory signals from noisy pseudo-labels at the bit level.
    • Mechanism: ECOCSeg naturally yields two pseudo-label forms — bit-wise (directly quantizing the sigmoid output of each bit) and code-wise (querying the nearest-neighbor codeword). Each has distinct trade-offs: bit-wise labels are softer but carry independent per-bit noise; code-wise labels are perfectly accurate when classification is correct but introduce holistic noise upon misclassification. The algorithm queries the C nearest-neighbor codewords and identifies bit positions shared across all candidates in the set — these bits are correct as long as the true class is among the candidates — marking them as "reliable bits." The final mixed pseudo-label uses code-wise values at reliable positions and bit-wise values elsewhere. The candidate-set size C is determined adaptively via a confidence threshold T (see the RBM sketch after this list).
    • Design Motivation: No single pseudo-label form achieves both high precision and high coverage. By mining shared bits across candidate classes, the approach maximizes usable supervisory signal while preserving correctness whenever the true class falls within the candidate set.
  3. Customized Optimization Objectives:

    • Function: Introduces structured representation constraints on top of bit-level BCE to accelerate convergence and enhance discriminability.
    • Mechanism: Three losses are combined — (1) Binary cross-entropy \(\mathcal{L}_{bce}\) independently optimizes each bit classifier; (2) Pixel-codeword distance \(\mathcal{L}_{pcd} = 1 - \cos(\hat{\mathbf{p}}^i, \hat{\mathbf{c}}^i)\) encourages intra-class compactness by pulling logits toward the corresponding codeword direction; (3) Pixel-codeword contrast \(\mathcal{L}_{pcc}\) applies contrastive learning to push predictions away from non-target codewords, computed only on discriminative bit positions (\(P_d\)) while ignoring shared bits (the last two losses are sketched after this list).
    • Design Motivation: BCE alone ignores structural relationships among bits and lacks intra-class compactness and inter-class separation constraints. PCD provides intra-class constraints, PCC provides inter-class constraints, and the three losses are mutually complementary.
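
For Key Design 1, a hedged sketch of max-min distance encoding. The paper's actual construction is not reproduced here; this greedy random search only illustrates the stated objective of maximizing the minimum pairwise Hamming distance, and the function name and defaults are assumptions.

```python
import torch

def maxmin_codebook(n_classes: int, k_bits: int, trials: int = 10000, seed: int = 0) -> torch.Tensor:
    """Random search for an (N, K) binary codebook with maximal minimum
    pairwise Hamming distance (a stand-in for the paper's search procedure)."""
    gen = torch.Generator().manual_seed(seed)
    best, best_dmin = None, -1.0
    for _ in range(trials):
        cand = torch.randint(0, 2, (n_classes, k_bits), generator=gen).float()
        d = (cand.unsqueeze(1) - cand.unsqueeze(0)).abs().sum(-1)  # pairwise Hamming
        d.fill_diagonal_(float("inf"))                             # ignore self-distance
        dmin = d.min().item()
        if dmin > best_dmin:
            best, best_dmin = cand, dmin
    return best  # a minimum distance d tolerates floor((d-1)/2) bit errors
```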
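
For Key Design 2, a minimal sketch of Reliable Bit Mining under simplifying assumptions: the candidate-set size C is fixed here rather than chosen adaptively via the confidence threshold T, and all names are illustrative. For each pixel, the C nearest codewords are retrieved, the bit positions on which all candidates agree are marked reliable, and the mixed label takes code-wise values there and quantized bit-wise values elsewhere.

```python
import torch

def reliable_bit_mining(bit_probs: torch.Tensor, codebook: torch.Tensor, C: int = 3) -> torch.Tensor:
    """bit_probs: (P, K) sigmoid outputs; codebook: (N, K) binary.
    Returns mixed bit-level pseudo-labels of shape (P, K)."""
    dist = (bit_probs.unsqueeze(1) - codebook.unsqueeze(0)).abs().mean(-1)  # (P, N)
    idx = dist.topk(C, dim=1, largest=False).indices                        # C nearest classes
    candidates = codebook[idx]                                              # (P, C, K)
    reliable = (candidates == candidates[:, :1]).all(dim=1)                 # (P, K) shared bits
    code_wise = candidates[:, 0]                  # nearest codeword (code-wise label)
    bit_wise = (bit_probs > 0.5).float()          # per-bit quantization (bit-wise label)
    return torch.where(reliable, code_wise, bit_wise)
```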
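
For Key Design 3, a hedged sketch of the two auxiliary losses; the paper's exact formulations (e.g., temperature, normalization, and the per-pair definition of \(P_d\)) may differ. \(\mathcal{L}_{pcd}\) pulls each pixel's bit logits toward its target codeword direction, and \(\mathcal{L}_{pcc}\) is written here as an InfoNCE-style contrast against non-target codewords, restricted to a global mask of discriminative bit positions as a simplification.

```python
import torch
import torch.nn.functional as F

def pcd_loss(logits: torch.Tensor, target_codes: torch.Tensor) -> torch.Tensor:
    """L_pcd = 1 - cos(p, c): intra-class compactness. Shapes: (P, K) each."""
    return (1 - F.cosine_similarity(logits, target_codes, dim=1)).mean()

def pcc_loss(logits: torch.Tensor, codebook: torch.Tensor,
             target_idx: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Contrast each pixel against all codewords on discriminative bits only."""
    P_d = codebook.std(dim=0) > 0                  # bits that differ across classes
    z = F.normalize(logits[:, P_d], dim=1)         # (P, |P_d|) pixel logits
    c = F.normalize(codebook[:, P_d], dim=1)       # (N, |P_d|) codewords
    sim = z @ c.t() / tau                          # (P, N) scaled similarities
    return F.cross_entropy(sim, target_idx)        # pull to target, push from the rest
```

In training, these would be combined with the bit-level BCE term as \(\mathcal{L}_{bce} + \lambda_1 \mathcal{L}_{pcd} + \lambda_2 \mathcal{L}_{pcc}\), matching the total loss below.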

Loss & Training

The total loss is \(\mathcal{L}_{total} = \mathcal{L}_{bce} + \lambda_1 \mathcal{L}_{pcd} + \lambda_2 \mathcal{L}_{pcc}\), applied to both the supervised loss on labeled data and the pseudo-label loss on unlabeled data. The overall training pipeline preserves the standard self-training/consistency-regularization paradigm, replacing only the encoding form, the pseudo-label generation, and the loss functions.

Key Experimental Results

Main Results

| Baseline | Architecture | Original mIoU | +ECOCSeg mIoU | Gain |
|---|---|---|---|---|
| DACS (GTA→CS) | CNN | 52.1 | 54.5 | +2.4 |
| DAFormer (GTA→CS) | Transformer | 68.3 | 70.5 | +2.2 |
| MIC (GTA→CS) | Transformer | 75.9 | 76.9 | +1.0 |
| DACS (SYN→CS) | CNN | 48.3 | 52.1 | +3.8 |
| DAFormer (SYN→CS) | Transformer | 60.9 | 63.3 | +2.4 |
| MIC (SYN→CS) | Transformer | 68.7 | 69.8 | +1.1 |

Ablation Study

| Component | GTA→CS mIoU | Note |
|---|---|---|
| Baseline (DAFormer) | 68.3 | one-hot + CE |
| + ECOC encoding | 69.0 | encoding form only |
| + bit-wise PL | 69.5 | bit-level pseudo-labels |
| + code-wise PL | 69.3 | codeword-level pseudo-labels |
| + mixed PL (RBM) | 70.0 | reliable bit mining |
| + PCD + PCC losses | 70.5 | full ECOCSeg |

Key Findings

  • ECOCSeg consistently improves across different baselines (DACS/DAFormer/MIC) and architectures (CNN/Transformer), indicating orthogonality to existing improvements.
  • Gains are larger on SYNTHIA→Cityscapes (+3.8), as greater domain gaps produce noisier pseudo-labels, amplifying the error-correction advantage of ECOC.
  • Mixed pseudo-labels (RBM) outperform either bit-wise or code-wise labels alone, validating the complementarity assumption.
  • ECOC encoding incurs no performance degradation under full supervision, confirming the theoretical prediction of Theorem 4.1.
  • The paper further demonstrates that ECOC improves model calibration, indirectly enhancing pseudo-label quality in subsequent training iterations.

Highlights & Insights

  • Encoding perspective on pseudo-label noise: This is an entirely new and orthogonal direction, fully compatible and stackable with existing filtering and weighting strategies. The conceptual shift — optimizing the representation of labels rather than their selection — is particularly elegant.
  • Exploiting inter-class shared attributes: Sheep and cows both have horns and hooves; even when misclassified, bits corresponding to shared attributes remain correct. This insight is simple yet profound, naturally transferring error-correcting ideas from communications to semantic segmentation.
  • Theoretical guarantees: By leveraging the NTK framework, the paper proves both the equivalence of ECOC under full supervision and its superiority under noisy pseudo-labels, grounding the approach in rigorous theory rather than heuristics alone.

Limitations & Future Work

  • The optimal codebook configuration (choice of K and encoding strategy) may vary across datasets, and no adaptive selection mechanism is provided.
  • When the code length K exceeds the class count N, the K binary classifiers carry more output parameters than a single N-class classifier, which may raise efficiency concerns when the number of classes (and hence the required code length) is very large.
  • Validation is currently limited to semantic segmentation; the ECOC encoding idea is in principle transferable to any pseudo-label learning scenario (e.g., object detection, instance segmentation), which merits future exploration.
  • The threshold T in reliable bit mining is a fixed hyperparameter; an adaptive thresholding strategy may yield further improvements.

Comparison with Related Strategies

  • vs. threshold-based filtering (e.g., FixMatch): Threshold filtering discards uncertain samples, leaving hard examples without supervision. ECOCSeg retains partial supervision for all samples at the bit level, achieving broader coverage.
  • vs. weighting strategies (e.g., FlexMatch): Weighting strategies require carefully designed weight functions, whereas ECOCSeg achieves noise tolerance through the encoding form itself, requiring no additional weight design.
  • vs. negative learning: Negative learning avoids noise by telling the model "what something is not"; ECOCSeg exploits noise by telling the model "what something partially is." The two approaches may be complementary.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ Examining pseudo-label noise from an encoding perspective is a genuinely novel direction; bringing error-correcting codes from communications into segmentation is a first.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Covers multiple UDA/SSL baselines and benchmarks with detailed ablations, though validation beyond segmentation tasks is absent.
  • Writing Quality: ⭐⭐⭐⭐⭐ The problem formalization is clear, and the three-component analytical framework (encoding / pseudo-label strategy / optimization objective) is elegantly structured.
  • Value: ⭐⭐⭐⭐⭐ Orthogonal to existing methods, plug-and-play, and theoretically grounded — high in both practical utility and conceptual inspiration.