ProxyFL: A Proxy-Guided Framework for Federated Semi-Supervised Learning¶

Conference: CVPR 2026
arXiv: 2602.21078
Code: DuowenC/FSSLlib
Area: AI Security
Keywords: Federated Learning, Semi-Supervised Learning, Data Heterogeneity, Proxy Learning, Pseudo-Labels

TL;DR¶

ProxyFL treats classifier weights as unified proxies to mitigate both external heterogeneity (cross-client distribution discrepancy) and internal heterogeneity (distribution mismatch between labeled and unlabeled data) in federated semi-supervised learning. At a 10% label rate it improves on the previous best method, SAGE, by 1.24 to 3.39 accuracy points across CIFAR-10, CIFAR-100, SVHN, and CINIC-10.

Background & Motivation¶

Federated semi-supervised learning (FSSL) enables multiple clients to collaboratively train a global model using limited labeled data and abundant unlabeled data while preserving privacy. The core challenge lies in two dimensions of data heterogeneity:

External Heterogeneity: Distribution discrepancies across clients (non-IID). Existing methods mitigate this via dynamic aggregation weights, but naive averaging can be skewed by outlier clients, causing deviation from the global class distribution.

Internal Heterogeneity: Distribution mismatch between labeled and unlabeled data within a single client. Existing methods typically discard low-confidence samples to avoid pseudo-label errors, resulting in insufficient training data participation.

Through empirical analysis, the authors identify two key observations: (1) simple averaging of classifier weights is susceptible to outlier shifts and fails to effectively capture the global class distribution; (2) as heterogeneity increases, more unlabeled samples are excluded from training, despite their potential to improve performance.

Method¶

Overall Architecture¶

ProxyFL treats the learnable weights \(\boldsymbol{\Omega}_m = \{\omega_m^c\}_{c=1}^C\) of client \(m\)'s final fully connected layer as class proxies, one per class \(c\), which model both local and global class distributions. The framework comprises two core modules: Global Proxy Tuning (GPT) on the server side and Indecisive-Categories Proxy Learning (ICPL) on the client side. Because the proxies are already part of the model parameters, exchanging them adds no privacy risk or communication overhead beyond standard model sharing.

Key Designs¶

Global Proxy Tuning (GPT): Explicitly optimizes the global proxy on the server to fit cross-client class distributions. The global proxy is initialized as the simple average \(\overline{\boldsymbol{\Omega}}_{\mathcal{G}}\) of the local proxies, then fine-tuned with a contrastive objective that pulls each global proxy \(\boldsymbol{\Omega}_{\mathcal{G}}^c\) toward the same-class local proxies \(\omega_m^c\) and pushes it away from the different-class ones \(\omega_m^{c'}\). Because \(\phi\) enters with a negative sign, it acts as a distance between proxies:

\[\mathcal{L}_{\text{GPT}} = \sum_{c=1}^{C}\sum_{m=1}^{M} -\log \frac{e^{-\phi(\boldsymbol{\Omega}_{\mathcal{G}}^c, \omega_m^c)}}{e^{-\phi(\boldsymbol{\Omega}_{\mathcal{G}}^c, \omega_m^c)} + \sum_{c' \neq c} e^{-\phi(\boldsymbol{\Omega}_{\mathcal{G}}^c, \omega_m^{c'})}}\]
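A minimal sketch of this objective in plain Python may help make the indices concrete. It assumes \(\phi\) is the squared Euclidean distance, which this summary does not specify, and the helper names (`sq_dist`, `average_proxies`, `gpt_loss`) are illustrative rather than taken from the authors' code.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance; an assumed choice for phi."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def average_proxies(local_proxies):
    """Simple-average initialization. local_proxies[m][c] is client m's proxy for class c."""
    num_clients = len(local_proxies)
    num_classes = len(local_proxies[0])
    dim = len(local_proxies[0][0])
    return [[sum(local_proxies[m][c][j] for m in range(num_clients)) / num_clients
             for j in range(dim)]
            for c in range(num_classes)]

def gpt_loss(global_proxies, local_proxies, phi=sq_dist):
    """Sum over classes c and clients m of the softmax contrastive term."""
    loss = 0.0
    for c, g in enumerate(global_proxies):
        for client in local_proxies:
            # Score of global proxy c against every local proxy of this client.
            logits = [-phi(g, w) for w in client]
            mx = max(logits)  # log-sum-exp shift for numerical stability
            log_denom = mx + math.log(sum(math.exp(l - mx) for l in logits))
            loss -= logits[c] - log_denom
    return loss

# Two clients, two classes, two-dimensional proxies (toy values).
local = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.8, 0.2], [0.2, 0.8]]]
init = average_proxies(local)
print(gpt_loss(init, local))
```

In a full implementation the server would presumably take a few gradient steps on the global proxies to lower this value, starting from the averaged initialization.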

The computational complexity is \(O(Q \times M \times C^2 \times d)\), reported as approximately 0.4 GFLOPs on CIFAR-100. That is comparable to a single image inference pass and therefore negligible.
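The bound can be turned into a rough operation count. In the sketch below, `Q` is read as the number of tuning iterations and `d` as the proxy dimension; neither symbol is defined in this summary, so both readings, and the example values chosen for them, are assumptions. `M = 20` matches the client count noted under the limitations, and `C = 100` is the number of CIFAR-100 classes.

```python
def gpt_flops(Q, M, C, d):
    """Rough cost: Q passes, each comparing C global proxies
    with M * C local proxies of dimension d (constant factors ignored)."""
    return Q * M * C * C * d

# Hypothetical values for Q and d, chosen only for illustration.
estimate = gpt_flops(Q=4, M=20, C=100, d=512)
print(f"{estimate / 1e9:.2f} GFLOPs")
```

The count grows quadratically with the number of classes, so datasets with many classes dominate the server-side cost.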

Indecisive-Categories Proxy Learning (ICPL): Rather than discarding low-confidence unlabeled samples or assigning a single pseudo-label, this module constructs an "indecisive category set" \(\xi_i\). For each low-confidence sample \(\mathbf{u}_i^{\text{lc}}\), any class whose global logit \(\overline{\mathbf{y}}_i(c)\) exceeds the global class prior \(\mathcal{P}_{\mathcal{G}}'(\mathbf{Y}(c))\) is included in \(\xi_i\):

\[\xi_i = \{c \mid \overline{\mathbf{y}}_i(c) > \mathcal{P}_{\mathcal{G}}'(\mathbf{Y}(c))\}\]

The prior \(\mathcal{P}_{\mathcal{G}}'\) is aggregated from the prediction preferences of each client's model, imposing higher thresholds for majority classes and lower thresholds for minority classes to dynamically regulate the indecisive category range.
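The set construction can be sketched as follows. How the per-client prediction preferences are aggregated into \(\mathcal{P}_{\mathcal{G}}'\) is not detailed here, so `aggregate_prior` uses a plain mean as a stand-in; both function names are hypothetical.

```python
def aggregate_prior(client_preferences):
    """Mean of the clients' average predicted class distributions (assumed rule)."""
    n = len(client_preferences)
    num_classes = len(client_preferences[0])
    return [sum(p[c] for p in client_preferences) / n for c in range(num_classes)]

def indecisive_set(global_scores, class_prior):
    """Classes whose global-model score exceeds the class-wise prior threshold."""
    return {c for c, (s, p) in enumerate(zip(global_scores, class_prior)) if s > p}

# Class 0 is a majority class (high prior); class 2 is a minority class (low prior).
prior = [0.50, 0.30, 0.05, 0.15]
scores = [0.40, 0.35, 0.15, 0.10]  # low-confidence prediction for one sample
print(indecisive_set(scores, prior))
```

In this toy case the top-scoring class 0 is left out because its prior is high, while the minority class 2 enters the set with a much lower score.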

Positive-Negative Proxy Pool: Positive and negative proxy pools are constructed from each sample's category set \(\xi_i\). For a high-confidence sample, the positive proxy is the classifier weight of its pseudo-label class, \(\omega_i^{\text{hc}} = \omega_k^{\hat{y}_i}\); for a low-confidence sample, it is a weighted sum of the indecisive category proxies, \(\omega_i^{\text{lc}} = \sum_{c' \in \xi_i} \tilde{\mathbf{y}}_i(c') \times \omega_k^{c'}\). Negatives are the samples in the same batch whose category sets do not overlap with that of the current sample. A contrastive objective over these pools ensures that all samples, including low-confidence ones, participate in training.
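A small sketch of the pool construction follows. It assumes \(\tilde{\mathbf{y}}_i\) denotes the scores renormalized over \(\xi_i\), which this summary does not state, and the function names are hypothetical.

```python
def positive_proxy(category_set, scores, proxies):
    """Weighted sum of the proxies in the sample's category set.

    A singleton set puts weight 1 on one class, which matches the
    high-confidence case where the positive is the pseudo-label's proxy.
    """
    classes = sorted(category_set)
    total = sum(scores[c] for c in classes)  # assumed renormalization over the set
    dim = len(proxies[0])
    return [sum(scores[c] / total * proxies[c][j] for c in classes)
            for j in range(dim)]

def negative_indices(i, category_sets):
    """Batch indices whose category sets share no class with sample i."""
    return [j for j, s in enumerate(category_sets)
            if j != i and s.isdisjoint(category_sets[i])]

proxies = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one proxy per class
batch_sets = [{0, 2}, {1}, {2}, {1}]             # category set per sample
print(positive_proxy(batch_sets[0], [0.6, 0.2, 0.2], proxies))
print(negative_indices(0, batch_sets))
```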

Loss & Training¶

The total loss comprises local and global components:

\[\mathcal{L} = \underbrace{\mathcal{L}_s + \alpha \mathcal{L}_u + \beta \mathcal{L}_{\text{ICPL}}}_{\text{local}} + \underbrace{\mathcal{L}_{\text{GPT}}}_{\text{global}}\]

\(\mathcal{L}_s\): Cross-entropy loss on labeled data
\(\mathcal{L}_u\): KL divergence loss on high-confidence unlabeled data (strongly augmented predictions vs. pseudo-labels)
\(\mathcal{L}_{\text{ICPL}}\): Contrastive loss over all unlabeled data
\(\alpha, \beta\): Loss weights, both set to 1
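As a rough illustration of how the local terms combine, the sketch below computes a cross-entropy term, a KL term between a pseudo-label distribution and a strongly augmented prediction, and their weighted sum. The ICPL value is simply passed in as a number, and all function names are hypothetical.

```python
import math

def cross_entropy(probs, label):
    """Supervised term for one labeled sample."""
    return -math.log(probs[label])

def kl_divergence(target, pred):
    """KL(target || pred); zero-probability target entries contribute nothing."""
    return sum(t * math.log(t / p) for t, p in zip(target, pred) if t > 0)

def local_loss(l_s, l_u, l_icpl, alpha=1.0, beta=1.0):
    """Weighted sum of the three local terms."""
    return l_s + alpha * l_u + beta * l_icpl

l_s = cross_entropy([0.7, 0.2, 0.1], label=0)
# One-hot pseudo-label versus the prediction on the strongly augmented view.
l_u = kl_divergence([1.0, 0.0, 0.0], [0.6, 0.3, 0.1])
print(local_loss(l_s, l_u, l_icpl=0.5))
```

With a one-hot pseudo-label the KL term reduces to the cross-entropy against that label, which is why the two terms look alike in this toy case.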

Key Experimental Results¶

Main Results¶

Results at a 10% label rate. Heterogeneity is controlled by a Dirichlet concentration parameter \(\alpha\) (distinct from the loss weight \(\alpha\) above); a smaller \(\alpha\) means higher heterogeneity:

Dataset	\(\alpha\)	Metric	ProxyFL (%)	Prev. SOTA (SAGE) (%)	Gain (pts)
CIFAR-10	0.1	Acc	88.56	87.05	+1.51
CIFAR-100	0.1	Acc	57.50	54.18	+3.32
SVHN	0.1	Acc	95.09	93.85	+1.24
CINIC-10	0.1	Acc	77.98	74.59	+3.39
CIFAR-100	0.5	Acc	58.75	55.82	+2.93
CINIC-10	0.5	Acc	78.96	75.74	+3.22

On SVHN and CINIC-10 (\(\alpha=0.1\)), ProxyFL approaches the fully supervised upper bound of FedAvg-SL.
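For readers unfamiliar with the setup, one common way to realize Dirichlet-controlled heterogeneity is label-skew partitioning: for each class, the clients' shares are drawn from \(\text{Dir}(\alpha)\). The sketch below follows that common protocol using only the standard library; whether the paper uses exactly this procedure is an assumption, and `dirichlet_partition` is a hypothetical name.

```python
import random

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet proportions."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet draw is a set of independent Gamma(alpha, 1) draws, normalized.
        g = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(g)
        if total == 0.0:  # guard against underflow for very small alpha
            g, total = [1.0] * num_clients, float(num_clients)
        start, cum = 0, 0.0
        for k in range(num_clients):
            cum += g[k] / total
            # The last client takes whatever remains so every index is assigned.
            end = len(idxs) if k == num_clients - 1 else min(len(idxs), int(round(cum * len(idxs))))
            clients[k].extend(idxs[start:end])
            start = end
    return clients

labels = [i % 10 for i in range(1000)]  # toy dataset with 10 balanced classes
parts = dirichlet_partition(labels, num_clients=20, alpha=0.1)
print([len(p) for p in parts])
```

Smaller values of `alpha` concentrate each class on a few clients, which is the sense in which a smaller \(\alpha\) means higher heterogeneity.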

Ablation Study¶

Configuration	CIFAR-10 (\(\alpha\)=0.1)	CIFAR-100 (\(\alpha\)=0.1)	Notes
Baseline (GPL)	84.56	48.96	FedAvg+FixMatch-GPL
+GPT	87.59	54.86	Global proxy tuning adds +3.03 / +5.90 over the baseline
+ICPL	87.81	57.21	Training on low-confidence samples adds +3.25 / +8.25 over the baseline
+GPT+ICPL	88.56	57.50	The two modules are complementary; best on both datasets (+4.00 / +8.54)

Comparison of indecisive category set designs (\(\alpha=0.1\)):

Strategy	CIFAR-100	SVHN	Notes
Top-1	55.66	94.56	Single pseudo-label
Top-5	56.58	94.71	Fixed top-5 categories
\(\mathcal{P}_{\mathcal{G}}'(\mathbf{Y})\)	57.21	94.82	Dynamic prior threshold achieves best results

Key Findings¶

Convergence speed: ProxyFL reaches 50% accuracy on CIFAR-100 (\(\alpha=0.1\)) in only 177 rounds, a 3.18× speedup over LPL's 562 rounds.
The recall of the indecisive category set (how often it contains the true class) substantially exceeds the precision of single pseudo-labels, which supports the set-based strategy.
Proxy vs. prototype: The proxy approach outperforms FedProto+FSSL variants across all datasets without introducing privacy risks (prototypes are susceptible to feature inversion attacks).

Highlights & Insights¶

The novel use of classifier weights as "proxies" to unify the handling of both external and internal heterogeneity avoids the privacy leakage risks inherent in prototype-based methods.
The indecisive category set is an elegant design: rather than committing to a single hard pseudo-label, it preserves uncertainty and lets contrastive learning handle the ambiguity naturally.
The server-side tuning overhead of the GPT module is minimal (roughly equivalent to a single image inference), making it highly practical.

Limitations & Future Work¶

Validation is limited to image classification; extension to more complex tasks such as detection and segmentation remains unexplored.
Experiments cover only the Labels-at-All-Clients scenario; other FSSL settings such as Labels-at-Partial-Clients are not addressed.
The prior distribution \(\mathcal{P}_{\mathcal{G}}'\) for the indecisive category set accumulates across global communication rounds and may be unstable in early rounds.
The number of clients is fixed at 20; scalability to larger-scale federated settings is not investigated.

Related Work¶

FedDure / SAGE: Current FSSL state of the art; ProxyFL builds upon these by introducing a proxy mechanism.
FedProto: Represents class distributions via prototypes but carries privacy leakage risks, as features can be reconstructed via inversion.
FixMatch: A standard SSL baseline whose high-confidence filtering strategy leads to data waste in FSSL settings.
Proxy learning has established applications in metric learning; according to the authors, this work is the first to bring it into FSSL to address heterogeneity.

Rating¶

Novelty: ⭐⭐⭐⭐ (Unified proxy framework addressing both internal and external heterogeneity with conceptual clarity)
Experimental Thoroughness: ⭐⭐⭐⭐⭐ (4 datasets × 3 heterogeneity levels with comprehensive ablations)
Writing Quality: ⭐⭐⭐⭐ (Clear problem–observation–solution derivation logic)
Value: ⭐⭐⭐⭐ (Substantive contribution to the FSSL field)