FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning

Conference: CVPR 2026
arXiv: 2511.22265
Code: GitHub
Area: AI Safety / Federated Learning
Keywords: Federated Learning, Model Heterogeneity, Entangled Representation, Privacy Preservation, Communication Efficiency

TL;DR

This paper proposes FedRE, a framework that achieves a three-way balance among performance, privacy protection, and communication overhead in model-heterogeneous federated learning via "entangled representations"—aggregating all local representations of each client into a single cross-class representation using normalized random weights.

Background & Motivation

Federated learning (FL) enables multiple clients to collaboratively train models while preserving privacy. In practice, however, hardware and computational capabilities vary significantly across clients, making it unrealistic to enforce a homogeneous model architecture for all participants. This has motivated research on model-heterogeneous FL, where clients may employ different representation extractors while sharing a homogeneous classifier.

Existing model-heterogeneous FL methods face a dilemma in selecting the form of client knowledge to share:
- Representations / logits / small models: Effectively encode high-level knowledge but introduce significant communication overhead and privacy risks when uploaded to the server (raw samples can be reconstructed via representation inversion attacks).
- Classifiers: Lightweight but may inherit biases from local data distributions.
- Prototypes (class-mean representations): Lightweight and reduce privacy risks, but capture only class-level information with limited intra-class variability, and tend to produce overly sharp decision boundaries when training a global classifier.

The core problem: Does a more effective, privacy-safe, and lightweight form of client knowledge exist for model-heterogeneous FL?

Method

Overall Architecture

The core idea of FedRE is to aggregate all local representations of each client into a single cross-class "entangled representation" along with a corresponding "entangled label encoding," which are uploaded to the server to train the global classifier. The workflow proceeds as follows:

Client update: Train the local model on local data using cross-entropy loss.
Representation entanglement and upload: Generate the entangled representation and entangled label encoding and upload them to the server.
Global classifier update and broadcast: The server trains the global classifier using the entangled representations and broadcasts it to clients.

Key Designs

Representation Mapping (RM):
- Maps local representations from heterogeneous architectures to a consistent dimensionality.
- Three RM operations are evaluated: average pooling (AP), max pooling (MP), and fully connected layer (FC).
- AP performs best (CIFAR-100 PRA: 46.36% vs. MP 45.97% vs. FC 44.53%).
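
The average-pooling variant of RM can be sketched as follows. This is a minimal illustration, assuming each client's representation is a flat 1-D vector and that bins follow the usual adaptive-pooling convention; the function name `average_pool` and the toy dimensions are hypothetical and not taken from the paper.

```python
def average_pool(rep, target_dim):
    """Map a 1-D representation to target_dim by averaging contiguous bins."""
    n = len(rep)
    if n < target_dim:
        raise ValueError("representation is shorter than the target dimension")
    pooled = []
    for j in range(target_dim):
        # Bin j spans [floor(j*n/target_dim), ceil((j+1)*n/target_dim)).
        start = (j * n) // target_dim
        end = -((-(j + 1) * n) // target_dim)  # integer ceiling division
        window = rep[start:end]
        pooled.append(sum(window) / len(window))
    return pooled


# Two hypothetical clients whose extractors emit different widths.
client_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # width 8
client_b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]            # width 6

shared_a = average_pool(client_a, 4)
shared_b = average_pool(client_b, 4)
print(len(shared_a), len(shared_b))  # both clients now share one dimensionality
```

Pooling has no trainable parameters, which is one plausible reason it is attractive here compared with the FC variant.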
Representation Entanglement (RE):

\(\widetilde{\mathbf{r}}_k = \sum_{i=1}^{|\mathcal{D}_k|} w_i^k \text{RM}[\mathbf{g}_k(\phi_k; \mathbf{x}_i^k)], \quad \widetilde{\mathbf{y}}_k = \sum_{i=1}^{|\mathcal{D}_k|} w_i^k \mathbf{y}_i^k\)

where \(w_i^k \in [0,1]\) are normalized random weights. The same set of weights simultaneously aggregates both the representations and their one-hot label encodings.
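
The formula above can be sketched in a few lines. This is an illustration only, assuming weights are drawn uniformly and normalized to sum to 1, that representations are already mapped to a common dimension, and that labels are integer class indices; `normalized_random_weights` and `entangle` are hypothetical names.

```python
import random


def normalized_random_weights(n, rng):
    """Draw n non-negative random weights that sum to 1 (uniform draw assumed)."""
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def entangle(reps, labels, num_classes, rng):
    """Collapse all representations and one-hot labels into one weighted pair."""
    weights = normalized_random_weights(len(reps), rng)
    dim = len(reps[0])
    r_tilde = [0.0] * dim
    y_tilde = [0.0] * num_classes
    for w, rep, label in zip(weights, reps, labels):
        for j in range(dim):
            r_tilde[j] += w * rep[j]
        # Adding w to the label's slot equals w times its one-hot encoding.
        y_tilde[label] += w
    return r_tilde, y_tilde


rng = random.Random(0)
reps = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
labels = [0, 2, 2]

# Calling entangle again with the same rng mimics per-round resampling.
for round_idx in range(2):
    r_tilde, y_tilde = entangle(reps, labels, num_classes=3, rng=rng)
    print(round_idx, r_tilde, y_tilde)
```

Because the same weights are applied to both sums, the entangled label encoding is a valid soft label (non-negative, summing to 1) that matches the mixture inside the entangled representation.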

The default strategy is Random Average Prototype (RAP): class prototypes are first computed, then aggregated into a single entangled representation using random weights.
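
A hedged sketch of the RAP strategy as described here: average representations per class first, then mix the class prototypes with normalized random weights. The helper names, the uniform weight draw, and the toy data are assumptions for illustration, not the authors' code.

```python
import random
from collections import defaultdict


def class_prototypes(reps, labels):
    """Return {class index: mean representation of that class}."""
    groups = defaultdict(list)
    for rep, label in zip(reps, labels):
        groups[label].append(rep)
    protos = {}
    for label, members in groups.items():
        dim = len(members[0])
        protos[label] = [sum(m[j] for m in members) / len(members) for j in range(dim)]
    return protos


def rap_entangle(reps, labels, num_classes, rng):
    """Mix the class prototypes into one entangled representation and label."""
    protos = class_prototypes(reps, labels)
    classes = sorted(protos)
    raw = [rng.random() for _ in classes]
    total = sum(raw)
    weights = [w / total for w in raw]
    dim = len(protos[classes[0]])
    r_tilde = [sum(w * protos[c][j] for w, c in zip(weights, classes)) for j in range(dim)]
    y_tilde = [0.0] * num_classes
    for w, c in zip(weights, classes):
        y_tilde[c] = w  # the prototype of class c carries a one-hot label for c
    return r_tilde, y_tilde


rng = random.Random(0)
reps = [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]]
labels = [0, 0, 1]
r_tilde, y_tilde = rap_entangle(reps, labels, num_classes=3, rng=rng)
print(r_tilde, y_tilde)
```

Compared with weighting raw samples directly, each class contributes exactly one term to the mixture here, so a class with many samples does not dominate merely through its sample count.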

Per-round random weight resampling:
- Random weights are resampled at each communication round, introducing diversity.
- The entangled label encoding provides cross-class supervision signals.
- This prevents the global classifier from being overconfident about any single class, promoting smoother decision boundaries.
- Design motivation: a toy experiment compares FedAllRep (uploads all representations; best accuracy, 63.50%), FedGH (uploads prototypes; sharp decision boundaries, 60.50%), and FedRE (smooth decision boundaries, 62.00%).
Privacy protection mechanism:
- The entangled representation mixes cross-class information, making individual sample reconstruction difficult.
- Each client uploads only one entangled representation per round, further reducing the attack surface for information leakage.

Loss & Training

Client: Local model trained with cross-entropy loss \(\mathcal{L}_{ce}\).
Server: Global classifier trained with cross-entropy loss: \(\min_\omega \sum_{k=1}^K \mathcal{L}_{ce}[f(\omega; \widetilde{\mathbf{r}}_k), \widetilde{\mathbf{y}}_k]\).
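The computational complexity of RE is only \(\mathcal{O}(n(d+C))\), where \(n\) is the number of local samples, \(d\) the representation dimension, and \(C\) the number of classes: one weighted sum over \(d\)-dimensional representations and one over \(C\)-dimensional label encodings. No additional gradient computation is required.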
Setup: 10 clients, SGD optimizer, 100 communication rounds, NVIDIA A800 GPU.
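
The server objective can be illustrated with cross-entropy against soft labels. The sketch below assumes a linear softmax classifier trained by plain per-upload SGD; the classifier form, learning rate, and toy uploads are assumptions, since this summary does not specify them.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(probs, soft_label):
    # Skip zero-weight classes so log() is only taken where it contributes.
    return -sum(y * math.log(p) for y, p in zip(soft_label, probs) if y > 0)


def sgd_step(W, b, r_tilde, y_tilde, lr=0.5):
    """One SGD step on a single upload; returns the loss before the update."""
    logits = [sum(w * x for w, x in zip(row, r_tilde)) + bias
              for row, bias in zip(W, b)]
    probs = softmax(logits)
    loss = soft_cross_entropy(probs, y_tilde)
    for c in range(len(W)):
        # For labels summing to 1, d(loss)/d(logit_c) = probs[c] - y_tilde[c].
        grad = probs[c] - y_tilde[c]
        b[c] -= lr * grad
        for j, x in enumerate(r_tilde):
            W[c][j] -= lr * grad * x
    return loss


# Hypothetical uploads from two clients: (entangled representation, entangled label).
uploads = [([1.0, -0.5], [0.7, 0.3, 0.0]), ([0.2, 0.9], [0.0, 0.4, 0.6])]
W = [[0.0, 0.0] for _ in range(3)]
b = [0.0, 0.0, 0.0]
for step in range(20):
    for r_tilde, y_tilde in uploads:
        sgd_step(W, b, r_tilde, y_tilde)
```

In FedRE the uploads would change every round because the weights are resampled; the fixed uploads above only keep the example short.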

Key Experimental Results

Main Results

Method	CIFAR-10 (PRA)	CIFAR-100 (PRA)	TinyImageNet (PRA)	CIFAR-10 (PAT)	CIFAR-100 (PAT)	TinyImageNet (PAT)	Avg.
FedProto	78.36	35.00	18.16	83.81	56.72	29.61	50.28
FedGH	78.66	40.91	25.04	85.43	58.07	31.98	53.35
FedTGP	81.32	35.89	28.70	84.68	54.67	35.64	53.48
Local	81.20	41.57	25.81	84.68	57.96	33.02	54.04
FedRE	82.60	46.36	30.48	86.20	62.56	38.52	57.79

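FedRE outperforms all listed baselines in every setting. Under the TinyImageNet PAT setting it surpasses FedGH by 6.54 percentage points (38.52 vs. 31.98) and FedKD by 6.79 percentage points (FedKD's row is not included in the table above).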

Ablation Study

Communication overhead (CIFAR-100, number of scalars ×10³):

Metric	LG-FedAvg	FedGH	FedKD	FedGen	FedProto	FedMRL	FedRE
Upload	513.00	257.02	4234.28	9247.08	257.02	8863.08	5.12
Broadcast	513.00	512.00	4234.28	513.00	512.00	8863.08	513.00

FedRE's upload cost is only 5.12K scalars—less than 2% of FedProto and over 1,700× lower than FedMRL.
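
The ratios quoted above follow directly from the upload row of the table. A small check, using only the table values (in units of 10³ scalars):

```python
# Upload cost per method on CIFAR-100, in thousands of scalars (from the table).
upload = {
    "LG-FedAvg": 513.00,
    "FedGH": 257.02,
    "FedKD": 4234.28,
    "FedGen": 9247.08,
    "FedProto": 257.02,
    "FedMRL": 8863.08,
    "FedRE": 5.12,
}

fedre = upload["FedRE"]
# How many times more each method uploads than FedRE.
ratios = {name: cost / fedre for name, cost in upload.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:10s} {upload[name]:9.2f}  {ratio:8.1f}x FedRE")

# FedRE's share of FedProto's upload, as a percentage.
fedre_share_of_fedproto = 100.0 * fedre / upload["FedProto"]
print(f"FedRE / FedProto = {fedre_share_of_fedproto:.2f}%")
```

FedProto and FedGH upload roughly 50 times as many scalars as FedRE, and FedMRL roughly 1,731 times as many.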

Privacy protection (representation inversion attack, TinyImageNet):

Knowledge Form	PSNR ↓	MSE ↑
Representations (FedAllRep)	12.89	4514.91
Prototypes (FedGH)	10.25	6992.04
Entangled representations (FedRE)	9.66	7781.87

The entangled representation achieves the lowest PSNR and highest MSE, yielding reconstructed images from which no meaningful information can be identified.
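
For readers less familiar with the two metrics, the sketch below shows how MSE and PSNR are conventionally computed between an original image and its reconstruction. It assumes flattened 8-bit pixel values (peak value 255); the paper's exact evaluation protocol, such as whether scores are averaged per image, is not stated in this summary.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for a perfect reconstruction."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Toy 4-pixel "images": a close reconstruction and a poor one.
original = [10, 50, 200, 255]
close = [12, 48, 198, 250]
poor = [200, 5, 20, 90]

print(psnr(original, close), mse(original, close))
print(psnr(original, poor), mse(original, poor))
```

A worse reconstruction has higher MSE and lower PSNR, which is why the arrows in the table mark lower PSNR and higher MSE as better for privacy.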

RE mechanism comparison (CIFAR-100 PRA):

Mechanism	RSR	VAR	RAR	RSP	VAP	RAP
Accuracy	40.41	44.88	43.19	43.25	46.12	46.36

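RAP (Random Average Prototype) achieves the best performance. Prototype-based variants beat their raw-representation counterparts (RAP 46.36 vs. RAR 43.19; VAP 46.12 vs. VAR 44.88), suggesting that prototypes are more representative than raw representations. At the prototype level, random weighting edges out the VAP variant by 0.24 points (46.36 vs. 46.12); the same advantage does not appear for raw representations in this table (RAR 43.19 vs. VAR 44.88).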

Key Findings

Entangled representations achieve performance close to uploading all representations: FedRE (30.48%, the TinyImageNet PRA figure from the main results) vs. FedAllRep (31.20%), with approximately 10× lower communication overhead.
Per-round resampling is critical: Fixed weights vs. per-round resampling yield 45.84% vs. 46.36% on CIFAR-100, with a larger gap on synthetic datasets (41.50% vs. 62.00%).
The additional training cost introduced by RE is negligible: only 0.09 seconds per round on CIFAR-10 (5.69s → 5.78s).
The choice of weight distribution (uniform / Laplacian / Gaussian) has minimal impact on performance, demonstrating the flexibility of the framework.
FedRE maintains state-of-the-art performance in large-scale settings with 100 clients (participation rates of 10/100 and 20/100).

Highlights & Insights

The entangled representation is an elegant design that simultaneously addresses three objectives—performance, privacy, and communication—rather than trading off among them as existing methods do.
The per-round resampling of random weights resembles data augmentation through randomness, preventing overfitting to specific weight configurations by introducing training diversity.
The entangled label encoding provides "cross-class soft supervision," which bears a conceptual resemblance to label smoothing—yet the randomness here is more fundamental, as label encodings differ entirely across rounds.
A key distinction from Mixup: FedRE aggregates all representations within each client into a single vector (rather than performing pairwise interpolation), and serves an entirely different purpose.

Limitations & Future Work

A rigorous non-convex convergence analysis is absent (acknowledged by the authors as future work).
When client data is extremely imbalanced (e.g., a client holds only 1–2 classes), the entangled representation may carry insufficient information.
The method has not been evaluated on larger-scale models (e.g., LLMs or ViT-L).
The global classifier architecture must be shared across all clients, limiting flexibility in fully heterogeneous settings.
The distribution and sampling strategy for random weights may admit better alternatives; current experiments suggest uniform distribution is marginally superior but with negligible margins.

Related Work & Insights

Most directly related to FedGH (training a global classifier based on prototypes): FedRE can be viewed as its natural evolution, extending from single-class prototypes to cross-class entangled representations.
Orthogonal to FedAvg-family methods (parameter aggregation)—since FedRE cannot directly aggregate parameters due to model heterogeneity, it instead adopts a knowledge distillation perspective.
The framework offers an instructive perspective on the privacy–efficiency–performance triangle in federated learning: well-designed client knowledge representations can simultaneously advance all three objectives.
The concept of entangled representations may extend to other settings, such as federated continual learning and federated domain adaptation.

Rating

Novelty: ⭐⭐⭐⭐ (The entangled representation concept is novel; despite surface similarities to Mixup, the motivation and implementation differ substantially.)
Experimental Thoroughness: ⭐⭐⭐⭐⭐ (Systematic analysis of 10 research questions; three-dimensional evaluation of communication, privacy, and performance; 10 heterogeneous architectures.)
Writing Quality: ⭐⭐⭐⭐ (Q&A-style experimental structure is clear; toy experiments are intuitive.)
Value: ⭐⭐⭐⭐ (Provides a practical and elegant solution for model-heterogeneous FL.)