Federated Modality-specific Encoders and Partially Personalized Fusion Decoder for Multimodal Brain Tumor Segmentation
Conference: CVPR2025
arXiv: 2603.04887
Code: GitHub
Area: Medical Image
Keywords: federated learning, brain tumor segmentation, multimodal MRI, personalized FL, missing modality
TL;DR
This work proposes FedMEPD, a federated learning framework that simultaneously addresses inter-modality heterogeneity and client personalization in multimodal MRI brain tumor segmentation through modality-specific encoders (globally federated) and a partially personalized fusion decoder. It achieves an average client mDSC of 75.70%/75.90% on BraTS 2018/2020.
Background & Motivation
Federated learning (FL) enables collaborative training across multiple institutions without compromising privacy. Existing medical image FL methods primarily address intra-modality heterogeneity (e.g., differences in data distribution) but overlook inter-modality heterogeneity: in multimodal MRI brain tumor segmentation, different institutions may only possess a subset of the four modalities (T1, T1c, T2, FLAIR). This introduces two concurrent challenges: (1) how to effectively train a global model in the presence of incomplete modalities, and (2) how to provide each participant with a personalized model tailored to their local data characteristics. Existing approaches either require all clients to have identical modalities (Xiong et al.), train a single global model that fails to meet personalization needs (FedIoT), or require data sharing, which violates privacy constraints (CreamFL).
Method
Overall Architecture
FedMEPD consists of four core components:
- Modality-specific Encoders: Each modality \(m \in \{T1, T1c, T2, FLAIR\}\) has an independent encoder \(E_m\), which is globally shared and federated.
- Multimodal Fusion Decoder: Fuses all modality features on the server side, while being partially federated and partially personalized on the client side.
- Multi-Anchor Multimodal Representation: Extracts class anchors from the server's fused features and distributes them to clients.
- LACCA Module: Clients calibrate missing-modality features toward global anchors using cross-attention.
Modality-specific Encoders (Fully Federated)
Each modality uses an independent encoder for feature extraction, which lets the parameters specialize to that modality. The server aggregates the encoder parameters of the same modality: \(W_m^s = \frac{1}{N_m} \sum_i W_m^i\), where the sum runs over the \(N_m\) participants that hold modality \(m\). The server-side fusion decoder bridges the distribution gap among different modalities via backpropagation.
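The per-modality averaging can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each client reports a flattened weight list for every modality it holds, and that \(N_m\) counts only the participants contributing modality \(m\). The name `aggregate_encoders` and the toy weights are hypothetical.

```python
from typing import Dict, List

Weights = List[float]  # flattened encoder parameters (illustrative)


def aggregate_encoders(client_encoders: List[Dict[str, Weights]]) -> Dict[str, Weights]:
    """Average each modality's encoder over the clients that hold that modality."""
    sums: Dict[str, Weights] = {}
    counts: Dict[str, int] = {}
    for encoders in client_encoders:
        for modality, weights in encoders.items():
            if modality not in sums:
                sums[modality] = [0.0] * len(weights)
                counts[modality] = 0
            sums[modality] = [s + w for s, w in zip(sums[modality], weights)]
            counts[modality] += 1
    # W_m^s = (1 / N_m) * sum_i W_m^i, with N_m = counts[modality] (assumption)
    return {m: [s / counts[m] for s in total] for m, total in sums.items()}


# Three toy clients, each holding a different subset of modalities.
clients = [
    {"T1": [1.0, 2.0], "FLAIR": [0.0, 4.0]},
    {"T1": [3.0, 4.0]},
    {"FLAIR": [2.0, 0.0], "T2": [5.0, 5.0]},
]
global_encoders = aggregate_encoders(clients)
```

A modality held by a single client (here `T2`) is simply passed through unchanged, since its average is taken over one contributor.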
Partially Personalized Fusion Decoder
Core idea: Dynamically determine which filters in the decoder are federated (shared) and which are personalized (retained) based on the consistency between global and local parameter updates.
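- Client aggregation: \(W_d^{i,agg} = (1 - B^{i,r-1}) W_d^{i,r-1} + B^{i,r-1} W_d^{s,r-1}\), where \(B^{i,r-1}\) acts as a binary mask over decoder filters (1 takes the server's filter, 0 keeps the client's own).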
- Consistency judgment: Calculate the cosine similarity of global/local updates for each filter: \(\delta_j^{i,r} = \cos(\Delta \mathbf{w}_j^{s,r}, \Delta \mathbf{w}_j^{i,r})\)
- Personalization rule: If a filter has \(\delta_j^{i,r} < 0\) for \(P\) consecutive rounds, it is permanently personalized.
- Server aggregation: Uses an EMA strategy to balance the contributions of the server and clients, with \(\lambda\) dynamically set to 1 (fully personalized) or 0.3 (others).
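The client-side steps above (masked merge, per-filter cosine check, and the \(P\)-round rule) can be sketched as follows. This is an illustrative reading of the summary, not the released code: filters are flattened to lists, the mask uses 1 for federated and 0 for personalized, a zero-norm update is treated as cosine 0, and the names `merge_decoder`, `update_mask`, and `patience` (standing in for \(P\)) are hypothetical. The server-side EMA step is left out because its exact form is not given here.

```python
import math
from typing import List, Tuple

Filter = List[float]  # one flattened decoder filter (illustrative)


def cosine(u: Filter, v: Filter) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def merge_decoder(local: List[Filter], server: List[Filter], mask: List[int]) -> List[Filter]:
    """W_agg = (1 - B) * W_local + B * W_server, applied filter by filter."""
    return [s if b == 1 else l for l, s, b in zip(local, server, mask)]


def update_mask(mask: List[int], streak: List[int],
                delta_server: List[Filter], delta_local: List[Filter],
                patience: int) -> Tuple[List[int], List[int]]:
    """Personalize a filter after `patience` consecutive rounds of negative cosine."""
    for j in range(len(mask)):
        if mask[j] == 0:
            continue  # already personalized; the rule is irreversible
        if cosine(delta_server[j], delta_local[j]) < 0:
            streak[j] += 1
        else:
            streak[j] = 0  # the streak must be consecutive
        if streak[j] >= patience:
            mask[j] = 0
    return mask, streak


# Two filters, P = 2: filter 0 agrees with the server, filter 1 keeps disagreeing.
mask, streak = [1, 1], [0, 0]
delta_server = [[1.0, 0.0], [1.0, 0.0]]
delta_local = [[1.0, 1.0], [-1.0, 0.0]]
for _ in range(2):
    mask, streak = update_mask(mask, streak, delta_server, delta_local, patience=2)

merged = merge_decoder([[0.1, 0.2], [0.3, 0.4]], [[1.0, 1.0], [2.0, 2.0]], mask)
```

After two rounds the second filter is personalized, so the merge keeps the client's own values for it and takes the server's values for the first.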
Multi-Anchor Multimodal Representation + LACCA
- The server extracts \(N_k = 4\) anchors per class from the fused features, clustering them via K-means and smoothing updates with EMA (\(\omega = 0.999\)).
- Clients calibrate local missing-modality features toward global anchors using scaled dot-product cross-attention: \(F_l^{cal} = \text{softmax}\left[\frac{F_l W_0 (A_l W_1)^T}{\sqrt{C_l}}\right] A_l W_2\)
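A small pure-Python sketch of the two operations above may help make the shapes concrete. It rests on several assumptions that the summary does not confirm: features are laid out as positions × channels, \(C_l\) is the channel count of the features, the EMA keeps a fraction \(\omega\) of the old anchor, and the projections \(W_0, W_1, W_2\) are plain matrices (identity here). K-means itself is omitted, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    out = []
    for row in a:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def ema_update(old: Matrix, new: Matrix, omega: float = 0.999) -> Matrix:
    """Smooth anchors: omega * old + (1 - omega) * new (direction is an assumption)."""
    return [[omega * o + (1 - omega) * n for o, n in zip(ro, rn)]
            for ro, rn in zip(old, new)]


def calibrate(features: Matrix, anchors: Matrix,
              w0: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """F_cal = softmax(F W0 (A W1)^T / sqrt(C)) A W2."""
    channels = len(features[0])
    queries = matmul(features, w0)
    keys = matmul(anchors, w1)
    values = matmul(anchors, w2)
    scores = matmul(queries, transpose(keys))
    scaled = [[s / math.sqrt(channels) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), values)


identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0]]   # 2 positions x 2 channels
anchors = [[2.0, 0.0], [0.0, 2.0]]    # 2 anchors x 2 channels
calibrated = calibrate(features, anchors, identity, identity, identity)
smoothed = ema_update([[1.0, 1.0]], [[0.0, 2.0]])
```

Each calibrated feature is a convex combination of the projected anchors, weighted toward the anchor it most resembles, which is how the missing-modality features get pulled toward the global representation.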
Loss & Training
Dice loss + cross-entropy loss (standard medical image segmentation loss), Adam optimizer, lr=0.0002.
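For readers unfamiliar with the combined objective, the following is a minimal sketch of a soft Dice loss added to a cross-entropy loss over per-voxel class probabilities. The equal weighting of the two terms, the smoothing constants, and the function names are assumptions for illustration; the summary does not state how the terms are weighted.

```python
import math
from typing import List


def cross_entropy(probs: List[List[float]], labels: List[int], eps: float = 1e-7) -> float:
    """Mean negative log-probability of the true class per voxel."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def dice_loss(probs: List[List[float]], labels: List[int],
              num_classes: int, eps: float = 1e-5) -> float:
    """Soft Dice loss averaged over classes."""
    loss = 0.0
    for c in range(num_classes):
        intersection = sum(p[c] for p, y in zip(probs, labels) if y == c)
        denominator = sum(p[c] for p in probs) + sum(1 for y in labels if y == c)
        loss += 1.0 - (2.0 * intersection + eps) / (denominator + eps)
    return loss / num_classes


def segmentation_loss(probs: List[List[float]], labels: List[int], num_classes: int) -> float:
    # Equal weighting of the two terms is an assumption.
    return dice_loss(probs, labels, num_classes) + cross_entropy(probs, labels)


labels = [0, 1]
perfect = segmentation_loss([[1.0, 0.0], [0.0, 1.0]], labels, num_classes=2)
uncertain = segmentation_loss([[0.6, 0.4], [0.3, 0.7]], labels, num_classes=2)
```

A perfect prediction drives both terms to zero, while a less confident prediction is penalized by both the overlap term and the log-likelihood term.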
Key Experimental Results
BraTS 2018 (285 cases, divided into 9 sites, 8 clients + 1 server):
| Method | Average Client mDSC (%) | Server mDSC (%) |
|---|---|---|
| Local models | 66.95 | 82.56 |
| FedAvg | 59.04 | 80.10 |
| FedMSplit | 71.23 | 79.93 |
| FedIoT | 69.18 | 84.89 |
| CreamFL* | 67.21 | 82.83 |
| FedMEPD | 75.70 | 84.98 |
BraTS 2020 (369 cases):
| Method | Average Client mDSC (%) | Server mDSC (%) |
|---|---|---|
| FedMSplit | 73.80 | 86.88 |
| FedIoT | 71.20 | 88.77 |
| FedMEPD | 75.90 | 89.39 |
- The average client mDSC exceeds that of the second-best method (FedMSplit) by 4.47 percentage points on BraTS 2018 and 2.10 percentage points on BraTS 2020.
- Single-modality clients (e.g., T1c only) show the largest improvement: 58.87% vs. 48.99% for FedMSplit.
- *CreamFL requires data sharing, which violates privacy constraints.
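As a quick sanity check, the reported margins can be recomputed from the average client mDSC columns of the two tables. The helper name `margin_over_runner_up` is hypothetical; the numbers are copied from the tables above.

```python
from typing import Dict

# Average client mDSC (%) copied from the tables above.
brats2018: Dict[str, float] = {
    "Local models": 66.95, "FedAvg": 59.04, "FedMSplit": 71.23,
    "FedIoT": 69.18, "CreamFL": 67.21, "FedMEPD": 75.70,
}
brats2020: Dict[str, float] = {"FedMSplit": 73.80, "FedIoT": 71.20, "FedMEPD": 75.90}


def margin_over_runner_up(scores: Dict[str, float], method: str = "FedMEPD") -> float:
    """Gap, in percentage points, between `method` and the best other method."""
    best_other = max(value for name, value in scores.items() if name != method)
    return round(scores[method] - best_other, 2)


margin_2018 = margin_over_runner_up(brats2018)
margin_2020 = margin_over_runner_up(brats2020)
```

Both margins are absolute differences in mDSC (percentage points), not relative improvements.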
Highlights & Insights
- Simultaneously optimizes both the global full-modality model and personalized missing-modality client models, balancing dual objectives.
- The partially personalized strategy is dynamically determined based on the consistency of parameter updates, offering clear theoretical intuition.
- Multi-anchor representation + cross-attention calibration: Only abstract population-level prototypes are transmitted, preserving privacy while compensating for missing modality information.
- Compared to a fully personalized decoder (prior work), the partially federated strategy significantly improves client performance.
- The framework is model-agnostic and can be adapted to various multimodal segmentation backbones.
Limitations & Future Work
- Assumes the existence of a server with full-modality data, which may be difficult to satisfy in practice.
- Once a filter is marked as personalized, it is irreversible, which may prematurely lock certain parameters.
- The communication cost is not analyzed in depth, although the small overhead of transmitting the masks is mentioned.
- Only validated on the brain tumor segmentation task; other multimodal medical tasks (e.g., cardiac, liver) remain unexplored.
- The number of clients is relatively small (8), and scalability under large-scale scenarios has not been validated.
Related Work & Insights
- FedAvg (McMahan et al., 2017): Classic FL baseline, does not handle modality heterogeneity.
- FedMSplit (Chen & Zhang, 2022): Multimodal FL but lacks a personalization mechanism.
- FedNorm (Bernecker et al., 2022): Adjusts only normalization parameters to handle modality differences, which is insufficient for high heterogeneity.
- RFNet (Ding et al., 2021): Centralized multimodal segmentation method, which serves as the backbone network of the proposed framework.
- CreamFL (Yu et al., 2023): Requires sharing multimodal data, which violates privacy constraints.
Rating
- Novelty: ⭐⭐⭐⭐ (Novel combined design of modality-specific encoders, partially personalized decoder, and multi-anchor calibration)
- Experimental Thoroughness: ⭐⭐⭐⭐ (Two benchmarks, comparisons with multiple FL baselines, thorough ablation studies, and statistical significance tests)
- Writing Quality: ⭐⭐⭐⭐ (Complete structure and clear description of algorithms)
- Value: ⭐⭐⭐⭐⭐ (Addresses real pain points in multimodal FL with significant clinical relevance)