
FedP²EFT: Federated Learning to Personalize PEFT for Multilingual LLMs

Conference: AAAI 2026
arXiv: 2502.04387
Authors: Royson Lee, Minyoung Kim, Fady Rezk, Rui Li, Stylianos I. Venieris, Timothy Hospedales (Samsung AI / Univ. of Edinburgh)
Code: GitHub
Area: Optimization
Keywords: Federated Learning, Personalized PEFT, LoRA Rank Selection, Multilingual LLM, Bayesian Sparse Selection

TL;DR

This paper proposes FedP²EFT, which trains a Personalization Strategy Generator (PSG) via federated learning to automatically generate a personalized LoRA rank structure for each client. On multilingual LLM fine-tuning, it substantially outperforms both manually designed PEFT configurations and existing FL personalization methods.

Background & Motivation

The Federated Learning Dilemma for Multilingual LLMs

Federated learning enables multilingual LLMs to leverage low-resource language data distributed across different regions while complying with privacy regulations such as GDPR. However, existing methods face three major challenges:

  1. Curse of Multilinguality: As the number of languages increases, the performance of a single global model degrades.
  2. Negative Interference: Different languages compete for limited model capacity.
  3. Lack of Personalization Strategy: Existing methods employ manually designed, uniform PEFT configurations that overlook the heterogeneous personalization needs of different clients.

Why Is Personalizing LoRA Rank More Important Than Learning Rate?

Existing FL hyperparameter optimization methods (e.g., FedL2P) primarily learn personalized learning rates; however, LLMs typically use adaptive optimizers such as Adam and are thus robust to learning rate variations. In contrast, the structure of PEFT adapters—which layers to apply LoRA to and what rank to use—has a far more critical impact on cross-lingual transfer learning.

Core Problem

How to automatically learn the optimal personalized LoRA rank configuration for each client in a federated learning setting, while avoiding overfitting caused by limited per-client data?

Method

BayesTune-LoRA (BT-LoRA)

Inspired by BayesTune, rank-wise latent variables \(\lambda \in \mathbb{R}^r, \lambda_i > 0\) are introduced for each LoRA matrix, modifying LoRA to \(B\lambda A\). The optimization objective is:

\[\theta^* = \arg\min_\theta \mathcal{L}_{\text{CE}}(\theta; D) + \frac{\alpha_s}{N}\mathcal{L}_s(\boldsymbol{\lambda}, \boldsymbol{B}) + \frac{\alpha_p}{N}\mathcal{L}_p(\boldsymbol{\lambda})\]

where \(\mathcal{L}_s\) is the log of a Laplace prior (encouraging larger \(\lambda\) for important ranks):

\[\mathcal{L}_s(\boldsymbol{\lambda}, \boldsymbol{B}) = \sum_l^L \sum_i^r \frac{\|B_{l,i}\|_1}{\lambda_{l,i}}\]

and \(\mathcal{L}_p\) is the log of a Gamma hyperprior (encouraging overall small \(\lambda\) for sparsity):

\[\mathcal{L}_p(\boldsymbol{\lambda}) = \sum_l^L \sum_i^r (\log \lambda_{l,i} + 100 \cdot \lambda_{l,i})\]

Intuitively, \(\mathcal{L}_p\) drives \(\lambda\) toward zero (sparsification), while \(\mathcal{L}_s\) maintains large \(\lambda\) for ranks with substantial updates. Their interplay retains important ranks and prunes unimportant ones.
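As a concrete sketch, the modified forward pass and the two prior terms can be written directly from the equations above (a minimal NumPy sketch of a single layer, not the authors' implementation; shapes and function names are illustrative):

```python
import numpy as np

def bt_lora_forward(x, A, B, lam):
    # BT-LoRA update B @ diag(lam) @ A applied to input x (base weight omitted)
    return B @ (lam * (A @ x))

def sparse_prior_loss(B, lam):
    # L_s: log-Laplace prior -- per-rank L1 norm of B's columns, scaled by
    # 1/lambda; large updates in rank i keep lambda_i from being pruned
    return np.sum(np.abs(B).sum(axis=0) / lam)

def hyperprior_loss(lam, c=100.0):
    # L_p: log-Gamma hyperprior (the paper's constant 100) pushing lambda -> 0
    return np.sum(np.log(lam) + c * lam)
```

In the full objective these terms are summed over all L LoRA-adapted layers and weighted by \(\alpha_s/N\) and \(\alpha_p/N\).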

PSG: Personalization Strategy Generator

A single-hidden-layer MLP serves as the PSG. It takes client metadata (mean and standard deviation of each base model layer's features) as input and outputs personalized \(\hat{\boldsymbol{\lambda}}\):

\[\hat{\boldsymbol{\lambda}} = \text{MLP}(\phi;\; E(h_0), SD(h_0), E(h_1), SD(h_1), \ldots, E(h_{L-1}), SD(h_{L-1}))\]
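A minimal sketch of the PSG input/output pipeline; the hidden width, ReLU activation, and softplus positivity transform are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def layer_metadata(activations):
    # Client metadata: per-layer mean and std of base-model features,
    # concatenated over all L layers -> a vector of length 2L
    feats = []
    for h in activations:
        feats.extend([h.mean(), h.std()])
    return np.array(feats)

def psg_forward(phi, meta):
    # Single-hidden-layer MLP; softplus keeps every lambda strictly positive
    W1, b1, W2, b2 = phi
    hidden = np.maximum(0.0, W1 @ meta + b1)    # ReLU hidden layer
    return np.logaddexp(0.0, W2 @ hidden + b2)  # softplus(x) = log(1 + e^x)
```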

Federated Training Procedure

In each federated round, each sampled client \(i\) performs:

  1. Receives PSG parameters \(\phi\) from the server and computes \(\hat{\boldsymbol{\lambda}}^i\) via forward pass.
  2. Stage 1: Inserts \(\hat{\boldsymbol{\lambda}}^i\) into BT-LoRA and fine-tunes for \(s\) steps according to the above objective, yielding optimized \(\hat{\boldsymbol{\lambda}}^{i,s}\).
  3. Stage 2: Uses \(\hat{\boldsymbol{\lambda}}^{i,s}\) as regression targets and trains the MLP with an L1 loss.
  4. Sends updated \(\phi\) back to the server for FedAvg aggregation.
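The four steps above can be sketched as follows. BT-LoRA fine-tuning is stubbed out behind a `run_bt_lora` callback, the PSG is reduced to a linear map for brevity, and Stage 2 uses plain subgradient steps on the L1 loss; these simplifications are illustrative, not the paper's setup:

```python
import numpy as np

def fedavg(client_params):
    # Server-side FedAvg: element-wise mean of each client's PSG parameters
    return [np.mean(np.stack(p), axis=0) for p in zip(*client_params)]

def client_update(phi, meta, run_bt_lora, l1_steps=100, lr=1e-2):
    # One client round. Stage 1: PSG forward gives lambda-hat, which BT-LoRA
    # fine-tuning (here the run_bt_lora stand-in) turns into regression targets.
    # Stage 2: fit the PSG to those targets with an L1 loss (subgradient steps).
    W, b = phi
    lam_hat = np.maximum(W @ meta + b, 1e-6)   # PSG forward (linear sketch)
    lam_target = run_bt_lora(lam_hat)          # Stage 1 stand-in
    for _ in range(l1_steps):                  # Stage 2: L1 regression
        pred = np.maximum(W @ meta + b, 1e-6)
        g = np.sign(pred - lam_target)         # subgradient of |pred - target|
        W = W - lr * np.outer(g, meta)
        b = b - lr * g
    return [W, b]
```

The server then calls `fedavg` on the returned parameter lists to produce the next round's \(\phi\).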

Inference

At deployment, new clients (including those unseen during training) use the PSG to generate \(\boldsymbol{\lambda}\), keep the \(r \cdot L\) ranks with the largest \(\lambda\) values under a total rank budget of \(r \cdot L\), freeze \(\boldsymbol{\lambda}\), and perform standard fine-tuning. A single PSG training run supports all target rank budgets up to \(r_{\text{max}}\).
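The budgeted rank selection can be sketched as a boolean mask over the PSG's output; the mask representation is an illustrative choice:

```python
import numpy as np

def select_ranks(lambdas, budget):
    # Keep the `budget` largest lambdas across all layers and ranks, returning
    # a boolean mask; masked-out ranks are pruned before fine-tuning.
    flat = lambdas.ravel()
    keep = np.argsort(flat)[-budget:]       # indices of the largest values
    mask = np.zeros(flat.shape, dtype=bool)
    mask[keep] = True
    return mask.reshape(lambdas.shape)
```

Because selection is a post-hoc top-k over \(\boldsymbol{\lambda}\), the same trained PSG serves any budget by varying `budget`.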

Key Experimental Results

MasakhaNEWS Text Classification (16 African languages, Seen clients, \(r=2\))

| Language | LoRA | AdaLoRA | BT-LoRA | FedL2P | FedP²EFT |
|----------|------|---------|---------|--------|----------|
| eng      | 90.4 | 89.9    | 89.9    | 90.7   | 92.0     |
| amh      | 45.7 | 45.2    | 45.2    | 45.7   | 52.0     |
| tir      | 44.9 | 44.9    | 44.9    | 45.3   | 63.5     |
| orm      | 64.2 | 64.0    | 64.0    | 64.4   | 72.2     |
| fra      | 88.6 | 88.6    | 88.6    | 89.1   | 93.5     |

FedP²EFT's advantage is especially pronounced on low-resource languages (tir, amh, orm), e.g., Tigrinya improves from 44.9% to 63.5% (+18.6 pp).

Unseen Client Generalization

| Language | LoRA | FedL2P | FedP²EFT |
|----------|------|--------|----------|
| xho      | 64.2 | 64.4   | 78.5     |
| tir      | 41.9 | 41.9   | 58.3     |
| orm      | 62.0 | 62.2   | 73.0     |
| run      | 82.0 | 82.6   | 88.4     |

FedP²EFT also achieves substantial improvements on clients entirely absent from training, validating the generalization capability of the PSG.

XNLI + FedDPA-T Compatibility with Personalized FL

| Language | LoRA | FedL2P | FedP²EFT |
|----------|------|--------|----------|
| ur       | 41.9 | 44.8   | 63.7     |
| bg       | 45.8 | 47.5   | 64.4     |
| hi       | 42.8 | 44.5   | 57.8     |

FedP²EFT integrates seamlessly with existing personalized FL methods (FedDPA-T, DEPT, etc.), further enhancing personalization performance.

Highlights & Insights

  • First federated LoRA rank personalization: Frames PEFT structure selection as a federated learning problem, avoiding overfitting from independent per-client training.
  • Single training run covers all rank budgets: The sparse selection property of BT-LoRA makes PSG training a one-time cost.
  • Broad compatibility: Can be plugged into standard FL, FedDPA-T, DEPT, and other FL frameworks.
  • Large gains on low-resource languages: Improvements of up to 18.6 pp on extremely low-resource languages such as Tigrinya and Amharic.
  • Clear theoretical grounding: The sparse prior over LoRA ranks is derived from Bayesian sparse model selection.

Limitations & Future Work

  • PSG relies solely on summary statistics: Means and standard deviations may discard distributional details; richer metadata extraction could yield further gains.
  • Only LoRA is evaluated: Applicability to other PEFT methods (Adapter, Prefix Tuning, IA³) remains unexplored.
  • FedAvg aggregation only: No comparison with more advanced aggregation strategies such as FedProx or SCAFFOLD.
  • Sensitivity to Stage 1 step count \(s\): Too large risks overfitting to client data; too small yields low-quality \(\hat{\boldsymbol{\lambda}}^{i,s}\); no adaptive scheduling is investigated.
  • Limited experimental scale: Instruction tuning is validated only on MobileLLaMA-1.4B and Llama-3.2-3B.

Comparison with Related Methods

  • FedL2P: Learns personalized learning rates via second-order optimization, which is of limited utility under LLM + Adam settings; FedP²EFT targets rank structure directly and avoids second-order computation.
  • AdaLoRA: SVD-based rank allocation is prone to overfitting under the data-scarce FL regime; FedP²EFT mitigates data insufficiency through federated collaboration.
  • BT-LoRA (standalone): Independent per-client optimization of \(\boldsymbol{\lambda}\) leads to overfitting; FedP²EFT trains the PSG via federated learning before generating \(\boldsymbol{\lambda}\), yielding better generalization.
  • DEPT / FedDPA-T: Manually designate personalized layers (embeddings/LoRA) without automatic adaptation to client needs; FedP²EFT can serve as a complementary module on top of these methods.

Rating

  • Novelty: ⭐⭐⭐⭐ — The combination of Bayesian sparse rank selection and federated meta-learning is a novel contribution.
  • Experimental Thoroughness: ⭐⭐⭐⭐ — Covers text classification and instruction tuning, seen/unseen clients, and multiple FL backbones, though model scale is relatively small.
  • Writing Quality: ⭐⭐⭐⭐ — Motivation is clearly articulated, method derivation is complete, and figures are intuitive.
  • Value: ⭐⭐⭐⭐ — Addresses a practical pain point in federated LLM personalization, with significant value for low-resource language scenarios.