Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging

Conference: ACL 2025
arXiv: 2505.22934
Code: None
Area: Model Compression
Keywords: LoRA merging, model merging, orthogonal subspace, parameter-data interaction, task interference

TL;DR

OSRM (Orthogonal Subspaces for Robust model Merging) argues that LoRA merging fails mainly because of interference between one task's parameters and other tasks' input data, not merely because of conflicts between parameters. Before fine-tuning, it initializes each LoRA A matrix from an eigenvalue decomposition of the other tasks' data covariance matrices, so that the row space of A is orthogonal to the directions those tasks' features occupy. This minimizes cross-task interference after merging and significantly improves merged performance across 8 datasets and 5 models.

Background & Motivation

Background: Model merging methods such as Task Arithmetic and TIES combine multiple task-specific models into one without retraining, but they perform poorly on LoRA fine-tuned models, where merging often causes severe performance degradation.

Limitations of Prior Work: Existing methods (such as KnOTS and parameter orthogonalization) focus only on post-hoc alignment or decoupling of the parameters themselves and ignore how those parameters interact with the input data. After merging, Task 1's hidden features \(h_1\) also pass through Task 2's LoRA, producing an unintended output shift \(B_2 A_2 h_1\).

Key Challenge: Even if the task vectors of two LoRAs are orthogonal, \(A_2 h_1\) can still be non-zero. Parameter orthogonality does not imply functional orthogonality: it constrains how the two updates relate to each other, not how \(A_2\) acts on Task 1's inputs.
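
A toy NumPy example makes the gap concrete. It is a hand-built illustration, not taken from the paper: two rank-1 LoRA updates are orthogonal under the Frobenius inner product, yet Task 2's \(A_2\) reads exactly the input direction that Task 1's feature occupies, so the interference term \(B_2 A_2 h_1\) is non-zero.

```python
import numpy as np

d = 4
e = np.eye(d)

# Task 1 LoRA (rank 1): reads input direction e1, writes output direction e1
A1, B1 = e[[0]], e[:, [0]]        # A1: (1, d), B1: (d, 1)
# Task 2 LoRA (rank 1): reads the SAME input direction e1, but writes e2
A2, B2 = e[[0]], e[:, [1]]

dW1, dW2 = B1 @ A1, B2 @ A2
# Frobenius inner product <dW1, dW2> = trace(dW1^T dW2): 0 means orthogonal task vectors
param_overlap = np.trace(dW1.T @ dW2)

h1 = e[:, 0]                      # a Task 1 input feature
interference = B2 @ (A2 @ h1)     # extra output Task 2's LoRA adds to h1 after merging

print(param_overlap)              # 0.0
print(interference)               # [0. 1. 0. 0.]
```

The task vectors are orthogonal because their output directions differ, but both LoRAs read the same input direction, which is what the interference term depends on.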

Goal: Ensure that the LoRA update of task \(i\) leaves the model's outputs on the inputs of every other task \(j \neq i\) essentially unchanged.

Key Insight: Constrain the LoRA subspace before fine-tuning, so that the row space of \(A_i\) is orthogonal to the principal components of the other tasks' data covariance matrices.

Core Idea: Use the eigenvalue decomposition of the other tasks' data covariance matrices to find an orthogonal subspace for initializing the LoRA A matrix, tackling merging interference at its source rather than after the fact.

Method

Overall Architecture

Before fine-tuning on each task, OSRM (1) collects a small set of samples from the other tasks and computes the covariance matrix of their hidden features at each layer where LoRA is applied; (2) performs an eigenvalue decomposition of each covariance matrix and selects the eigenvectors with the smallest eigenvalues as the "orthogonal subspace"; and (3) initializes the LoRA A matrix with these eigenvectors instead of a random initialization. After fine-tuning, the resulting models can be merged directly with existing methods such as Task Arithmetic or TIES.
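
The per-task procedure can be sketched in NumPy for a single layer. This is a minimal sketch under stated assumptions, not the authors' code (none is listed): hidden features are given as row-stacked arrays per task, \(\Sigma\) is the uncentered second moment \(\mathbb{E}[hh^\top]\) as in the Mechanism description below, and the helpers `bottom_r_eigvecs` and `osrm_style_inits` are hypothetical names.

```python
import numpy as np


def bottom_r_eigvecs(H: np.ndarray, r: int) -> np.ndarray:
    """Return the r eigenvectors of Sigma = E[h h^T] with the smallest eigenvalues, as rows."""
    sigma = H.T @ H / H.shape[0]        # (d, d) uncentered covariance of the rows of H
    _, U = np.linalg.eigh(sigma)        # eigh returns eigenvalues in ascending order
    return U[:, :r].T                   # (r, d), orthonormal rows


def osrm_style_inits(feats: dict, r: int) -> dict:
    """For every task, build an A init from the pooled features of all OTHER tasks (one layer)."""
    inits = {}
    for task in feats:
        others = np.vstack([H for t, H in feats.items() if t != task])
        inits[task] = bottom_r_eigvecs(others, r)
    return inits


# Toy hidden features at one layer: each task occupies a different half of R^8
rng = np.random.default_rng(0)
n, r = 64, 2
feats = {
    "task_a": np.hstack([rng.normal(size=(n, 4)), np.zeros((n, 4))]),
    "task_b": np.hstack([np.zeros((n, 4)), rng.normal(size=(n, 4))]),
}
inits = osrm_style_inits(feats, r)

# Task a's A should (numerically) annihilate task b's features, and vice versa
print(np.abs(inits["task_a"] @ feats["task_b"].T).max())
print(np.abs(inits["task_b"] @ feats["task_a"].T).max())
```

In this toy the other task's features leave an exactly empty subspace, so the projections vanish. With real features, or when many pooled tasks jointly span most of the space, the smallest eigenvalues are small but non-zero and interference is reduced rather than removed.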

Key Designs

Data-Parameter Interference Analysis:
- Key Finding: After merging, \(W_m h_1 = W_1 h_1 + B_2 A_2 h_1\), where \(W_1 = W_0 + B_1 A_1\) is Task 1's fine-tuned weight (with \(W_0\) the pretrained weight) and \(B_2 A_2 h_1\) is the interference term. Suppressing this term calls for \(\|A_2 h_1\|_F \approx 0\).
- Assuming the rows of \(A_2\) form an orthonormal basis, \(\|A_2 h_1\|_F\) is minimized by making the row space of \(A_2\) orthogonal to the principal directions of the Task 1 features \(h_1\).
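
The interference argument can be written out in a few lines. This is a sketch under two assumptions: the merge is plain Task Arithmetic with unit coefficients, so \(W_m = W_0 + B_1 A_1 + B_2 A_2\), and the rows of \(A_2\) are orthonormal, as in the orthogonal-basis assumption above. Here \(H_1 \in \mathbb{R}^{d \times n}\) stacks \(n\) Task 1 features as columns. The last line is the standard Ky Fan (Rayleigh-Ritz) trace minimization, added as our gloss to connect the analysis to the smallest-eigenvalue choice rather than quoted from the paper.

```latex
\begin{aligned}
W_m &= W_0 + B_1 A_1 + B_2 A_2 = W_1 + B_2 A_2, \\
W_m h_1 &= W_1 h_1 + \underbrace{B_2 A_2 h_1}_{\text{interference}},
  \qquad \|B_2 A_2 h_1\|_2 \le \|B_2\|_2 \, \|A_2 h_1\|_2, \\
\|A_2 H_1\|_F^2 &= n \operatorname{tr}\!\left(A_2 \Sigma_1 A_2^\top\right),
  \qquad \Sigma_1 = \tfrac{1}{n} H_1 H_1^\top, \\
\min_{A_2 A_2^\top = I_r} \operatorname{tr}\!\left(A_2 \Sigma_1 A_2^\top\right) &= \sum_{k=1}^{r} \lambda_k,
  \qquad \lambda_1 \le \lambda_2 \le \dots \le \lambda_d \ \text{eigenvalues of } \Sigma_1 .
\end{aligned}
```

The minimum is attained when the rows of \(A_2\) are eigenvectors of \(\Sigma_1\) for \(\lambda_1, \dots, \lambda_r\), which is the smallest-eigenvalue initialization described under Orthogonal Subspace Initialization.
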
Orthogonal Subspace Initialization:
- Function: Initializes the LoRA A matrix using the eigenvectors associated with the smallest eigenvalues of the other tasks' data covariance matrices.
- Mechanism: Computes the hidden feature covariance \(\Sigma = \mathbb{E}[h h^\top]\) of all other tasks' data at the target layer, performs eigenvalue decomposition \(\Sigma = U \Lambda U^\top\), and selects the eigenvectors corresponding to the smallest \(r\) eigenvalues as the initial row vectors of \(A\).
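- Design Motivation: Eigenvectors with the smallest eigenvalues are the directions in which the other tasks' features carry the least energy, since for a unit eigenvector \(u\) with eigenvalue \(\lambda\), \(\mathbb{E}[(u^\top h)^2] = u^\top \Sigma u = \lambda\). Building the rows of \(A\) from these directions minimizes the projection of other tasks' inputs onto \(A\), and hence the merging interference.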
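
The "smallest projection" claim can be checked numerically. This sketch assumes synthetic other-task features with a few dominant directions (all variable names are ours). It builds \(A\) from the \(r\) smallest eigenvectors of \(\Sigma = \mathbb{E}[hh^\top]\) and compares the interference energy \(\mathbb{E}\|Ah\|^2 = \operatorname{tr}(A\Sigma A^\top)\) against a random \(A\) with orthonormal rows.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, r = 16, 200, 4

# Hypothetical other-task features: per-coordinate scales from 3.0 down to 0.1 (anisotropic)
scales = np.linspace(3.0, 0.1, d)
H_other = rng.normal(size=(n, d)) * scales          # (n, d)

sigma = H_other.T @ H_other / n                      # Sigma = E[h h^T]
evals, U = np.linalg.eigh(sigma)                     # ascending eigenvalues
A_osrm = U[:, :r].T                                  # smallest-r eigenvectors as rows of A

# Baseline: a random A with orthonormal rows (reduced QR of a Gaussian matrix)
Q, _ = np.linalg.qr(rng.normal(size=(d, r)))
A_rand = Q.T


def interference_energy(A: np.ndarray) -> float:
    """Mean of ||A h||^2 over the other-task samples, which equals tr(A Sigma A^T)."""
    return float(np.mean(np.sum((H_other @ A.T) ** 2, axis=1)))


print(interference_energy(A_osrm), evals[:r].sum())  # equal up to floating-point error
print(interference_energy(A_rand))                   # never lower; typically much larger
```

By the trace-minimization property, no \(A\) with orthonormal rows can score lower on these samples than the smallest-eigenvector choice.
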
Plug-and-play Compatibility:
- OSRM only modifies the initialization phase of LoRA, leaving the fine-tuning and merging processes unchanged.
- It can be seamlessly combined with any merging method, such as Task Arithmetic, TIES, Fisher Merging, RegMean, or EMR.
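
Because OSRM only changes initialization, the merge itself is ordinary Task Arithmetic over the LoRA updates. The toy below is a sketch under loud assumptions: random matrices stand in for fine-tuned \(B_i\), Task 1's inputs are placed in a known 4-dimensional subspace, and \(A_2\) is assumed to stay in its initialized subspace after fine-tuning (an idealization the Limitations section flags). `task_arithmetic` is a hypothetical helper, not an API from the paper.

```python
import numpy as np


def task_arithmetic(W0: np.ndarray, loras: list, lam: float) -> np.ndarray:
    """Hypothetical helper: W_m = W0 + lam * sum_i B_i A_i over (B_i, A_i) pairs."""
    return W0 + lam * sum(B @ A for B, A in loras)


rng = np.random.default_rng(2)
d, r, lam = 8, 2, 1.0
W0 = rng.normal(size=(d, d))

# Toy assumption: Task 1 inputs only use the first 4 coordinates
h1 = np.concatenate([rng.normal(size=4), np.zeros(4)])
B1, A1 = rng.normal(size=(d, r)), rng.normal(size=(r, d))
B2 = rng.normal(size=(d, r))                 # stand-in for a fine-tuned B_2

A2_random = rng.normal(size=(r, d))          # generic A_2: overlaps Task 1's directions
A2_orth = np.eye(d)[4:4 + r]                 # A_2 rows orthogonal to Task 1's inputs

W1 = W0 + lam * (B1 @ A1)                    # Task 1's (scaled) weight on its own
shift = {}
for name, A2 in [("random", A2_random), ("orthogonal", A2_orth)]:
    Wm = task_arithmetic(W0, [(B1, A1), (B2, A2)], lam)
    shift[name] = np.linalg.norm(Wm @ h1 - W1 @ h1)   # equals ||lam * B2 A2 h1||
print(shift)
```

The merge code is identical in both cases; only the subspace of \(A_2\) decides whether Task 1's outputs shift.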

Key Experimental Results

Main Results

RoBERTa-base, 4-Task Merging (Task Arithmetic):

Method	Multi-Task Average Performance	Single-Task Preservation
Task Arithmetic (Standard LoRA init)	~65%	~85%
+ OSRM init	~75%	~87%

OSRM improves multi-task performance by roughly 10 percentage points (~65% to ~75%) while maintaining, or slightly improving, single-task performance (~85% to ~87%).

Ablation Study

Configuration	Effect	Description
OSRM + TA	Optimal	Full method
OSRM + TIES	Significant Improvement	Also compatible with TIES
Random init (baseline)	Baseline	Standard LoRA
Parameter Orthogonalization Only	Small Gain	Ignores data interaction, limited effectiveness

Key Findings

OSRM is more robust to the merging hyperparameter (the scaling coefficient \(\lambda\)): standard LoRA merging is highly sensitive to \(\lambda\), whereas OSRM remains stable over a wider range.
Low data requirements: a few dozen samples from each other task suffice to estimate the covariance matrices.
Also effective on large language models (LLaMA-7B).

Highlights & Insights

Identifies the cause of merging failure: parameter-data interference rather than parameter conflict, an insight that could redirect merging research.
Addresses merging before fine-tuning: unlike post-hoc methods that adjust the models after fine-tuning, OSRM reduces the source of interference before fine-tuning begins, which is a more fundamental intervention.
Extremely low overhead: Requires only a one-time covariance computation and eigenvalue decomposition during initialization, with zero extra overhead during the fine-tuning process.

Limitations & Future Work

Requires access to other tasks' data: Needs prior knowledge of which tasks will be merged and access to a small amount of data, making it less suitable for merging completely unknown tasks.
Orthogonal basis assumption: The analysis assumes that the rows of \(A\) form an orthonormal basis; this property may no longer hold once \(A\) is updated during fine-tuning.
Evaluated only on LoRA: An analogous analysis of merging interference for fully fine-tuned models remains unexplored.

vs KnOTS: KnOTS aligns LoRA updates in a shared space without using data; OSRM instead uses a data-driven initialization that targets interference on the other tasks' actual inputs.
vs TIES/Task Arithmetic: These are post-hoc merging methods, whereas OSRM acts before fine-tuning, so the two are complementary and can be stacked.
Future work could replace the orthogonal initialization with a regularizer that keeps \(A\) orthogonal to the other tasks' data throughout fine-tuning.
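
One way the regularization idea could look is a penalty \(\mu \operatorname{tr}(A \Sigma_{\text{other}} A^\top)\) added to the task loss; for symmetric \(\Sigma_{\text{other}}\) its gradient with respect to \(A\) is \(2\mu A \Sigma_{\text{other}}\). This is a hypothetical sketch, not something proposed in the paper: `orth_penalty`, `orth_penalty_grad`, `mu`, and the step size are our own choices, and the features are synthetic.

```python
import numpy as np


def orth_penalty(A: np.ndarray, sigma_other: np.ndarray, mu: float) -> float:
    """Hypothetical regularizer: mu * tr(A Sigma A^T) = mu * E[||A h_other||^2]."""
    return float(mu * np.trace(A @ sigma_other @ A.T))


def orth_penalty_grad(A: np.ndarray, sigma_other: np.ndarray, mu: float) -> np.ndarray:
    """Gradient of the penalty w.r.t. A; this form relies on Sigma being symmetric."""
    return 2.0 * mu * A @ sigma_other


rng = np.random.default_rng(3)
d, r, n, mu = 10, 3, 100, 0.5
H_other = rng.normal(size=(n, d)) * np.linspace(2.0, 0.2, d)
sigma = H_other.T @ H_other / n
A = rng.normal(size=(r, d))

# Gradient steps on the penalty alone (in training it would be added to the task loss).
# lr = 0.5 / (mu * lambda_max) makes every step a contraction, so the penalty never rises.
lr = 0.5 / (mu * np.linalg.eigvalsh(sigma).max())
history = [orth_penalty(A, sigma, mu)]
for _ in range(20):
    A = A - lr * orth_penalty_grad(A, sigma, mu)
    history.append(orth_penalty(A, sigma, mu))
print(history[0], history[-1])
```

On its own the penalty just shrinks \(A\) fastest along the other tasks' high-variance directions; balancing it against the task loss is what would keep \(A\) useful for its own task.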

Rating

Novelty: ⭐⭐⭐⭐⭐ The data-parameter interference analysis and pre-fine-tuning orthogonal initialization provide a completely fresh perspective.
Experimental Thoroughness: ⭐⭐⭐⭐ Validated across 8 datasets and 5 models, integrated with multiple merging methods.
Writing Quality: ⭐⭐⭐⭐⭐ Thorough problem analysis with rigorous mathematical derivations.
Value: ⭐⭐⭐⭐ Provides a simple and highly effective plug-and-play solution for LoRA model merging.