Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging

Conference: ACL 2025
arXiv: 2505.22934
Code: None
Area: Model Compression
Keywords: LoRA merging, model merging, orthogonal subspace, parameter-data interaction, task interference

TL;DR

OSRM argues that LoRA model merging fails mainly because of interference between the merged parameters and the input data distributions, not merely because of conflicts among the parameters themselves. Before fine-tuning, it initializes each task's LoRA A matrix from an eigenvalue decomposition of the other tasks' data covariance matrices, so that A's row space is (approximately) orthogonal to the directions in which those tasks' features are active. This reduces cross-task interference after merging and substantially improves merging performance across 8 datasets and 5 models.

Background & Motivation

Background: Model merging methods (such as Task Arithmetic and TIES) combine multiple task-specific models into one without retraining, but they perform poorly on LoRA fine-tuned models, where merging causes severe performance degradation.

Limitations of Prior Work: Existing methods (such as KnOTS and parameter orthogonalization) focus only on aligning or decoupling the parameters themselves and ignore how those parameters interact with the input data. After merging, Task 1's data also passes through Task 2's LoRA, producing an unintended output shift \(B_2 A_2 h_1\), where \(h_1\) is a Task 1 hidden feature.

Key Challenge: Even if the task vectors of two LoRAs are orthogonal, \(A_2 h_1\) can still be non-zero: parameter orthogonality does not imply functional orthogonality.
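A two-dimensional toy example (hypothetical numbers, chosen only for illustration) makes the gap concrete: two rank-1 LoRA updates whose task vectors have zero Frobenius inner product, yet a Task 1 input still produces a non-zero shift through Task 2's LoRA.

```python
import numpy as np

# Toy rank-1 LoRA updates Delta W = B @ A in 2-D (hypothetical values).
B1, A1 = np.array([[1.0], [0.0]]), np.array([[1.0, 0.0]])
B2, A2 = np.array([[0.0], [1.0]]), np.array([[1.0, 0.0]])
dW1, dW2 = B1 @ A1, B2 @ A2

# Parameter view: Frobenius inner product of the two task vectors is zero.
param_inner = float(np.sum(dW1 * dW2))

# Function view: a Task 1 input still excites Task 2's LoRA after merging.
h1 = np.array([1.0, 0.0])
interference = B2 @ (A2 @ h1)  # the extra output shift B2 A2 h1

print(param_inner)   # 0.0
print(interference)  # [0. 1.]
```

The task vectors are orthogonal because they write to different output rows, but \(A_2\) reads the same input direction that \(h_1\) occupies, so \(A_2 h_1 \neq 0\).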

Goal: Ensure that the LoRA update of task \(i\) does not interfere with the outputs on the data of any other task \(j \neq i\).

Key Insight: Constraining the LoRA subspace before fine-tuning—ensuring the row space of \(A_i\) is orthogonal to the principal components of the data covariance matrices of other tasks.

Core Idea: Utilizing the eigenvalue decomposition of other tasks' data covariance matrices to find orthogonal subspaces for initializing the LoRA A matrix, thereby mitigating merging interference at its root.

Method

Overall Architecture

Before fine-tuning, OSRM performs the following operations for each task: (1) collects data samples from other tasks and computes the covariance matrices of hidden features at each layer; (2) performs eigenvalue decomposition on the covariance matrices and selects the eigenvectors corresponding to the smallest eigenvalues as the "orthogonal subspace"; (3) initializes the LoRA A matrix with these eigenvectors (instead of random initialization). After fine-tuning, the models can be directly merged using existing methods such as Task Arithmetic or TIES.
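Steps (1) to (3) for a single layer can be sketched in NumPy as follows. This is a hedged illustration, not the authors' implementation: the function name `osrm_init_A`, the input format (one array of hidden features per other task), and pooling all other tasks' samples into a single matrix are assumptions. \(\Sigma\) is computed as the uncentered second moment \(\mathbb{E}[h h^\top]\), matching the definition used in the method description.

```python
import numpy as np

def osrm_init_A(other_task_feats, r):
    """Hypothetical helper: build an (r, d) LoRA A matrix whose rows span the
    directions in which the other tasks' hidden features have the least energy.

    other_task_feats: list of (n_j, d) arrays, hidden features of each other
                      task at this layer (assumed input format).
    r:                LoRA rank.
    """
    # (1) Pool the other tasks' features and form Sigma = E[h h^T] (uncentered).
    H = np.concatenate(other_task_feats, axis=0)   # (n, d)
    sigma = H.T @ H / H.shape[0]                    # (d, d), symmetric PSD
    # (2) Eigendecomposition; eigh returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(sigma)              # columns are eigenvectors
    # (3) Eigenvectors of the r smallest eigenvalues become the rows of A.
    return eigvecs[:, :r].T                         # (r, d), orthonormal rows


# Usage on synthetic features (hypothetical): energy concentrated in dims 0..7.
rng = np.random.default_rng(0)
d, r = 16, 4
scale = np.r_[np.ones(8), 0.01 * np.ones(8)]
feats = [rng.normal(size=(32, d)) * scale, rng.normal(size=(32, d)) * scale]

A = osrm_init_A(feats, r)
A_rand = np.linalg.qr(rng.normal(size=(d, r)))[0].T  # random orthonormal baseline

H = np.concatenate(feats)
print(np.allclose(A @ A.T, np.eye(r)))                          # True
print(np.linalg.norm(H @ A.T) < np.linalg.norm(H @ A_rand.T))   # True
```

Using `eigh` rather than a general eigensolver exploits the symmetry of \(\Sigma\) and guarantees orthonormal eigenvectors, so the resulting A satisfies the orthonormal-rows assumption at initialization.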

Key Designs

  1. Data-Parameter Interference Analysis:

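    • Key Finding: With Task Arithmetic at unit scaling, the merged weight is \(W_m = W_0 + B_1 A_1 + B_2 A_2\), where \(W_0\) is the pretrained weight. For a Task 1 input this gives \(W_m h_1 = W_1 h_1 + B_2 A_2 h_1\), with \(W_1 = W_0 + B_1 A_1\), so \(B_2 A_2 h_1\) is the interference term. Because \(B_2\) only acts on \(A_2 h_1\), driving \(\|A_2 h_1\|_F\) toward 0 suffices to suppress it.
    • Under the orthogonal basis assumption (the rows of \(A_2\) are orthonormal, \(A_2 A_2^\top = I_r\)), \(\|A_2 h_1\|_F\) is minimized by making the row space of \(A_2\) orthogonal to the principal directions of \(h_1\).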
  2. Orthogonal Subspace Initialization:

    • Function: Initializes the LoRA A matrix using the eigenvectors associated with the smallest eigenvalues of the other tasks' data covariance matrices.
    • Mechanism: Computes the hidden-feature covariance \(\Sigma = \mathbb{E}[h h^\top]\) over all other tasks' data at the target layer, performs the eigenvalue decomposition \(\Sigma = U \Lambda U^\top\), and uses the eigenvectors of the \(r\) smallest eigenvalues (\(r\) being the LoRA rank) as the initial rows of \(A\).
    • Design Motivation: The smallest-eigenvalue directions are where the other tasks' features carry the least energy, so projecting those features onto them yields the smallest outputs. Under \(A A^\top = I_r\), \(\mathbb{E}\|A h\|^2 = \mathrm{tr}(A \Sigma A^\top)\) is minimized exactly by these \(r\) eigenvectors, which keeps the merging interference as small as possible.
  3. Plug-and-play Compatibility:

    • OSRM only modifies the initialization phase of LoRA, leaving the fine-tuning and merging processes unchanged.
    • It can be seamlessly combined with any merging method, such as Task Arithmetic, TIES, Fisher Merging, RegMean, or EMR.
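The reasoning behind designs 1 and 2 can be written out as one short derivation. It assumes Task Arithmetic with unit scaling (\(W_m = W_0 + B_1 A_1 + B_2 A_2\), \(W_1 = W_0 + B_1 A_1\)) and orthonormal rows of \(A\), with eigenvalues of \(\Sigma\) sorted as \(\lambda_1 \ge \dots \ge \lambda_d\) and eigenvectors \(u_1, \dots, u_d\). The last line is the standard Ky Fan trace-minimization result for symmetric matrices, stated as a sketch rather than quoted from the paper.

```latex
\begin{aligned}
W_m h_1 &= (W_0 + B_1 A_1 + B_2 A_2)\,h_1 = W_1 h_1 + B_2 A_2 h_1,
  \qquad \|B_2 A_2 h_1\|_2 \le \|B_2\|_2\,\|A_2 h_1\|_2, \\
\mathbb{E}_h\big[\|A h\|_2^2\big] &= \mathbb{E}_h\big[h^\top A^\top A h\big]
  = \operatorname{tr}\!\big(A\,\mathbb{E}[h h^\top]\,A^\top\big)
  = \operatorname{tr}\!\big(A\,\Sigma\,A^\top\big), \\
\min_{A A^\top = I_r} \operatorname{tr}\!\big(A\,\Sigma\,A^\top\big)
  &= \sum_{k=d-r+1}^{d} \lambda_k,
  \qquad \text{attained at } A = [\,u_{d-r+1}, \dots, u_d\,]^\top .
\end{aligned}
```

By the operator-norm bound in the first line, keeping \(\mathbb{E}\|A_2 h\|^2\) small over the other tasks' features also keeps the interference term small for any fixed \(B_2\).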

Key Experimental Results

Main Results

RoBERTa-base, 4-Task Merging (Task Arithmetic):

Method | Multi-Task Average Performance | Single-Task Preservation
Task Arithmetic (standard LoRA init) | ~65% | ~85%
Task Arithmetic + OSRM init | ~75% | ~87%

OSRM improves multi-task performance by roughly 10 percentage points (~65% to ~75%) while maintaining, and even slightly improving, single-task performance (~85% to ~87%).

Ablation Study

Configuration | Effect | Description
OSRM + TA | Best | Full method
OSRM + TIES | Significant improvement | Also compatible with TIES
Random init (baseline) | Baseline | Standard LoRA
Parameter orthogonalization only | Small gain | Ignores data interaction; limited effectiveness

Key Findings

  • OSRM is more robust to the merging hyperparameter (the scaling coefficient \(\lambda\)): standard LoRA merging is highly sensitive to \(\lambda\), whereas OSRM remains stable across a wider range.
  • Low sample requirements: only dozens of samples per other task are needed to compute the covariance matrices.
  • Equally effective on large language models (LLaMA-7B).
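One plausible reading of the robustness finding can be illustrated with a toy NumPy experiment (synthetic data, no fine-tuning, not a reproduction of the paper's results). With Task Arithmetic scaling \(\lambda\), the cross-task term on Task 1 inputs is \(\lambda B_2 A_2 h_1\), so its size grows linearly in \(\lambda\) with a slope set by \(\|A_2 h_1\|\); an OSRM-style \(A_2\) built from Task 1 statistics makes that slope small. The feature distribution, the random stand-in for a trained \(B_2\), and the helper `cross_task_shift` are hypothetical, and the covariance is estimated from only 48 samples, echoing the low-sample setting.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 32, 4, 48  # hidden size, LoRA rank, samples per task (hypothetical)

# Hypothetical Task 1 features concentrated in the first 16 dimensions.
scale = np.r_[np.ones(16), 0.05 * np.ones(16)]
H1 = rng.normal(size=(n, d)) * scale

# Task 2 A matrix: random orthonormal rows vs. OSRM-style rows from Task 1 stats.
A_rand = np.linalg.qr(rng.normal(size=(d, r)))[0].T   # (r, d)
_, U = np.linalg.eigh(H1.T @ H1 / n)                  # ascending eigenvalues
A_osrm = U[:, :r].T                                   # smallest-r eigenvectors
B2 = rng.normal(size=(d, r))                          # stand-in for a trained B_2


def cross_task_shift(A2, lam):
    """Mean norm of the interference term lam * B2 A2 h1 over Task 1 inputs."""
    return float(np.mean(np.linalg.norm(lam * (H1 @ A2.T) @ B2.T, axis=1)))


for lam in (0.3, 0.6, 1.0):
    print(lam, round(cross_task_shift(A_rand, lam), 3),
          round(cross_task_shift(A_osrm, lam), 3))
```

Both printed columns scale linearly with \(\lambda\), but the OSRM-style column has a much smaller slope, so sweeping \(\lambda\) perturbs Task 1 outputs far less. Real fine-tuning also updates \(A_2\), which this toy ignores.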

Highlights & Insights

  • Identifies the underlying cause of merging failure: parameter-data interaction interference rather than parameter conflict, an insight that shifts the focus of merging research.
  • Addressing merging issues prior to fine-tuning: Unlike post-processing methods, which adjust the merge after fine-tuning, OSRM reduces the source of interference before fine-tuning begins, a more fundamental fix.
  • Extremely low overhead: Requires only a one-time covariance computation and eigenvalue decomposition during initialization, with zero extra overhead during the fine-tuning process.

Limitations & Future Work

  • Requires access to other tasks' data: OSRM must know in advance which tasks will be merged and needs a small amount of their data, making it less suitable for merging completely unknown tasks.
  • Orthogonal basis assumption: The analysis assumes the rows of \(A\) form an orthonormal basis; since fine-tuning updates \(A\), this property may no longer hold exactly afterwards.
  • Evaluated only on LoRA: An analogous analysis of merging interference for fully fine-tuned models remains unexplored.
  • vs KnOTS: KnOTS aligns LoRAs in a shared space in a data-independent manner; OSRM's data-driven initialization instead targets the interference between the specific tasks being merged.
  • vs TIES/Task Arithmetic: These are post-processing methods, whereas OSRM is a pre-processing method, so the two are complementary and can be stacked.
  • Future work could explore replacing orthogonal initialization with regularization (continuously constraining orthogonality during the fine-tuning process).
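The regularization idea in the last bullet could take the form below. This is a hypothetical sketch, not something the paper proposes in this form: the penalty \(\mu\,\mathrm{tr}(A \Sigma A^\top)\) reuses the objective that the smallest-eigenvalue initialization minimizes, and the extra term \(\nu\,\|A A^\top - I_r\|_F^2\) is an assumed way to keep the orthonormal-rows assumption from drifting during fine-tuning. The function name `orth_penalty` and the weights `mu` and `nu` are illustrative.

```python
import numpy as np

def orth_penalty(A, sigma_other, mu=1.0, nu=1.0):
    """Hypothetical regularizer for a LoRA A matrix of shape (r, d).

    mu * tr(A Sigma A^T):   energy of other tasks' features passed through A.
    nu * ||A A^T - I||_F^2: keeps the rows of A close to orthonormal.
    Returns (value, gradient with respect to A).
    """
    r = A.shape[0]
    gram_err = A @ A.T - np.eye(r)
    value = mu * np.trace(A @ sigma_other @ A.T) + nu * np.sum(gram_err ** 2)
    # Analytic gradient; valid because sigma_other and gram_err are symmetric.
    grad = 2 * mu * A @ sigma_other + 4 * nu * gram_err @ A
    return value, grad


# Usage (synthetic): plain gradient descent on the penalty alone. During
# fine-tuning it would instead be added to the task loss at every step.
rng = np.random.default_rng(0)
d, r = 16, 4
H_other = rng.normal(size=(64, d))
sigma_other = H_other.T @ H_other / len(H_other)
A = rng.normal(size=(r, d)) / np.sqrt(d)

start, _ = orth_penalty(A, sigma_other)
for _ in range(200):
    _, g = orth_penalty(A, sigma_other)
    A -= 0.01 * g
end, _ = orth_penalty(A, sigma_other)
print(start, end)  # end is lower than start
```

The gradient is written out by hand to keep the sketch dependency-free; an autodiff framework would compute the same quantity inside a real training loop.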

Rating

  • Novelty: ⭐⭐⭐⭐⭐ The data-parameter interference analysis and pre-fine-tuning orthogonal initialization provide a completely fresh perspective.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Validated across 8 datasets and 5 models, integrated with multiple merging methods.
  • Writing Quality: ⭐⭐⭐⭐⭐ Thorough problem analysis with rigorous mathematical derivations.
  • Value: ⭐⭐⭐⭐ Provides a simple and highly effective plug-and-play solution for LoRA model merging.