DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning
Conference: ACL 2025
arXiv: 2507.02302
Code: https://github.com/dohoonkim-ai/DoMIX
Area: Others
Keywords: domain-adaptive pretraining, LoRA, knowledge exploitation, bridge module, continual learning
TL;DR
This paper proposes DoMIX, which stores domain-specific knowledge in independent LoRA modules and flexibly combines them during fine-tuning using a diagonally initialized bridge matrix. Under continual domain-adaptive pretraining scenarios, it reduces pretraining time by 58% and GPU memory by 87% while outperforming state-of-the-art (SOTA) methods.
Background & Motivation
Background: Domain-adaptive pretraining (DAP), i.e., pretraining on domain-specific data before fine-tuning, has proven highly effective. Continual DAP extends this to learning multiple domains incrementally. However, existing methods suffer from high computational cost, sensitivity to domain order, and an inability to exploit domain knowledge selectively.
Limitations of Prior Work: (1) Continual learning methods require complex mechanisms to prevent forgetting (e.g., EWC, experience replay), incurring high computational and memory costs. (2) Sequential training is sensitive to domain order: training on the same domains in a different order can yield different results. (3) Existing methods fuse all domain knowledge into a single model, so they cannot selectively exploit domain-specific knowledge for a particular task.
Key Challenge: The point of DAP is to provide the most suitable domain model for each task, but continual DAP yields a single "general-purpose" model, which runs counter to that goal.
Goal: To design an efficient, parallelizable DAP framework capable of targeted domain knowledge exploitation.
Key Insight: Store domain-specific knowledge in separate LoRA modules, which can be trained in parallel, and let a learnable bridge matrix determine during fine-tuning which domains to exploit and to what extent. The authors observe that knowledge from one domain can also benefit tasks in other domains.
Core Idea: Storing domain knowledge in independent LoRAs + utilizing a diagonal bridge matrix to control exploitation levels + freezing module A to fine-tune within the domain subspace.
Method
Overall Architecture
Three steps: (1) Perform DAP using independent LoRAs on each domain's data (fully parallelizable, with the base model frozen). (2) Concatenate the A matrices row-wise and B matrices column-wise across multiple LoRAs, inserting a diagonal bridge matrix \(P\) in between. (3) During fine-tuning, freeze module A and train \(P\) and \(B\), allowing the model to automatically select and exploit knowledge within the domain subspace.
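Step (2) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the sizes, the random stand-ins for trained LoRA weights, and the variable names (`loras`, `A_cat`, `B_cat`, `p`) are assumptions made for the example, and every domain is assumed to use the same rank `r`.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, n_domains = 6, 8, 2, 3  # illustrative sizes, not from the paper

# Stand-in for step (1): one (B_k, A_k) LoRA pair per domain, as if produced
# by independent DAP runs. Random values replace trained weights here.
loras = [
    (rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in)))
    for _ in range(n_domains)
]

# Step (2): stack the A matrices row-wise and the B matrices column-wise.
A_cat = np.vstack([A for _, A in loras])  # shape (n_domains * r, d_in)
B_cat = np.hstack([B for B, _ in loras])  # shape (d_out, n_domains * r)

# Diagonal bridge stored as a vector, every entry set to the same constant.
p = np.full(n_domains * r, 1.0 / (2 * r))
delta_W = B_cat @ np.diag(p) @ A_cat      # shape (d_out, d_in)

# Equivalent view: the update is a sum of per-domain terms B_k P_k A_k,
# where P_k is the diagonal block of the bridge belonging to domain k.
per_domain = sum(
    B @ np.diag(p[k * r:(k + 1) * r]) @ A
    for k, (B, A) in enumerate(loras)
)
assert np.allclose(delta_W, per_domain)
```

Because the bridge is diagonal, each entry of `p` rescales exactly one rank-one direction of one domain's LoRA, which is why the concatenated form and the per-domain sum agree.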
Key Designs
- Independent Parallel DAP:
  - Independently train a LoRA for each domain without mutual interference, eliminating the need for forgetting-prevention mechanisms.
  - Fully parallelizable: \(n\) domains can be trained simultaneously on \(n\) GPUs.
  - No need to track domain IDs.
- Bridge Matrix:
  - Function: Insert a diagonal matrix \(P\) between the concatenated \(B_{cat}\) and \(A_{cat}\).
  - \(\Delta W = B_{cat} P A_{cat}\), where the diagonal element \(p_{ii}\) of \(P\) controls the exploitation level of the \(i\)-th knowledge subspace.
  - Initialized uniformly, with every diagonal entry set to the same value (\(p_{ii} = 1/(2r)\)), so that no domain is favored at the start.
  - Design Motivation: More parameter-efficient than a full bridge (\(P\) as a dense matrix), with the diagonal structure offering clear interpretability.
- Freezing Module A for Fine-Tuning:
  - Freezing module A restricts updates to the domain knowledge subspace \(\text{span}(A)\).
  - Training \(P\) and \(B\) allows the model to learn the optimal combination of knowledge within the domain subspace.
  - Design Motivation: Exploit existing domain knowledge instead of learning from scratch.
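The effect of freezing module A can be checked on a toy problem. The sketch below is an assumption-laden illustration, not the paper's training code: it uses a linear layer, a squared-error regression loss as a stand-in for the downstream objective, hand-derived gradients, and made-up sizes and learning rate. Only `B_cat` and the diagonal `p` of the bridge are updated.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, n_domains, n_samples = 4, 8, 2, 2, 64  # illustrative sizes
k = n_domains * r

W0 = rng.normal(size=(d_out, d_in))                 # frozen base weight
A_cat = rng.normal(size=(k, d_in)) / np.sqrt(d_in)  # frozen after DAP
B_cat = 0.1 * rng.normal(size=(d_out, k))           # trainable
p = np.full(k, 1.0 / (2 * r))                       # trainable diagonal of P

# Toy regression data standing in for a downstream fine-tuning task.
X = rng.normal(size=(n_samples, d_in))
Y = X @ (W0 + 0.1 * rng.normal(size=(d_out, d_in))).T

def loss_and_grads(B, p):
    # B * p scales column i of B by p[i], which equals B @ diag(p).
    W = W0 + (B * p) @ A_cat
    err = X @ W.T - Y
    loss = 0.5 * np.mean(np.sum(err ** 2, axis=1))
    G = err.T @ X / n_samples            # gradient w.r.t. delta_W
    GA = G @ A_cat.T                     # shape (d_out, k)
    grad_B = GA * p                      # dL/dB = G A^T P
    grad_p = np.sum(B * GA, axis=0)      # dL/dp_i = b_i^T G a_i
    return loss, grad_B, grad_p

A_before = A_cat.copy()
loss_start, _, _ = loss_and_grads(B_cat, p)

lr = 0.05  # assumed step size for this toy problem
for _ in range(100):
    _, grad_B, grad_p = loss_and_grads(B_cat, p)
    B_cat = B_cat - lr * grad_B
    p = p - lr * grad_p                  # A_cat is never updated

loss_end, _, _ = loss_and_grads(B_cat, p)
delta_W = (B_cat * p) @ A_cat

# Every row of delta_W is a linear combination of rows of A_cat, so stacking
# delta_W under A_cat does not increase the rank.
rank_A = np.linalg.matrix_rank(A_cat)
rank_stacked = np.linalg.matrix_rank(np.vstack([A_cat, delta_W]))
```

In this setup `rank_stacked` equals `rank_A`, which is the concrete sense in which the update stays inside the subspace fixed during DAP, whatever values `B_cat` and `p` take.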
Key Experimental Results
Main Results
| Method | Pretraining Time | GPU Memory | Avg. F1 | Order Sensitive |
|---|---|---|---|---|
| Continual DAP Method (SOTA) | 100% | 100% | Baseline | Yes |
| DoMIX | 42% (-58%) | 13% (-87%) | Outperforms SOTA | No |
| Individual DAP + LoRA | Less | Less | Comparable | No |
Ablation Study
| Configuration | Effect | Description |
|---|---|---|
| DoMIX (Full) | Best | bridge + freeze A |
| w/o bridge (direct concat) | Degraded | Unable to control exploitation levels |
| Full bridge (dense matrix) | Slightly better but with more parameters | Diagonal is sufficient |
| Also training A | Degraded | Deviates from domain subspace |
| LLM extension experiments | Also effective | Reduces training time by 36% and memory by 18% |
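The parameter trade-off behind the "Full bridge" row is easy to quantify. The helper below is a hypothetical back-of-the-envelope counter, assuming every domain uses the same LoRA rank and counting only the bridge parameters of a single adapted weight matrix; the domain count and rank in the usage lines are illustrative, not values reported in the paper.

```python
def bridge_param_count(n_domains: int, rank: int, dense: bool = False) -> int:
    """Number of trainable bridge parameters for one adapted weight matrix.

    The bridge is a square matrix of side n_domains * rank. A diagonal bridge
    trains only its diagonal; a dense bridge trains every entry.
    """
    side = n_domains * rank
    return side * side if dense else side

# Illustrative setting: 6 domains, rank 8 (assumed values).
diag_params = bridge_param_count(6, 8)
dense_params = bridge_param_count(6, 8, dense=True)
print(diag_params, dense_params)  # 48 2304
```

The diagonal count grows linearly with the number of domains while the dense count grows quadratically, which is consistent with the summary's remark that the diagonal form is sufficient.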
Key Findings
- Cross-domain knowledge transfer exists: DAP from the AI domain is also beneficial for phone tasks, validating the necessity of flexibly exploiting domain knowledge.
- Diagonal elements of the bridge matrix are interpretable: One can observe which domain knowledge contributes the most to specific tasks.
- Insensitive to data ordering: Independent parallel training eliminates sequence dependence issues.
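One possible way to read the bridge diagonal per domain is sketched below. The aggregation rule (summing absolute diagonal values within each domain's block and normalizing) and the helper name `domain_exploitation` are assumptions made for illustration, not necessarily the analysis used in the paper, and the diagonal values are toy numbers rather than measured results. The score is also only a rough proxy, since the effective contribution depends on the norms of the matching columns of B and rows of A as well.

```python
import numpy as np

def domain_exploitation(p, rank, names):
    """Share of total |p_ii| mass that falls in each domain's block.

    Assumes the diagonal is ordered domain by domain, with `rank` entries
    per domain, matching the order in which the LoRAs were concatenated.
    """
    blocks = np.abs(np.asarray(p, dtype=float)).reshape(len(names), rank)
    totals = blocks.sum(axis=1)
    return dict(zip(names, totals / totals.sum()))

# Toy diagonal after fine-tuning (made-up values, rank 2, three domains).
toy_p = [0.25, 0.25, 0.5, 0.5, 0.0, 0.0]
shares = domain_exploitation(toy_p, rank=2, names=["ai", "phone", "other"])
top_domain = max(shares, key=shares.get)
```

With these toy values the "phone" block holds two thirds of the mass, so `top_domain` is `"phone"`.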
Highlights & Insights
- "Decoupled knowledge accumulation and exploitation" design philosophy: Independent storage + flexible combination, which simultaneously avoids catastrophic forgetting and preserves domain specificity.
- Elegant design of the bridge matrix: A small diagonal matrix regulates multi-domain knowledge exploitation, adding negligible parameter overhead while yielding significant performance gains.
Limitations & Future Work
- Continual DAP was evaluated primarily on RoBERTa-Base; LLM experiments remain preliminary.
- Overhead of LoRA storage and concatenation scales up as the number of domains increases.
- The diagonal assumption of the bridge matrix might be oversimplified (failing to model interactions between domains).
Related Work & Insights
- vs. MoE-based continual DAP (Ke et al.): These approaches employ complex forgetting-prevention mechanisms, incurring high computational costs, whereas DoMIX uses independent LoRAs + a bridge matrix, offering efficiency and simplicity.
- vs. InfLoRA: InfLoRA demonstrated the effectiveness of freezing module A for updates within a subspace; DoMIX extends this paradigm to multi-domain knowledge exploitation.
Rating
- Novelty: ⭐⭐⭐⭐ The combination of independent LoRA + bridge matrix is simple yet highly effective
- Experimental Thoroughness: ⭐⭐⭐ Detailed continual DAP experiments, though LLM evaluation could be more comprehensive
- Writing Quality: ⭐⭐⭐⭐ Clearly motivated, with intuitive methodology diagrams
- Value: ⭐⭐⭐⭐ Significantly enhances the efficiency of domain-adaptive pretraining