
ABRA: Teleporting Fine-Tuned Knowledge Across Domains for Open-Vocabulary Object Detection

Conference: CVPR 2026
arXiv: 2603.12409
Code: To be released
Area: Object Detection / Domain Adaptation / Open-Vocabulary
Keywords: Open-vocabulary detection, domain adaptation, weight-space transfer, SVD rotation alignment, Objectification, SVFT

TL;DR

This paper formulates cross-domain category transfer as SVD rotation alignment in weight space: domain-agnostic experts are trained via Objectification, lightweight class residuals are extracted with SVFT, and a closed-form orthogonal Procrustes solution is used to "teleport" source-domain class knowledge to a target domain with no data for that class.

Background & Motivation

Background: Open-vocabulary detectors (e.g., Grounding DINO) can detect arbitrary categories via text prompts, but suffer severe performance degradation under domain shift (night/fog/rain). Conventional DAOD methods rely on Mean Teacher with pseudo-labels, which become unreliable under large domain shifts.

Limitations of Prior Work:

  1. Standard DAOD assumes target-domain images exist for all categories (even if unannotated), whereas in practice rare categories may have no target-domain data whatsoever—neither annotations nor images.
  2. Weight-space methods such as Task Arithmetic ignore the rotational difference between source and target SVD subspaces, making naive residual addition/subtraction ineffective.
  3. There is no existing method to transfer a category across domains when that category is completely absent from the target domain.

Key Challenge: The detector must recognize a certain category in the target domain, yet no target-domain images of that category exist at all (zero-shot class transfer across domains).

Goal: Transfer the category-detection capability learned in the source domain to a target domain that contains no data for that category.

Key Insight: Domain knowledge and class knowledge can be disentangled: domain experts capture visual statistics (illumination/texture/weather), while class experts capture category semantics. By aligning the SVD bases of the two domains, class residuals can be "teleported" across domains.

Core Idea: \(\theta_T^{(c)} \approx U_T(\Sigma_T + U_T^\top U_S \cdot \Delta\Sigma_S^{(c)} \cdot V_S^\top V_T) V_T^\top\), yielding a closed-form solution that requires no training.
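The paper's code is not yet released, so the following NumPy sketch (function and variable names are ours) illustrates the closed-form teleportation on small random square matrices; it is a minimal illustration, not the authors' implementation:

```python
import numpy as np

def teleport(theta_S, theta_T, delta_sigma_S):
    """Rotate a source-domain class residual into the target-domain
    SVD basis and add it to the target weights (closed form, no training)."""
    U_S, _, Vt_S = np.linalg.svd(theta_S, full_matrices=False)
    U_T, sig_T, Vt_T = np.linalg.svd(theta_T, full_matrices=False)
    # Orthogonal Procrustes rotations between the two SVD bases
    L = U_T.T @ U_S        # L* = U_T^T U_S
    R = Vt_T @ Vt_S.T      # R* = V_T^T V_S
    # theta_T^(c) = U_T (Sigma_T + L dSigma R^T) V_T^T
    return U_T @ (np.diag(sig_T) + L @ delta_sigma_S @ R.T) @ Vt_T

# Sanity check: with identical source and target domains, both rotations
# reduce to the identity and we recover the source class expert
# U_S (Sigma_S + dSigma) V_S^T.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
dS = np.diag(rng.standard_normal(8) * 0.01)
U, s, Vt = np.linalg.svd(W)
expected = U @ (np.diag(s) + dS) @ Vt
assert np.allclose(teleport(W, W, dS), expected)
```

With distinct \(\theta_S\) and \(\theta_T\), the same call rotates the residual between the two bases; no gradient step is involved at any point.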

Method

Overall Architecture

Grounding DINO pretrained weights \(\theta_0\) → Objectification training of source/target domain experts \(\theta_S, \theta_T\) → SVFT training of lightweight class residuals \(\Delta\Sigma_S^{(c)}\) on the source domain → orthogonal Procrustes closed-form teleportation to the target domain → target-domain class expert \(\hat{\theta}_T^{(c)}\).

Key Designs

  1. Objectification

    • Annotations for the top-3 most frequent categories are replaced with a unified "object" label, training a class-agnostic domain expert.
    • The model is forced to learn domain visual statistics (illumination patterns, texture features, weather conditions) rather than category semantics.
    • One domain expert is trained per domain: \(\theta_S = \text{Fine-Tune}(\theta_0, \tilde{\mathcal{D}}_S)\).
    • Annotations for non-top-3 categories are discarded to prevent low-frequency classes from introducing class bias.
  2. SVFT Class Expert Training

    • SVD decomposition is applied to the domain expert: \(\theta_{S,\ell} = U_{S,\ell} \Sigma_{S,\ell} V_{S,\ell}^\top\).
    • \(U\), \(\Sigma\), and \(V\) are frozen; only the extremely lightweight singular-value residual \(\Delta\Sigma_{S,\ell}^{(c)}\) (diagonal or banded matrix) is trained.
    • Forward pass: \(f_\ell(x) = U_{S,\ell}(\Sigma_{S,\ell} + \Delta\Sigma_{S,\ell}^{(c)}) V_{S,\ell}^\top x\).
    • Each class expert requires only a small amount of data and few parameters to train.
    • During training, only images containing the target category are retained, and annotations for other categories are masked.
  3. Orthogonal Procrustes Teleportation

    • Core operation: rotate the source class residual into the target-domain SVD basis: \(\pi_{S \rightarrow T}(\Delta\Sigma_S^{(c)}) = L \Delta\Sigma_S^{(c)} R^\top\).
    • The orthogonal Procrustes problem is solved as: \(L^* = U_T^\top U_S\), \(R^* = V_T^\top V_S\).
    • The resulting target-domain class expert is: \(\theta_{T,\ell}^{(c)} = U_T(\Sigma_T + U_T^\top U_S \Delta\Sigma_S^{(c)} V_S^\top V_T) V_T^\top\).
    • The solution is entirely closed-form, requiring no training iterations or target-domain class data.
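Why the rotations admit a closed form (our reading; the paper may phrase the derivation differently): the left rotation solves the orthogonal Procrustes problem \(L^* = \arg\min_{L^\top L = I} \|U_T L - U_S\|_F\). The Procrustes theorem gives \(L^* = \bar{U}\bar{V}^\top\), where \(\bar{U}\bar{\Sigma}\bar{V}^\top\) is the SVD of \(U_T^\top U_S\); since \(U_T\) and \(U_S\) are themselves orthogonal, the product \(U_T^\top U_S\) is already orthogonal, so \(\bar{\Sigma} = I\) and \(L^* = U_T^\top U_S\). The same argument on the right singular vectors yields \(R^* = V_T^\top V_S\).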

Loss & Training

  • Domain experts: encoder attention layers, 10 epochs, lr=1e-4, batch=2.
  • Class experts: SVFT, 12 epochs, lr=1e-2, batch=4.
  • Teleportation stage: no training; pure matrix operations.

Key Experimental Results

Main Results

Cityscapes → Foggy Cityscapes (average over 5 unseen classes)

| Method | mAP | AP50 |
|---|---|---|
| Zero-shot | 27.66 | 44.12 |
| Source (direct evaluation of source-domain class expert) | 38.25 | 57.34 |
| Task Analogy | 18.12 | 26.79 |
| ParamΔ | 28.29 | 44.42 |
| ABRA | 40.54 | 61.06 |
| Fine-tuning (upper bound) | 41.36 | 62.48 |

ABRA achieves on average 98% of the fine-tuning upper bound across the 5 transferred classes.

SDGOD — four domain shifts (average)

| Method | mAP | AP50 |
|---|---|---|
| Zero-shot | 20.65 | 34.82 |
| Source | 27.76 | 48.99 |
| ParamΔ | 8.87 | 13.85 |
| ABRA | 28.10 | 50.57 |
| Fine-tuning | 29.20 | 51.93 |

ParamΔ collapses on SDGOD (the identity mapping fails under aggressive domain shift), while ABRA remains robust.

Ablation Study

| Ablation | mAP |
|---|---|
| Zero-shot w/ Obj. (merged detection boxes) | 36.20 |
| Supervised (retaining semantic labels) | 38.88 |
| Objectification | 40.54 |

  • Objectification outperforms retaining original semantic labels, confirming that class-agnostic training is critical for cross-domain transfer.
  • Fine-tuning and FDA initialized with ABRA both outperform standard \(\theta_0\) initialization (FFT: 42.80 vs. 41.36).

Key Findings

  • The failure of Task Analogy and ParamΔ demonstrates that naive weight addition/subtraction is insufficient; basis alignment is necessary.
  • ABRA consistently outperforms standard pretraining as an initialization in few-shot settings.
  • Training independent class experts per category outperforms a single expert trained on all categories jointly.
  • ABRA remains competitive on Night Rainy, the most challenging domain.

Highlights & Insights

  • SVD rotation alignment in weight space offers an elegant domain adaptation paradigm that completely bypasses feature alignment and adversarial training.
  • Objectification is a clever class–domain disentanglement strategy: assigning a unified "object" label strips away class semantics.
  • SVFT residuals are extremely lightweight (diagonal or banded matrices), enabling parallel training and efficient storage across multiple classes.
  • The closed-form solution yields zero-latency teleportation, making it deployment-friendly for rapid adaptation to new domains.

Limitations & Future Work

  • The orthogonal Procrustes assumption requires well-corresponding SVD subspaces between source and target domains; validity under extreme domain shifts (e.g., CT → ultrasound) remains to be verified.
  • Objectification uses only the top-3 categories; the optimal number of categories may vary across datasets.
  • Experiments are conducted solely on Grounding DINO; generalization to other OVD architectures (e.g., YOLO-World) requires further validation.
  • Comparisons with recent multi-source domain adaptation methods are absent.

Comparison with Prior Methods

  • vs. Task Arithmetic: Ignoring SVD rotational differences causes transfer failure (mAP drops from 40.54 to 18.12).
  • vs. ParamΔ: The identity-mapping assumption collapses on SDGOD (mAP drops to 8.87), demonstrating that rotation alignment cannot be omitted.
  • vs. Mean Teacher DAOD: Mean Teacher requires target-domain images, whereas ABRA requires no target-domain class data whatsoever.
  • vs. Model Rebasin: Shares a similar conceptual basis, but ABRA specializes it for detection domain adaptation and introduces Objectification.
  • Insight: Weight-space operations and SVFT are broadly applicable to cross-domain deployment of segmentation and classification models.

Rating

  • Novelty: ⭐⭐⭐⭐ Modeling domain adaptation as weight rotation is a novel perspective; the Objectification design is ingenious.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Multiple domain-shift scenarios, comprehensive ablations, and few-shot analysis.
  • Writing Quality: ⭐⭐⭐⭐ Clear problem formulation and methodological derivation with consistent mathematical notation.
  • Value: ⭐⭐⭐⭐ High practical value for weight-space transfer; the closed-form solution is deployment-friendly.