CD-Buffer: Complementary Dual-Buffer Framework for Test-Time Adaptation in Adverse Weather Object Detection

Conference: CVPR 2026
arXiv: 2603.26092
Code: Website
Area: Object Detection
Keywords: Test-time adaptation, adverse weather, object detection, channel adaptation, additive-subtractive complementarity

TL;DR

This paper proposes CD-Buffer, a framework that couples a subtractive buffer (channel suppression) and an additive buffer (lightweight adapter compensation) through a unified domain discrepancy measure so that the two complement each other, enabling robust test-time adaptation of object detectors across adverse weather conditions of varying severity.

Background & Motivation

Background: Test-time adaptation (TTA) addresses domain shift by updating source-pretrained models online, without offline retraining or target labels. Existing TTA methods fall into two paradigms: additive methods (introducing lightweight modules to learn target-specific adjustments) and subtractive methods (removing domain-sensitive channels).

Limitations of Prior Work: Additive methods (e.g., BufferTTA) perform well under moderate domain shift but cannot recover features that are severely degraded; subtractive methods (e.g., PruningTTA) excel under severe shift but, under moderate shift, over-prune and discard useful information that could have been recovered. Each paradigm is therefore effective only within a limited range of conditions.

Key Challenge: In real-world scenarios, different feature channels within the same image may experience varying degrees of domain shift — some channels are severely degraded and require suppression, while others need only minor adjustment. Existing methods apply uniform treatment to all channels, failing to accommodate this heterogeneity.

Goal: Design an adaptive mechanism that automatically balances "removal" and "compensation" strategies according to the degree of domain shift in each channel.

Key Insight: Measure channel-level domain discrepancy and use a unified metric to simultaneously drive two complementary operations.

Core Idea: Discrepancy-driven dual-buffer coupling — severely shifted channels are suppressed and receive strong compensation, moderately shifted channels are finely adjusted, and stable channels are largely left unchanged.

Method

Overall Architecture

CD-Buffer is built on Faster R-CNN with a ResNet-50 backbone. A subtractive buffer (learnable mask scores) is attached to each BN layer, and an additive buffer (a lightweight adapter of parallel $1\times1$ and $3\times3$ convolutions) is placed on the residual path. The two buffers are coupled through a unified channel discrepancy measure $D_c$.

Key Designs

Feature-Level Domain Discrepancy: Combines image-level and instance-level feature discrepancies for each channel $c$: $D_c^I = \frac{1}{NHW}\sum_{n=1}^{N}\|X_{t,n}^c - \bar{X}_s^c\|_1, \quad D_c^O = \frac{1}{Mhw}\sum_{m=1}^{M}\|x_{t,m}^c - \bar{x}_s^c\|_1, \quad D_c = D_c^I + D_c^O$, where $X_{t,n}^c$ is channel $c$ of the $n$-th target image's feature map (spatial size $H \times W$), $x_{t,m}^c$ is channel $c$ of the $m$-th instance feature obtained via RoI Align (size $h \times w$), and $\bar{X}_s^c$ and $\bar{x}_s^c$ are the corresponding precomputed source-domain mean features. Design Motivation: Object detection must account for domain shift at both the global scene level and the local instance level. A high $D_c$ indicates that channel $c$ deviates severely and needs intervention; a low $D_c$ indicates that only fine-tuning is needed.
Subtractive Buffer: Introduces learnable mask scores $s \in \mathbb{R}^C$, initialized from the BN scale parameters as $s_c = |\gamma_c|$, and drives channel suppression with a discrepancy-weighted regularizer: $\mathcal{L}_{mask} = \frac{1}{C}\sum_c \|D_c \cdot s_c\|_1$. Because the gradient of this term with respect to $s_c$ is proportional to $D_c$, high-discrepancy channels are pushed toward zero harder and are eventually suppressed by a thresholded (hard) mask. In each layer $l$, a dynamic percentile threshold $\tau^{(l)} = \text{Percentile}(\{|s_c^{(l)}|\}_c, \rho_{target})$ with $\rho_{target} = 5\%$ fixes the fraction of suppressed channels, and a straight-through estimator lets gradients flow through the non-differentiable threshold. Stochastic reactivation prevents useful channels from being removed permanently.
Additive Buffer: A lightweight adapter $F_{add} = \frac{\text{Conv}_{1\times1}(F) + \text{Conv}_{3\times3}(F)}{2} \odot \boldsymbol{\alpha}$ averages parallel $1\times1$ and $3\times3$ convolutions and scales the result with a learnable channel-wise factor $\boldsymbol{\alpha}$ (initialized to $10^{-2}$). The key innovation is the inverse soft mask $\hat{m}_{soft}^{-1} = k \cdot \text{Norm}(\mathbf{1} - \hat{m}_{soft})$, where the superscript $-1$ denotes inversion of the mask ($\mathbf{1} - \hat{m}_{soft}$) rather than a reciprocal, $\text{Norm}(\cdot)$ is a normalization, and $k$ is a scaling constant. It modulates the adapter output channel by channel: channels strongly suppressed by the subtractive buffer ($\hat{m}_{soft} \approx 0$) receive the strongest additive compensation (maximal $\hat{m}_{soft}^{-1}$), so the amount of compensation automatically tracks the amount removed. Design Motivation: Removing severely degraded features inevitably discards some useful information as well; the inverse modulation directs the additive buffer's strongest compensation to exactly those channels.
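A minimal sketch of the Feature-Level Domain Discrepancy in plain Python, assuming features arrive as nested lists and that the stored source statistic is a per-channel scalar mean broadcast over spatial positions (a stored mean map would be compared element-wise in the same way). The names `mean_abs_gap` and `channel_discrepancy` are hypothetical and not taken from the paper's code.

```python
from typing import List

# Assumed layout: feats[sample][channel] is a flat list holding that channel's
# spatial values (H*W per image, or h*w per RoI-aligned instance).
Feats = List[List[List[float]]]


def mean_abs_gap(feats: Feats, src_mean: List[float]) -> List[float]:
    """Per channel: sum of |x - source mean| over all samples and positions,
    divided by the number of terms (the 1/(NHW) or 1/(Mhw) factor)."""
    gaps = []
    for c, mu_s in enumerate(src_mean):
        values = [v for sample in feats for v in sample[c]]
        gaps.append(sum(abs(v - mu_s) for v in values) / len(values))
    return gaps


def channel_discrepancy(img_feats: Feats, img_src_mean: List[float],
                        roi_feats: Feats, roi_src_mean: List[float]) -> List[float]:
    """D_c = D_c^I (whole-image features) + D_c^O (instance features)."""
    d_img = mean_abs_gap(img_feats, img_src_mean)
    d_obj = mean_abs_gap(roi_feats, roi_src_mean)
    return [a + b for a, b in zip(d_img, d_obj)]


# Toy batch: one image and two RoIs, two channels each.
img_feats = [[[1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 0.0, 2.0]]]
roi_feats = [[[1.0, 1.0], [3.0, 3.0]], [[1.0, 1.0], [3.0, 3.0]]]
D = channel_discrepancy(img_feats, [1.0, 1.0], roi_feats, [1.0, 1.0])
print(D)  # [0.0, 3.0]
```

In the toy batch, channel 0 matches the source statistics and gets $D_0 = 0$, while channel 1 has drifted at both the image and instance level and gets $D_1 = 3$, so it is the one the subtractive buffer would target.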
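The Subtractive Buffer's pieces can be illustrated without an autograd library. This sketch computes $\mathcal{L}_{mask}$ and its analytic gradient, applies a per-layer percentile threshold, and writes out the straight-through backward rule. The linear-interpolation percentile, the helper names, and the `p_reactivate` probability used to stand in for stochastic reactivation are assumptions for illustration, not the paper's implementation.

```python
import math
import random
from typing import List, Optional, Tuple


def mask_loss_and_grad(d: List[float], s: List[float]) -> Tuple[float, List[float]]:
    """L_mask = (1/C) * sum_c |D_c * s_c| and its gradient w.r.t. each s_c.
    With D_c >= 0 the gradient is D_c * sign(s_c) / C, so high-discrepancy
    channels are pushed toward zero proportionally harder."""
    C = len(s)
    loss = sum(abs(dc * sc) for dc, sc in zip(d, s)) / C
    grad = [dc * (1.0 if sc > 0 else -1.0 if sc < 0 else 0.0) / C
            for dc, sc in zip(d, s)]
    return loss, grad


def percentile(values: List[float], q: float) -> float:
    """Linear-interpolation percentile with q in [0, 1]."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def hard_mask(s: List[float], rho: float = 0.05, p_reactivate: float = 0.0,
              rng: Optional[random.Random] = None) -> List[float]:
    """Forward pass: suppress channels whose |s_c| falls below the rho-percentile
    of this layer's scores. p_reactivate is a hypothetical knob: each suppressed
    channel is switched back on with that probability."""
    rng = rng or random.Random(0)
    tau = percentile([abs(x) for x in s], rho)
    mask = []
    for x in s:
        keep = abs(x) >= tau
        if not keep and rng.random() < p_reactivate:
            keep = True  # stochastic reactivation
        mask.append(1.0 if keep else 0.0)
    return mask


def ste_backward(grad_wrt_mask: List[float]) -> List[float]:
    """Straight-through estimator: the threshold's true gradient is zero almost
    everywhere, so the backward pass treats d(mask)/d(s) as 1."""
    return list(grad_wrt_mask)


scores = [float(i) for i in range(1, 21)]  # 20 channels, e.g. initialized as |gamma_c|
print(sum(hard_mask(scores)))  # 19.0: one of 20 channels (5%) is suppressed
```

Fixing the suppression rate through a percentile, rather than a fixed score cutoff, keeps the number of masked channels per layer constant even as the scores shrink during adaptation.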
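The inverse soft mask of the Additive Buffer can be sketched as follows. The section does not define $\hat{m}_{soft}$ or $\text{Norm}$, so this sketch assumes a sigmoid soft mask with a hypothetical `temperature`, division by the maximum for $\text{Norm}$, and that the inverse mask multiplies the adapter output channel by channel. Convolution outputs are passed in precomputed to keep the example dependency-free; all function names are hypothetical.

```python
import math
from typing import List


def soft_mask(s: List[float], tau: float, temperature: float = 0.1) -> List[float]:
    """Hypothetical soft mask in (0, 1): a sigmoid of the score's margin over tau,
    close to 0 for strongly suppressed channels and close to 1 for kept ones."""
    return [1.0 / (1.0 + math.exp(-(abs(x) - tau) / temperature)) for x in s]


def inverse_soft_mask(m_soft: List[float], k: float = 1.0) -> List[float]:
    """k * Norm(1 - m_soft), with Norm assumed to be division by the maximum."""
    inv = [1.0 - m for m in m_soft]
    peak = max(inv)
    if peak <= 0.0:  # nothing is suppressed, so nothing needs compensation
        return [0.0] * len(inv)
    return [k * v / peak for v in inv]


def compensate(conv1x1_out: List[List[float]], conv3x3_out: List[List[float]],
               alpha: List[float], inv_mask: List[float]) -> List[List[float]]:
    """Per channel: F_add = (Conv1x1(F) + Conv3x3(F)) / 2 * alpha_c, then scaled
    by the inverse mask so the most-suppressed channels get the most compensation."""
    return [[(a + b) / 2.0 * al * w for a, b in zip(c1, c3)]
            for c1, c3, al, w in zip(conv1x1_out, conv3x3_out, alpha, inv_mask)]


m = soft_mask([0.0, 1.0], tau=0.5)  # channel 0 suppressed, channel 1 kept
inv = inverse_soft_mask(m)          # channel 0 -> 1.0, channel 1 -> about 0.007
```

Channel 0, which the subtractive buffer suppresses, receives the full compensation weight, while channel 1, which is kept, receives almost none; this is the "compensate as much as is removed" behavior in miniature.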

Loss & Training

$$\mathcal{L} = \mathcal{L}_{align} + \lambda_{reg} \cdot \mathcal{L}_{mask}$$

- $\mathcal{L}_{align}$: L1 alignment of the target feature mean and variance to the precomputed source statistics
- Layer-wise gradient scaling: additive buffer gradients are amplified according to the layer discrepancy $D^l$
- Joint optimization: additive buffer parameters, BN affine parameters, and mask scores are updated together
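The training objective can be sketched in plain Python. The channel-sum reduction in $\mathcal{L}_{align}$ and the layer-wise rule `1 + D^l / mean(D)` are assumptions chosen only to illustrate "amplify adapter gradients more where the layer discrepancy is larger"; the paper's exact formulas may differ, and the function names are hypothetical.

```python
from typing import List, Tuple


def mean_var(values: List[float]) -> Tuple[float, float]:
    """Population mean and variance of one channel's activations."""
    mu = sum(values) / len(values)
    return mu, sum((v - mu) ** 2 for v in values) / len(values)


def align_loss(target: List[List[float]], src_mean: List[float],
               src_var: List[float]) -> float:
    """L1 gap between target per-channel mean/variance and stored source stats,
    summed over channels (the reduction used here is an assumption)."""
    loss = 0.0
    for values, mu_s, var_s in zip(target, src_mean, src_var):
        mu_t, var_t = mean_var(values)
        loss += abs(mu_t - mu_s) + abs(var_t - var_s)
    return loss


def total_loss(l_align: float, l_mask: float, lambda_reg: float) -> float:
    """L = L_align + lambda_reg * L_mask."""
    return l_align + lambda_reg * l_mask


def adapter_grad_scales(layer_discrepancy: List[float]) -> List[float]:
    """Hypothetical layer-wise rule: factors are always >= 1 and grow with D^l
    relative to the mean over layers, so more-shifted layers adapt faster."""
    mean_d = sum(layer_discrepancy) / len(layer_discrepancy)
    return [1.0 + d / mean_d for d in layer_discrepancy]


l_align = align_loss([[1.0, 3.0]], src_mean=[0.0], src_var=[0.0])  # |2-0| + |1-0| = 3.0
print(total_loss(l_align, l_mask=2.0, lambda_reg=0.5))  # 4.0
print(adapter_grad_scales([1.0, 3.0]))  # [1.5, 2.5]
```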

Key Experimental Results

Main Results

Method	KITTI Fog 50m	Fog 75m	Fog 150m	Rain 200mm	Rain 100mm
Direct Test	21.27	30.84	50.45	47.11	65.32
BufferTTA (Additive)	23.21	33.12	52.50	50.96	69.28
PruningTTA (Subtractive)	33.97	42.83	58.58	50.94	65.42
ActMAD	39.65	49.95	60.18	56.37	62.94
CD-Buffer	44.80	56.06	68.42	63.22	71.40
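Because every method in the KITTI table starts from the same Direct Test baseline, per-condition gains make the additive/subtractive complementarity concrete. This snippet only re-tabulates the KITTI numbers from the table; no new results are involved.

```python
conditions = ["Fog 50m", "Fog 75m", "Fog 150m", "Rain 200mm", "Rain 100mm"]
kitti = {
    "Direct Test": [21.27, 30.84, 50.45, 47.11, 65.32],
    "BufferTTA": [23.21, 33.12, 52.50, 50.96, 69.28],
    "PruningTTA": [33.97, 42.83, 58.58, 50.94, 65.42],
    "ActMAD": [39.65, 49.95, 60.18, 56.37, 62.94],
    "CD-Buffer": [44.80, 56.06, 68.42, 63.22, 71.40],
}

# Gain of each adapted method over the unadapted Direct Test baseline.
base = kitti["Direct Test"]
gains = {name: [round(v - b, 2) for v, b in zip(vals, base)]
         for name, vals in kitti.items() if name != "Direct Test"}

# Method with the largest gain per condition, and the additive-vs-subtractive winner.
best = [max(gains, key=lambda name: gains[name][i]) for i in range(len(conditions))]
add_vs_sub = ["BufferTTA" if gains["BufferTTA"][i] > gains["PruningTTA"][i] else "PruningTTA"
              for i in range(len(conditions))]

for i, cond in enumerate(conditions):
    print(f"{cond:>10}: best={best[i]}, additive vs subtractive -> {add_vs_sub[i]}, "
          f"CD-Buffer gain={gains['CD-Buffer'][i]:+.2f}")
```

On these numbers the subtractive baseline gains more in all three fog settings, the additive one gains more in both rain settings (narrowly at 200mm: +3.85 vs. +3.83), and CD-Buffer has the largest gain in every column.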

Method	ACDC Fog	ACDC Snow	ACDC Rain	ACDC Night
Direct Test	16.50	11.04	7.82	4.83
BufferTTA	24.16	17.18	11.70	6.98
CD-Buffer	24.45	15.41	13.71	8.92

Ablation Study

Configuration	KITTI Fog 50m	Note
Additive buffer only	~23.2	Limited effectiveness under severe domain shift
Subtractive buffer only	~34.0	Removes degraded features but loses information
Dual buffer without coupling	Significantly below full method	Independent operations lack coordination
Full CD-Buffer	44.80	Discrepancy-driven coupling is optimal

Key Findings

Complementarity Validation: BufferTTA performs well under moderate shift (Rain 75mm) but poorly under severe shift (Fog 50m), whereas PruningTTA shows the opposite pattern. CD-Buffer outperforms both at every KITTI severity level.
Continual TTA Stability: In continual adaptation experiments on KITTI Fog 50m→75m→150m, CD-Buffer adapts fastest and remains the most stable. ActMAD achieves rapid initial improvement but exhibits unstable convergence.
The inverse soft mask coupling mechanism is the critical factor behind the performance gains: it turns two independent paradigms into a single coordinated system.

Highlights & Insights

This work is the first to systematically reveal the complementary nature of additive and subtractive TTA paradigms, providing an intuitive explanatory framework.
The discrepancy-driven coupling design is elegant: a single measure $D_c$ simultaneously drives two opposing operations, automatically achieving channel-level differentiated processing.
The inverse soft mask is a highlight design: compensation intensity is derived directly from the subtractive buffer's mask scores, requiring no additional network.
Batch-size independence: unlike BN statistics-based methods that are sensitive to small batches, CD-Buffer achieves adaptation through structural modifications.

Limitations & Future Work

Experiments are conducted only on Faster R-CNN + ResNet-50; applicability to end-to-end detectors such as DETR-based architectures has not been verified.
The channel suppression rate is fixed at 5%; adaptive determination could be explored.
Source-domain feature statistics must be precomputed and stored, increasing deployment complexity.
Integration with self-supervised objectives (e.g., contrastive learning) has not been explored.

Unlike ActMAD's multi-layer feature alignment, CD-Buffer provides finer-grained adaptation through channel-level differentiated processing.
The inverse mask concept is generalizable to other scenarios requiring a balance between preservation and modification (e.g., knowledge distillation in model compression).
This work contributes an "additive vs. subtractive" taxonomic framework to the TTA field, which may facilitate understanding of future work.

Rating

Novelty: ⭐⭐⭐⭐⭐ The additive/subtractive complementarity insight is novel; the discrepancy-driven coupling mechanism is elegantly designed
Experimental Thoroughness: ⭐⭐⭐⭐ Multi-dataset, multi-severity evaluation is comprehensive; continual TTA experiments are meaningful
Writing Quality: ⭐⭐⭐⭐ Motivation is clear and method description is complete
Value: ⭐⭐⭐⭐ Offers a new paradigm integration perspective for TTA