Efficient Test-Time Adaptive Object Detection via Sensitivity-Guided Pruning
Conference: CVPR 2025
arXiv: 2506.02462
Code: None
Area: Object Detection / Domain Adaptation
Keywords: Continual Test-Time Adaptation, Channel Pruning, Sensitivity Metric, Domain Shift, Efficient Inference
TL;DR
Proposes an efficient continual test-time adaptive object detection (CTTA-OD) method, based on the observation that certain feature channels in the source model are sensitive to domain shifts and impede cross-domain performance. Selective pruning is achieved by guiding a weighted sparse regularization with channel sensitivity measured at both the image and instance levels, complemented by a random channel reactivation mechanism that guards against erroneous pruning. The approach surpasses SOTA adaptation accuracy while reducing total computational cost by about 12% relative to the strongest prior method, WHW.
Background & Motivation
Background: Object detection faces domain shift challenges in practical deployments, where differences between training data and testing environments (weather, lighting, etc.) lead to performance degradation. Continual Test-Time Adaptation (CTTA) aims to adapt to continuously changing target domains online during inference without accessing source data. For object detection, methods like STFAR and ActMAD have achieved promising results through pseudo-label self-training or feature distribution alignment.
Limitations of Prior Work: Existing CTTA-OD methods focus almost exclusively on adaptation accuracy while ignoring computational efficiency, which is a severe practical constraint for resource-constrained scenarios such as autonomous driving and UAVs. More crucially, these methods adapt all source domain features indiscriminately. However, experiments reveal the existence of "harmful channels" in the source model—certain channels contribute positively within the source domain (removal degrades source domain performance) but have a negative impact on cross-domain performance (removal actually yields better cross-domain performance).
Key Challenge: Indiscriminate all-channel adaptation not only wastes computational resources on "harmful channels" sensitive to domain shifts but also increases adaptation difficulty. These sensitive channels consume computational budget while dragging down cross-domain performance.
Goal: How to identify and prune feature channels that are sensitive to domain shifts, concentrating adaptation efforts on domain-invariant features to simultaneously improve both adaptation performance and computational efficiency?
Key Insight: The authors perform channel-wise ablation experiments on the source model and discover a class of "red dot" channels: removing them degrades source domain performance but improves cross-domain performance. This indicates that these channels encode domain-specific information and introduce negative interference during cross-domain scenarios. This inspires the use of network pruning—not for model compression, but to "remove noise sources of domain shift".
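The "red dot" criterion can be stated compactly. The sketch below assumes you already have mAP numbers from a channel-ablation sweep (zeroing one channel at a time and evaluating on source and target validation data); the function name and its inputs are hypothetical illustrations, not code from the paper.

```python
from typing import List, Sequence


def find_red_dot_channels(
    base_src_map: float,
    base_tgt_map: float,
    src_map_without: Sequence[float],
    tgt_map_without: Sequence[float],
) -> List[int]:
    """Return indices of channels whose removal hurts the source domain
    but helps the target domain (the "red dot" pattern).

    src_map_without[c] / tgt_map_without[c] are the mAPs measured after
    zeroing channel c (hypothetical inputs from an ablation sweep).
    """
    harmful = []
    for c, (src, tgt) in enumerate(zip(src_map_without, tgt_map_without)):
        hurts_source = src < base_src_map  # channel is useful in-domain
        helps_target = tgt > base_tgt_map  # channel interferes cross-domain
        if hurts_source and helps_target:
            harmful.append(c)
    return harmful


# Channel 0 fits the pattern; channel 1 is harmless in-domain; channel 2 helps both.
print(find_red_dot_channels(40.0, 5.0, [39.0, 40.5, 38.0], [5.5, 4.0, 4.8]))  # [0]
```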
Core Idea: Use sensitivity metrics to identify and prune the channels most sensitive to domain shifts, while retaining domain-invariant channels for adaptation, achieving "subtraction for efficiency and effectiveness".
Method
Overall Architecture
The base detector is Faster R-CNN with a ResNet backbone, and the BN scaling factor \(\gamma\) serves as the channel switch. Each continual test-time step proceeds as follows: (1) before optimization, channels whose BN scaling factor \(\gamma\) falls below a threshold \(t\) are pruned, which defines the sub-network for the current step; (2) the sub-network runs forward propagation to produce predictions; (3) if the current pruning ratio \(\rho\) is below a target ratio \(p\), both the adaptation loss and the weighted sparse regularization loss are optimized so that pruning continues; otherwise only the adaptation loss is optimized and pruned channels are randomly reactivated with a certain probability; (4) backpropagation updates the parameters.
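The per-step control logic (steps 1 and 3) can be sketched as a small function. This is a minimal illustration, assuming all prunable BN scale factors are concatenated into one vector and that the magnitude \(|\gamma|\) is compared against \(t\), as is common in BN-based channel slimming; the name `plan_step` is hypothetical.

```python
import numpy as np


def plan_step(gamma: np.ndarray, t: float, p: float):
    """Decide the sub-network and loss branch for one CTTA step.

    gamma: concatenated BN scale factors of all prunable channels.
    t: threshold below which a channel is treated as pruned.
    p: target pruning ratio at which sparse regularization stops.
    """
    active = np.abs(gamma) >= t        # channel switch: keep if |gamma| >= t
    rho = float(1.0 - active.mean())   # current pruning ratio
    use_sparse_reg = bool(rho < p)     # keep pruning while below the target
    do_reactivation = not use_sparse_reg  # otherwise try reviving channels
    return active, rho, use_sparse_reg, do_reactivation


active, rho, use_reg, revive = plan_step(np.array([0.5, 0.001, 0.2, 0.0]), t=0.01, p=0.6)
print(active, rho, use_reg, revive)  # [ True False  True False] 0.5 True False
```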
Key Designs
Sensitivity-Guided Channel Pruning:
- Function: Quantify the sensitivity of each feature channel to domain shifts to guide selective pruning.
- Mechanism: The sensitivity weight is \(\omega = \omega_{img} + \omega_{ins}\), the sum of an image-level and an instance-level term. Image-level sensitivity is the mean \(L_1\) distance between the current target-domain feature map and a pre-stored average source-domain feature map, computed separately for each channel \(c\): \(S_{img}^{c} = \frac{1}{ND} \sum_{n=1}^{N} \|F_t^{n,c} - \bar{F}_s^{c}\|_1\), where \(N\) is the number of target images and \(D\) is the number of spatial positions in a channel's feature map. Instance-level sensitivity \(S_{ins}\) applies the same distance to features of the RoIs predicted by the detector, capturing domain shift at the level of foreground objects. Each score is normalized separately to give \(\omega_{img}\) and \(\omega_{ins}\), and the combined weight scales the weighted sparse regularization \(\mathcal{L}_{wreg} = \sum_i \|\omega_i \cdot \gamma_i\|_1\).
- Design Motivation: Image-level sensitivity reflects global statistical shifts (e.g., overall color tone changes), while instance-level sensitivity reflects local shifts in foreground objects (e.g., object blurring caused by fog). Combining these two granularities ensures sensitive channels are not overlooked.
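The sensitivity weighting for a single BN layer can be sketched in NumPy. This follows the formulas above under two stated assumptions: each score is normalized by its maximum (the summary does not specify the normalization), and RoI features are already pooled to a fixed size, with a stored source mean RoI feature. Function names are hypothetical.

```python
import numpy as np


def channel_sensitivity(feats_t: np.ndarray, mean_s: np.ndarray) -> np.ndarray:
    """Per-channel mean L1 distance to a stored source mean map.

    feats_t: (N, C, H, W) target features; mean_s: (C, H, W) source mean.
    Returns S of shape (C,): sum over images and positions divided by N * D.
    """
    n, _, h, w = feats_t.shape
    diff = np.abs(feats_t - mean_s[None])  # broadcast the mean over the batch
    return diff.sum(axis=(0, 2, 3)) / (n * h * w)


def normalize(s: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    # Assumption: scale each sensitivity vector to [0, 1] by its maximum.
    return s / max(float(s.max()), eps)


def weighted_sparse_reg(gamma, feats_t, mean_s, roi_t, roi_mean_s) -> float:
    """L_wreg = sum_i |omega_i * gamma_i| with omega = w_img + w_ins."""
    w_img = normalize(channel_sensitivity(feats_t, mean_s))
    w_ins = normalize(channel_sensitivity(roi_t, roi_mean_s))  # RoI-level
    omega = w_img + w_ins
    return float(np.sum(np.abs(omega * gamma)))
```

Channels that drift more from the source statistics receive larger \(\omega\), so the L1 penalty pushes their \(\gamma\) toward zero faster than that of domain-invariant channels.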
BN Scaling Factor-Based Channel Pruning Mechanism:
- Function: Achieve structural channel pruning via the learnable parameter \(\gamma\) of BN layers.
- Mechanism: The BN scaling factor \(\gamma\) directly controls the output amplitude of each channel. Weighted L1 regularization on \(\gamma\) drives the \(\gamma\) of sensitive channels toward zero, and once \(\gamma_i < t\) the channel and its corresponding convolutional filters are pruned. Because a convolution between two adjacent BN layers in ResNet loses both input and output channels, its cost scales with the square of the retained-channel fraction, \((1-\rho)^2\): retaining half of the channels cuts the computation of such an intermediate convolution to 1/4.
- Design Motivation: Utilizing existing BN parameters avoids introducing extra parameters; structural pruning yields real-world speedup without requiring specialized hardware.
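The quadratic saving is plain arithmetic on a dense convolution's multiply-accumulate (MAC) count. The sketch below assumes a k×k convolution with no bias whose input and output channels are both pruned; layers whose inputs are not pruned (e.g. the first convolution) save only linearly. The helper names and default sizes are illustrative.

```python
def conv_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulate count of a dense k x k convolution (no bias)."""
    return c_in * c_out * k * k * h_out * w_out


def pruned_fraction(c_in: int, c_out: int, keep_in: int, keep_out: int,
                    k: int = 3, h: int = 32, w: int = 32) -> float:
    """Cost of the pruned conv relative to the unpruned one."""
    return conv_macs(keep_in, keep_out, k, h, w) / conv_macs(c_in, c_out, k, h, w)


print(pruned_fraction(256, 256, 128, 128))  # 0.25   (keep 50% -> 1/4 cost)
print(pruned_fraction(256, 256, 192, 192))  # 0.5625 (keep 75% -> ~56% cost)
```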
Random Channel Reactivation:
- Function: Recover from erroneous early pruning and keep channels that become useful after a domain change from being lost permanently.
- Mechanism: When the pruning ratio \(\rho\) exceeds a predefined threshold \(p\), Bernoulli sampling is performed with probability \(r\) for each pruned channel: \(b_i^j \sim \text{Bernoulli}(r)\). If sampled as 1, the corresponding channel's \(\gamma\) is reset to its source pre-trained value, re-enabling its participation in forward and backward propagation. The model can then re-evaluate the utility of these channels in the current target domain.
- Design Motivation: In CTTA scenarios, target domains continuously change; a channel that is useless in the current domain might become useful in future domains. Reactivation provides error-correction opportunities, avoiding irreversible information loss.
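The reactivation step can be sketched as follows, assuming a flat vector of BN scale factors, a stored copy of the source pre-trained values, and that "pruned" means \(|\gamma| < t\). The function name is hypothetical; only the Bernoulli draw and the reset to source values come from the description above.

```python
import numpy as np


def reactivate_channels(gamma: np.ndarray, gamma_src: np.ndarray, t: float,
                        r: float, rng: np.random.Generator):
    """Randomly revive pruned channels by resetting gamma to source values.

    Each channel with |gamma| < t draws b ~ Bernoulli(r); where b == 1 its
    scale factor is restored to the source pre-trained value.
    """
    pruned = np.abs(gamma) < t
    b = rng.random(gamma.shape) < r  # Bernoulli(r) draw per channel
    revive = pruned & b
    out = gamma.copy()               # leave the caller's array untouched
    out[revive] = gamma_src[revive]
    return out, revive


rng = np.random.default_rng(0)
g, g_src = np.array([0.5, 0.0, 0.001]), np.array([0.6, 0.7, 0.8])
print(reactivate_channels(g, g_src, t=0.01, r=1.0, rng=rng)[0])  # [0.5 0.7 0.8]
```

Revived channels re-enter the sub-network with their source \(\gamma\); if they still hurt the current domain, the weighted regularization can drive them back below \(t\).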
Loss & Training
The total loss is: \(\mathcal{L}_{total} = \mathcal{L}_{adp} + \lambda \mathcal{L}_{wreg}\) (when \(\rho < p\)), or \(\mathcal{L}_{total} = \mathcal{L}_{adp}\) (pruning stops when \(\rho \geq p\)). The adaptation loss \(\mathcal{L}_{adp} = \mathcal{L}_{img} + \mathcal{L}_{ins}\) is based on KL-divergence alignment at both image and instance levels. Instance-level alignment introduces class-aware weights \(w_k\) to dynamically adjust the alignment intensity for rare classes.
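The loss switch can be sketched as below. The summary only states that the adaptation loss is KL-based; this sketch assumes (our assumption, not stated in the source) that features are summarized by diagonal Gaussians, so each alignment term is a closed-form KL divergence, and that the class-aware weights \(w_k\) are already computed. Function names are hypothetical.

```python
import numpy as np


def kl_diag_gauss(mu_p, var_p, mu_q, var_q) -> float:
    """KL(N(mu_p, var_p) || N(mu_q, var_q)) for diagonal Gaussians."""
    return float(0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0))


def total_loss(l_img: float, l_ins_per_class, class_weights,
               l_wreg: float, lam: float, rho: float, p: float) -> float:
    """L_total = L_adp + lam * L_wreg while rho < p, else L_adp alone."""
    l_ins = float(np.dot(class_weights, l_ins_per_class))  # class-aware sum
    l_adp = l_img + l_ins
    return l_adp + lam * l_wreg if rho < p else l_adp
```

For example, with \(\mathcal{L}_{img} = 1.0\), per-class instance terms \((0.2, 0.4)\) weighted by \((1, 2)\), \(\mathcal{L}_{wreg} = 3.0\) and \(\lambda = 0.1\), the total is 2.3 while \(\rho < p\) and 2.0 once pruning stops.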
Key Experimental Results
Main Results
Cityscapes \(\rightarrow\) Cityscapes-C Continual Adaptive Detection (10-round average mAP and FLOPs):
| Method | Avg mAP | Forward FLOPs | Backward FLOPs | Total FLOPs |
|---|---|---|---|---|
| Direct Test | 4.5 | 250.7 | 0.0 | 250.7 |
| STFAR | 7.6 | 501.4 | 501.4 | 1002.9 |
| ActMAD | 8.9 | 250.7 | 501.4 | 752.2 |
| WHW | 9.1 | 253.0 | 255.2 | 508.2 |
| Ours | 11.4 | 224.6 | 225.0 | 449.6 |
The proposed method achieves the highest adaptation accuracy (2.3 mAP above WHW, the strongest baseline) and the lowest total FLOPs among the adaptation methods: about 40% fewer than ActMAD and about 12% (11.5%) fewer than WHW. Only Direct Test, which performs no adaptation, is cheaper.
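The relative savings quoted above can be checked directly from the Total FLOPs column of the table:

```python
total_flops = {"STFAR": 1002.9, "ActMAD": 752.2, "WHW": 508.2, "Ours": 449.6}


def reduction(baseline: str, method: str = "Ours") -> float:
    """Percentage reduction in total FLOPs of `method` relative to `baseline`."""
    return 100.0 * (1.0 - total_flops[method] / total_flops[baseline])


for name in ("STFAR", "ActMAD", "WHW"):
    print(f"vs {name}: {reduction(name):.1f}% fewer total FLOPs")
# vs STFAR: 55.2% fewer total FLOPs
# vs ActMAD: 40.2% fewer total FLOPs
# vs WHW: 11.5% fewer total FLOPs
```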
Ablation Study
| Configuration | Avg mAP | Total FLOPs | Description |
|---|---|---|---|
| Full model | 11.4 | 449.6 | Full method |
| Uniform pruning (w/o sensitivity guidance) | 9.8 | 450.1 | Pruning without distinguishing channel sensitivity |
| Image-level sensitivity only | 10.6 | 449.9 | Missing instance-level information |
| Instance-level sensitivity only | 10.3 | 450.0 | Missing global statistical information |
| w/o channel reactivation | 10.8 | 441.2 | Lower computation but accuracy degrades |
Key Findings
- Sensitivity-guided vs. uniform pruning: at nearly identical cost (449.6 vs. 450.1 total FLOPs), sensitivity guidance yields a 1.6 mAP improvement (11.4 vs. 9.8), indicating that which channels are pruned matters more than how many.
- Dual-granularity of image-level + instance-level is superior to either single granularity, as they provide complementary domain shift information.
- Channel reactivation contributes 0.6 mAP (10.8 \(\rightarrow\) 11.4); although it introduces a minor computational overhead, it effectively avoids erroneous pruning.
- Consistent advantages are also validated on UAVDT \(\rightarrow\) UAVDT-C (UAV scenario), proving the cross-scene generalization of the method.
- Quadratic efficiency gain from pruning ratio: retaining 75% of channels reduces the computation of certain conv layers to ~56%.
Highlights & Insights
- Counter-intuitive finding that "subtraction is addition": It is commonly believed that more feature channels yield better performance, but this paper demonstrates that removing sensitive channels actually improves cross-domain accuracy under domain shifts. This offers a brand-new perspective for TTA research—do not try to adapt all features; instead, identify and prune the interferers.
- Pruning for adaptation (rather than compression): Traditional pruning aims to compress models. This work redefines pruning as "removing noise sources of domain shift", which is a concept that can be transferred to domain generalization, out-of-distribution detection, etc.
- Clever random reactivation mechanism: Achieving tentative recovery of pruned channels with low-cost Bernoulli sampling presents an effective solution to handle pruning decision uncertainty under non-stationary distributions.
Limitations & Future Work
- Relies on BN layers as the pruning medium, offering limited applicability to architectures without BN (such as ViT or LayerNorm models).
- Sensitivity measurement requires pre-stored source domain feature statistics (mean and variance), increasing pre-deployment preparation.
- Currently validated only on Faster R-CNN; applicability to single-stage detectors (YOLO series) and Transformer detectors remains to be verified.
- The pruning threshold \(t\) and pruning ratio threshold \(p\) are manually set hyperparameters; adaptively determining these thresholds may further improve performance.
Related Work & Insights
- vs ActMAD: ActMAD performs fine-grained activation statistics alignment without distinguishing channel importance. This work directly removes interfering channels via sensitivity pruning, simultaneously reducing computation and improving accuracy.
- vs STFAR: STFAR requires double forward and backward propagation overhead (pseudo-label generation + self-training), while this method reduces computation in both directions via pruning.
- vs WHW: WHW is the strongest baseline but trails by 2.3 mAP (9.1 vs. 11.4) at a higher total computational cost (508.2 vs. 449.6 FLOPs), and it does not consider which feature channels should be retained.
- The concept of sensitivity-guided pruning may inspire solutions to catastrophic forgetting in continual learning or incremental learning by identifying and protecting channels essential to old tasks.
Rating
- Novelty: ⭐⭐⭐⭐ "Pruning for adaptation" offers a novel perspective; the sensitivity metric is logically designed.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Evaluated on three benchmarks + multi-round continual adaptation + detailed ablation + comprehensive FLOPs analysis.
- Writing Quality: ⭐⭐⭐⭐ Clear motivation diagram (Fig. 2) and method pipeline make it easy to grasp.
- Value: ⭐⭐⭐⭐ Simultaneous gains in efficiency and accuracy carry direct significance for real-world deployment.