EW-DETR: Evolving World Object Detection via Incremental Low-Rank DEtection TRansformer

Conference: CVPR 2026 · arXiv: 2602.20985 · Code: to be confirmed · Area: Object Detection · Keywords: open-world object detection, incremental learning, domain adaptation, unknown object detection, LoRA, DETR

TL;DR

This paper proposes the Evolving World Object Detection (EWOD) paradigm and the EW-DETR framework, which jointly address class-incremental learning, domain shift adaptation, and unknown object detection under a strict no-replay constraint through three synergistic modules: incremental LoRA adapters, a query-norm objectness adapter, and entropy-aware unknown mixing. The proposed approach achieves a 57.24% improvement on the FOGS metric.

Background & Motivation

Real-world deployment requirements: Scenarios such as autonomous driving and warehouse robotics require detectors to continuously recognize new object categories (e.g., novel vehicle types), adapt to changing environments (day → night → fog), and flag unseen objects as "unknown" to prevent catastrophic failures.

Limitations of existing paradigms: Open-world object detection (OWOD) assumes a single static domain and relies on exemplar replay; domain-incremental object detection (DIOD) and dual-incremental object detection (DuIOD) adopt closed-set assumptions and cannot handle unknown objects.

No-replay constraint: Privacy regulations and storage limitations make retaining past training data impractical. Existing OWOD methods (ORE, OW-DETR, CAT, PROB, OWOBJ) all depend on exemplar replay buffers and fail under strict no-replay conditions.

Coupling of domain shift and forgetting: Simultaneous evolution of the category space and visual domain shift causes drastic changes in the feature space. Standard methods either misclassify unknown objects as known categories or absorb them into the background class.

Severe data imbalance: Substantial differences in domain and category distributions across tasks lead to highly uneven sample counts, making naive adapter merging strategies unable to effectively balance stability and plasticity.

Lack of unified evaluation metrics: Existing metrics either measure only forgetting (e.g., \(\mathcal{F}_{\text{map}}\)) or focus solely on unknown detection (U-Recall), and cannot comprehensively evaluate the coupled performance across all three EWOD dimensions.

Method

Overall Architecture

EW-DETR builds upon DETR-based detectors (supporting Deformable DETR and RF-DETR). The backbone and base weights are frozen, and two sets of LoRA adapters are attached to the linear layers of the Transformer encoder–decoder. Input images are processed through the frozen backbone and the adapter-augmented encoder–decoder to produce class-agnostic query features, which are then reparameterized by the Query-Norm objectness adapter and fed into the classification head, objectness head, and localization head. The EUMix module finally fuses the outputs into calibrated detection results.

Key Design 1: Incremental LoRA Adapters

A dual-adapter architecture is adopted to enable no-replay incremental learning:

  • Aggregation adapter \(\Delta\mathbf{W}_{\text{agg}}^{t-1}\): A non-trainable buffer that accumulates compressed knowledge from all historical tasks.
  • Task-specific adapter \(\Delta\mathbf{W}_{\text{task}}^{t}\): Trainable parameters that capture category/domain changes in the current task; reset at each task transition.

Data-aware merging: A merging coefficient \(\beta_t\) is adaptively computed from the ratio of current-task sample count \(N_t\) to the cumulative historical sample count \(N_{1:t-1}\), granting greater influence to tasks with fewer samples. After merging, truncated SVD projects the result back into a low-rank space to maintain parameter efficiency:

\[\Delta\mathbf{W}_{\text{merged}}^{t} = (1-\beta_t)\Delta\mathbf{W}_{\text{agg}}^{t-1} + \beta_t\Delta\mathbf{W}_{\text{task}}^{t}\]
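The merge-then-compress step can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the exact formula for \(\beta_t\) is not reproduced here, so the sketch assumes an inverse-frequency ratio (the side with fewer samples gets the larger weight, as the text describes), and `merge_adapters` is a hypothetical name.

```python
import numpy as np

def merge_adapters(dW_agg, dW_task, n_hist, n_task, rank):
    """Data-aware merge of two LoRA weight updates, then truncated-SVD
    re-compression back to rank `rank`.

    Assumption: beta_t = n_hist / (n_hist + n_task), an inverse-frequency
    weighting so a current task with few samples gets more influence.
    """
    beta = n_hist / (n_hist + n_task)
    merged = (1 - beta) * dW_agg + beta * dW_task
    # Truncated SVD projects the merged update back into a low-rank
    # factorization, keeping parameter cost at O(r * (d_out + d_in)).
    U, S, Vt = np.linalg.svd(merged, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (d_out, rank), singular values folded in
    B = Vt[:rank, :]             # (rank, d_in)
    return A, B                  # A @ B is the rank-r approximation of merged

# Toy usage: merge two 8x8 adapter updates down to rank 4.
rng = np.random.default_rng(0)
dW_agg = rng.normal(size=(8, 8)) * 0.01
dW_task = rng.normal(size=(8, 8)) * 0.01
A, B = merge_adapters(dW_agg, dW_task, n_hist=900, n_task=100, rank=4)
print(A.shape, B.shape)  # (8, 4) (4, 8)
```

After the merge, \(A B\) becomes the new aggregation adapter \(\Delta\mathbf{W}_{\text{agg}}^{t}\) and the task-specific adapter is reset for the next task.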

Key Design 2: Query-Norm Objectness Adapter

Leveraging the class-agnostic nature of DETR decoder queries, this module decouples semantic information from magnitude information:

  • Direction: Decoder features are first passed through LayerNorm and then \(\ell_2\)-normalized to obtain domain-invariant classification features \(\mathbf{h}_{\text{norm}}\), which are convex-combined with the original features via a learnable coefficient \(\alpha_{\text{mix}}\).
  • Magnitude: Queries matched to ground-truth objects empirically exhibit larger norms. The scalar norm \(\|\mathbf{h}_i\|_2\) is fed into an objectness MLP with temperature scaling to produce a class-agnostic objectness score.
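The two branches above can be sketched as follows. This is a schematic with made-up weights (`w_mlp`, `b_mlp`, the one-layer "MLP", and the temperature value are all placeholders, not the paper's architecture); it only shows the direction/magnitude decoupling.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def qnorm_objectness(h, alpha_mix, w_mlp, b_mlp, tau=2.0):
    """h: (num_queries, d) decoder query features.

    Direction branch: LayerNorm then l2-normalization gives magnitude-free
    features, convex-combined with the originals via alpha_mix.
    Magnitude branch: the scalar query norm goes through a tiny MLP with
    temperature scaling to yield a class-agnostic objectness score.
    """
    h_ln = layer_norm(h)
    h_norm = h_ln / (np.linalg.norm(h_ln, axis=-1, keepdims=True) + 1e-6)
    h_cls = alpha_mix * h_norm + (1 - alpha_mix) * h   # features for the cls head
    norms = np.linalg.norm(h, axis=-1, keepdims=True)  # (num_queries, 1)
    logits = np.tanh(norms @ w_mlp + b_mlp)            # toy one-layer "MLP"
    obj = 1.0 / (1.0 + np.exp(-logits / tau))          # sigmoid with temperature
    return h_cls, obj.squeeze(-1)

# Toy usage on 5 random queries of dimension 8.
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8)) * 3.0
w_mlp = np.array([[0.5]])
b_mlp = np.array([0.0])
h_cls, obj = qnorm_objectness(h, alpha_mix=0.7, w_mlp=w_mlp, b_mlp=b_mlp)
print(h_cls.shape, obj.shape)  # (5, 8) (5,)
```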

Core advantage: No auxiliary losses or additional supervision are required; objectness estimation that is robust to domain shift is learned implicitly through the standard detection loss alone.

Key Design 3: Entropy-Aware Unknown Mixing (EUMix)

This module fuses classification uncertainty and objectness evidence into a calibrated unknown score:

  • Objectness-driven unknown probability \(p_{\text{obj}}^{\text{unk}}\): High when the detector identifies an object but all known categories are uncertain.
  • Classifier-driven unknown probability \(p_{\text{cls}}^{\text{unk}}\): Derived from a learned unknown logit.

The two estimates are fused via a learnable mixing weight \(\alpha\):

\[p_{\text{final}}^{\text{unk}} = \alpha\, p_{\text{cls}}^{\text{unk}} + (1-\alpha)\, p_{\text{obj}}^{\text{unk}}\]

Soft suppression proportional to the objectness-based unknown score is simultaneously applied to the known-class logits.
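The fusion and suppression steps can be sketched as below. The exact definitions of \(p_{\text{obj}}^{\text{unk}}\) and the suppression term are assumptions here (normalized entropy weighted by objectness, and a linear logit penalty with a hypothetical strength `gamma`); only the convex combination matches the equation above.

```python
import numpy as np

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def eumix(known_logits, unk_logit, objectness, alpha=0.5, gamma=1.0):
    """known_logits: (Q, K); unk_logit: (Q,); objectness: (Q,) in [0, 1].

    p_obj_unk is high when the query looks like an object (high objectness)
    but the known-class distribution is uncertain (high entropy).
    p_cls_unk comes from the learned unknown logit.
    """
    p_known = softmax(known_logits)
    entropy = -(p_known * np.log(p_known + 1e-9)).sum(-1)
    ent_norm = entropy / np.log(known_logits.shape[-1])  # normalize to [0, 1]
    p_obj_unk = objectness * ent_norm
    p_cls_unk = 1.0 / (1.0 + np.exp(-unk_logit))
    p_unk = alpha * p_cls_unk + (1 - alpha) * p_obj_unk  # the fusion equation
    # Soft suppression of known-class logits, proportional to p_obj_unk.
    suppressed = known_logits - gamma * p_obj_unk[:, None]
    return p_unk, suppressed

# Toy usage: query 0 is confidently a known class, query 1 is uncertain.
logits = np.array([[5.0, 0.0, 0.0, 0.0],
                   [1.0, 1.0, 1.0, 1.0]])
p_unk, supp = eumix(logits, np.zeros(2), np.array([0.9, 0.9]))
```

With equal objectness and unknown logits, the uncertain query receives the higher unknown score, which is the intended behavior.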

Loss & Training

Standard DETR detection losses are used (Hungarian matching + classification loss + bounding box regression loss), with no additional unknown supervision losses or auxiliary losses.
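For readers unfamiliar with DETR-style training, the matching step pairs each ground-truth box with exactly one query by minimizing a set cost. The sketch below uses a brute-force search as a stand-in for the Hungarian algorithm (fine for toy sizes; real implementations use `scipy.optimize.linear_sum_assignment`), and the cost here keeps only the classification and L1 terms, omitting the GIoU term that standard DETR losses also include.

```python
from itertools import permutations

def match_cost(pred_probs, pred_boxes, gt_labels, gt_boxes):
    """Set-matching cost in the DETR style: -p(gt class) + L1 box distance.

    Brute-force over assignments stands in for the Hungarian algorithm;
    this is only tractable for tiny numbers of queries.
    """
    def c(i, j):  # cost of matching prediction i to ground truth j
        cls = -pred_probs[i][gt_labels[j]]
        l1 = sum(abs(a - b) for a, b in zip(pred_boxes[i], gt_boxes[j]))
        return cls + l1

    best = min(
        (sum(c(i, j) for j, i in enumerate(p)), p)
        for p in permutations(range(len(pred_boxes)), len(gt_boxes))
    )
    return best  # (total cost, tuple: prediction index matched to each gt)

# Toy usage: two predictions, one ground-truth object of class 1.
pred_probs = [[0.9, 0.1], [0.2, 0.8]]
pred_boxes = [[0.0, 0.0, 1.0, 1.0], [0.5, 0.5, 1.0, 1.0]]
cost, idx = match_cost(pred_probs, pred_boxes,
                       gt_labels=[1], gt_boxes=[[0.5, 0.5, 1.0, 1.0]])
print(idx)  # (1,): the second prediction matches the ground truth
```

Once matched, the classification loss is applied to all queries (unmatched ones target "no object") and the box losses only to matched pairs.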

Key Experimental Results

Main Results

Pascal Series: VOC→Clipart (two-stage)

| Method | Trainable Params (M) | FSS↑ | OSS↑ | GSS↑ | FOGS↑ |
|---|---|---|---|---|---|
| ORE (CVPR'21) | 5.05 | 0 | 55.48 | 11.37 | – |
| OW-DETR (CVPR'22) | 5.54 | 11.42 | 40.47 | 7.96 | – |
| ORTH (CVPR'24) | 105.9 | 16.59 | 5.83 | 51.06 | 32.44 |
| DuET (ICCV'25) | 24.22 | 8.47 | 41.05 | 35.49 | 1.46 |
| EW-DETR (D-DETR) | 0.46 | 25.73 | 64.86 | 61.67 | 7.92 |
| EW-DETR (RF-DETR) | 1.8 | 45.08 | 96.19 | 78.62 | 8.42 |

EW-DETR (RF-DETR) achieves an overall FOGS of 61.08, surpassing the best baseline ORTH (29.78) by 105%.

Diverse Weather (multi-stage)

EW-DETR (RF-DETR) achieves the highest FOGS across all domain shift scenarios, with an average FOGS of 52.33 and consistent superiority across 10 benchmarks.

Ablation Study

| Configuration | FSS↑ | OSS↑ | GSS↑ | FOGS↑ |
|---|---|---|---|---|
| Baseline | 7.52 | 33.78 | 51.49 | 30.87 |
| + Incre. LoRA | 98.11 | 33.53 | 0.07 | 43.90 |
| + LoRA + QNorm-Obj | 97.78 | 42.04 | 5.07 | 48.30 |
| + LoRA + QNorm-Obj + EUMix | 96.19 | 78.62 | 8.42 | 61.08 |

Key Findings

  1. Incremental LoRA adapters are the core of anti-forgetting (FSS jumps from 7.52 to 98.11) while reducing trainable parameters by 94.2%, but on their own they severely sacrifice plasticity (GSS collapses to 0.07).
  2. QNorm-Obj partially restores open-set capability by decoupling objectness features (U-Recall improves) while maintaining high forgetting resistance.
  3. EUMix exhibits the most significant synergy with the preceding two modules, substantially improving unknown detection (OSS from 42.04 to 78.62) and also enhancing current-task generalization.
  4. t-SNE visualizations show that EW-DETR is the only method that maintains clearly separated class clusters under severe domain shift (VOC→Clipart).

Highlights & Insights

  • First to define the EWOD paradigm: Unifies incremental learning, domain adaptation, and unknown detection — more aligned with real-world deployment than OWOD/DuIOD.
  • Extreme parameter efficiency: Only 1.8M trainable parameters (vs. 105.9M for ORTH); zero-replay incremental learning achieved via dual LoRA + SVD compression.
  • Unknown detection without auxiliary losses: QNorm-Obj cleverly exploits query norms as objectness signals, detecting unknown objects without any additional supervision.
  • Proposes the FOGS composite metric: Unifies evaluation across forgetting, openness, and generalization dimensions, filling a gap in the EWOD evaluation framework.
  • Strong generalizability: The framework generalizes to different DETR variants and successfully enables state-of-the-art RF-DETR to operate in open-world settings.

Limitations & Future Work

  • Low generalization sub-score: Although FOGS is overall competitive, GSS (cross-domain generalization) remains in single digits in some scenarios, indicating that transferring new categories to old domains remains a bottleneck.
  • Validated only on DETR-based detectors: Applicability to non-Transformer detectors such as YOLO has not been explored.
  • Limited dataset scale: Pascal Series and Diverse Weather contain relatively few categories (at most 20), and performance on larger-scale scenarios (e.g., COCO-level) is unknown.
  • Simple merging coefficient design: \(\beta_t\) is based solely on sample count ratios, without considering inter-domain similarity or category difficulty.
  • No fine-grained distinction among unknown categories: All unknown objects are unified into a single "unknown" class, precluding further discovery or clustering of unknown sub-categories.
Related Work

  • OWOD series: ORE → OW-DETR → CAT → PROB → ORTH → OWOBJ, all assuming a single static domain with exemplar replay.
  • Incremental detection: CIOD methods rely on knowledge distillation and replay; DIOD (LDB) learns domain biases but operates under a closed-set assumption; DuET performs dual-incremental learning via task arithmetic but lacks unknown modeling.
  • LoRA for detection: This paper is the first to apply dual-adapter LoRA with data-aware merging to incremental object detection.
  • Objectness modeling in DETR: Objectness estimation via class-agnostic decoder query properties, distinct from the probabilistic modeling approach of OWOBJ.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ — Both the EWOD paradigm definition and the three-module co-design are pioneering contributions.
  • Experimental Thoroughness: ⭐⭐⭐⭐ — 10 benchmarks, complete ablation study, and t-SNE visualizations, but large-scale dataset validation is lacking.
  • Writing Quality: ⭐⭐⭐⭐ — Problem formulation is clear, figures are well-crafted, and mathematical derivations are complete.
  • Value: ⭐⭐⭐⭐ — Addresses an important gap for real-world deployment; the FOGS metric has strong potential for broader adoption.