🎯 Object Detection

🔬 ICLR2026 · 9 paper notes

AdaRank: Adaptive Rank Pruning for Enhanced Model Merging: AdaRank adaptively selects singular components of task vectors via learnable binary masks (replacing heuristic top-k selection) and combines this with test-time entropy minimization, substantially alleviating inter-task interference in multi-task model merging and achieving 89.4% accuracy on ViT-B/32.
CGSA: Class-Guided Slot-Aware Adaptation for Source-Free Object Detection: This paper is the first to introduce Object-Centric Learning (Slot Attention) into Source-Free Domain-Adaptive Object Detection (SF-DAOD). It extracts domain-invariant object-level structural priors via a hierarchical slot-aware module and drives domain-invariant representations through class-guided contrastive learning, achieving substantial improvements over existing methods across multiple cross-domain benchmarks.
CORDS: Continuous Representations of Discrete Structures: CORDS is a framework that bijectively maps variable-size discrete sets (detection boxes, molecular atoms) to continuous density and feature fields, enabling models to learn in field space and decode back to discrete sets exactly — without the constraints of fixed slots or padding.
ForestPersons: A Large-Scale Dataset for Under-Canopy Missing Person Detection: ForestPersons is the first large-scale benchmark dataset specifically designed for under-canopy missing person detection in forest environments (96,482 images and 204,078 annotations). It simulates the low-altitude flight perspective of micro aerial vehicles (MAVs) at 1.5–2.0 meters and covers multi-season, multi-weather, multi-pose, and multi-occlusion-level conditions representative of real search-and-rescue (SAR) scenarios, providing a solid foundation for training and evaluating under-canopy person detection models.
FSOD-VFM: Few-Shot Object Detection with Vision Foundation Models and Graph Diffusion: This paper proposes a training-free few-shot object detection framework that combines three foundation models—UPN, SAM2, and DINOv2—for proposal generation and feature matching, and employs a graph diffusion algorithm to refine confidence scores and suppress fragmented proposals. The method achieves substantial improvements over prior state-of-the-art on Pascal-5i and COCO-20i.
InfoDet: A Dataset for Infographic Element Detection: This paper introduces a large-scale infographic element detection dataset (101,264 infographics, 14.2 million annotations) spanning two major categories—chart elements and human-recognizable objects (HROs)—and proposes a Grounded CoT method that leverages detection results to enhance VLM chart understanding.
Long-Context Generalization with Sparse Attention: This paper proposes ASEntmax (Adaptive-Scalable Entmax), which replaces softmax attention with α-entmax equipped with a learnable temperature. Through both theoretical analysis and empirical evaluation, it demonstrates that sparse attention enables up to 1000× length extrapolation, addressing the attention dispersion problem of softmax under long-context settings.
SPWOOD: Sparse Partial Weakly-Supervised Oriented Object Detection: This paper proposes the SPWOOD framework to jointly address sparse and weak annotation (HBox/Point) in oriented object detection. Through a Self-Adaptive Oriented Detector (SAOD) and a spatial layout learning strategy, SPWOOD achieves near-fully-supervised performance on the DOTA benchmark under a mixed annotation setting (RBox:HBox:Point = 1:1:1).
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Method: This paper proposes TreeBench (the first traceable visual reasoning benchmark comprising 405 highly challenging VQA pairs, on which OpenAI-o3 achieves only 54.87%) and TreeVGR (a training paradigm that jointly supervises grounding and reasoning via dual IoU reward-based reinforcement learning). A 7B model achieves gains of +16.8 on V*Bench, +12.6 on MME-RealWorld, and +13.4 on TreeBench, demonstrating that traceability is a key driver of visual reasoning advancement.
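
For readers unfamiliar with the IoU quantity that the TreeVGR note above refers to, here is a minimal sketch of the standard intersection-over-union between two axis-aligned boxes. The summary does not say how the paper's dual IoU reward is built from it, so this only illustrates the underlying metric. The `box_iou` name and the `(x_min, y_min, x_max, y_max)` box layout are assumptions made for this example, not taken from the paper.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Standard intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes give zero intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) inputs.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    reference = (1.0, 1.0, 3.0, 3.0)
    print(box_iou(predicted, reference))
```

A value of 1.0 means the two boxes coincide and 0.0 means they do not overlap, which is what makes IoU a natural scalar signal for how well a predicted region matches a reference region.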