SPWOOD: Sparse Partial Weakly-Supervised Oriented Object Detection

Conference: ICLR 2026
arXiv: 2602.03634
Code: N/A
Area: Object Detection / Remote Sensing
Keywords: Oriented object detection, weak supervision, sparse annotation, semi-supervised learning, remote sensing

TL;DR

This paper proposes the SPWOOD framework to jointly address sparse and weak annotation (HBox/Point) in oriented object detection. Through a Self-Adaptive Oriented Detector (SAOD) and a spatial layout learning strategy, SPWOOD achieves near-fully-supervised performance on the DOTA benchmark under a mixed annotation setting (RBox:HBox:Point = 1:1:1).

Background & Motivation

Background: Oriented object detection (OOD) is critical in remote sensing and related domains. However, annotating precise rotated bounding boxes (RBoxes) — requiring center coordinates, width, height, and rotation angle — incurs prohibitively high labeling costs.

Limitations of Prior Work: Existing methods for reducing annotation cost handle either only weak annotations (e.g., horizontal boxes (HBox) or points in place of RBoxes) or only sparse annotations (i.e., only a subset of instances is labeled), but in practice both challenges coexist.

Key Challenge: Sparse annotations (not all instances are labeled) and weak annotations (labeled but imprecise) each introduce severe training signal deficiency; their combination exacerbates the problem — unlabeled instances may be treated as negatives, while weak annotations may mislead angle estimation.

Goal: To train high-quality oriented object detectors under an extremely low-cost setting where both sparse and weak annotations are present simultaneously.

Key Insight: Design a unified framework that learns from three annotation types of varying quality (RBox, HBox, Point) while mining unlabeled instances via self-training.

Core Idea: A self-adaptive oriented detector unifies precise, weak, and unlabeled training signals; angular consistency constraints and spatial layout learning recover rotation information from weak annotations.

Method

Overall Architecture

SPWOOD builds upon a teacher–student self-training framework and introduces three key components: (1) SAOD, a baseline detector that handles RBoxes and pseudo-RBoxes recovered from weak annotations; (2) an angle learning module that exploits geometric consistency under image augmentation to supervise rotation estimation; and (3) a spatial layout learning strategy that recovers scale information for point-annotated instances via Voronoi–watershed analysis.

Key Designs

Self-Adaptive Oriented Detector (SAOD):
- Function: A unified detection baseline that handles annotations of heterogeneous quality.
- Mechanism: Standard rotated detection losses are applied to RBox annotations. For HBox annotations, feasible rotation ranges are inferred from the horizontal box to generate pseudo-RBoxes for training. For Point annotations, only center location is used, with scale recovered via spatial layout learning. The teacher model, trained on strongly annotated instances, generates pseudo-labels for unlabeled regions.
- Design Motivation: Different annotation types provide complementary information along different dimensions, necessitating differentiated rather than uniform treatment.
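
The HBox branch above can be illustrated with the standard geometric relation between a rotated box and its circumscribing horizontal box. The following is a minimal sketch, not the paper's implementation: the function names `rbox_to_hbox_size` and `hbox_to_rbox_size` are hypothetical, and treating an angle as "feasible" when the recovered width and height are both positive is an assumption made here for illustration.

```python
import math

def rbox_to_hbox_size(w, h, theta):
    """Size (W, H) of the axis-aligned box circumscribing a w x h box rotated by theta."""
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    return w * c + h * s, w * s + h * c

def hbox_to_rbox_size(W, H, theta, eps=1e-6):
    """Invert the relation above: recover (w, h) from an HBox size and a candidate angle.

    Returns None when the angle is infeasible for this HBox (assumed criterion:
    non-positive side length) or when the 2x2 system is singular (|theta| near 45 degrees).
    """
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    det = c * c - s * s                  # determinant of [[c, s], [s, c]]
    if abs(det) < eps:
        return None
    w = (W * c - H * s) / det
    h = (H * c - W * s) / det
    if w <= 0 or h <= 0:
        return None
    return w, h

# Round trip: a 40 x 10 box rotated by 30 degrees.
W, H = rbox_to_hbox_size(40.0, 10.0, math.pi / 6)
print(hbox_to_rbox_size(W, H, math.pi / 6))
```

Under this sketch, an HBox constrains the rotated box only once an angle is supplied, which is why the angle learning described next matters for HBox-labeled instances.
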
Angle Learning:
- Function: Recover rotation angles from HBox and Point annotations that contain no explicit angle information.
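- Mechanism: Geometric consistency under image augmentation (flipping and rotation) is exploited. If the input is rotated by an angle \(\mathcal{R}\), the angle predicted on the rotated view must satisfy \(\theta_{rot} = \theta + \mathcal{R}\), where \(\theta\) is the angle predicted on the original view. The flip term analogously compares the angle predicted on the flipped view, \(\theta_{flp}\), with its target \(\theta_{flp}^{gt}\). The loss is defined as: \(\mathcal{L}_{Ang}^s = \text{SmoothL1}(\theta_{flp} - \theta_{flp}^{gt}) + \text{SmoothL1}(\theta_{rot} - \theta - \mathcal{R})\)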
- Design Motivation: Rotation angle is the hardest attribute to recover from weak annotations, yet geometric transformations provide natural self-supervised signals.
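
As a rough illustration of the angle loss above, the two SmoothL1 terms can be written out for a single instance. This is a sketch under stated assumptions: the helper names are hypothetical, the flip target is passed in because the summary does not say how it is derived, and wrapping residuals to a period of \(\pi\) is an assumption added here (a rectangle's orientation repeats every half turn), not something the source states.

```python
import math

def smooth_l1(x, beta=1.0):
    """SmoothL1 (Huber-style) penalty on a scalar residual."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta

def wrap_angle(a):
    """Wrap an angle residual to [-pi/2, pi/2). Assumption: orientation has period pi."""
    return (a + math.pi / 2) % math.pi - math.pi / 2

def angle_consistency_loss(theta, theta_rot, rot, theta_flp, theta_flp_gt):
    """Flip term plus rotation term, mirroring the two SmoothL1 terms in the text."""
    flip_term = smooth_l1(wrap_angle(theta_flp - theta_flp_gt))
    rot_term = smooth_l1(wrap_angle(theta_rot - theta - rot))
    return flip_term + rot_term

# A prediction that is off by 0.1 rad on the rotated view is penalized by the rotation term.
print(angle_consistency_loss(theta=0.2, theta_rot=0.8, rot=0.5,
                             theta_flp=-0.2, theta_flp_gt=-0.2))
```

Neither term needs a ground-truth angle for the original view, which is what makes the signal usable on HBox and Point annotations.
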
Spatial Layout Learning:
- Function: Recover object width and height from point-only annotations.
- Mechanism: A Voronoi diagram is constructed from the set of point annotations to partition the image into per-point regions. Within each region, a watershed algorithm segments pixels based on appearance similarity to obtain pixel-level assignments. The watershed results are rotation-aligned to yield regression targets for width and height. The Voronoi–watershed loss is: \(\mathcal{L}_W^s = L_{GWD}(\text{pred}, \text{watershed\_target})\), where \(L_{GWD}\) denotes a Gaussian Wasserstein Distance loss.
- Design Motivation: Voronoi diagrams naturally separate adjacent objects, and the watershed algorithm provides appearance-based scale estimation within each region.
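
The first and last steps of this pipeline can be sketched with NumPy alone. The code below is an illustrative approximation, not the authors' code: `voronoi_labels` builds a discrete Voronoi partition by nearest-point assignment, and `aligned_extent` measures a pixel mask along axes rotated by a given angle. The watershed step is omitted, so the boolean mask in the usage example is a hypothetical stand-in for a watershed segment.

```python
import numpy as np

def voronoi_labels(points, height, width):
    """Label each pixel with the index of its nearest annotated point (discrete Voronoi cells)."""
    ys, xs = np.mgrid[0:height, 0:width]
    pix = np.stack([xs, ys], axis=-1).astype(float)          # (H, W, 2), stored as (x, y)
    pts = np.asarray(points, dtype=float)                    # (N, 2), stored as (x, y)
    d2 = ((pix[:, :, None, :] - pts[None, None, :, :]) ** 2).sum(-1)  # (H, W, N)
    return d2.argmin(axis=-1)

def aligned_extent(mask, theta):
    """Extent of a pixel mask along the direction theta and along its perpendicular."""
    ys, xs = np.nonzero(mask)
    c, s = np.cos(theta), np.sin(theta)
    u = xs * c + ys * s        # coordinate along direction theta
    v = -xs * s + ys * c       # coordinate along the perpendicular direction
    # +1 counts pixels rather than gaps; exact for theta = 0, approximate otherwise.
    return u.max() - u.min() + 1, v.max() - v.min() + 1

# Two point annotations split a 5 x 10 image into left and right cells.
labels = voronoi_labels([(2, 2), (7, 2)], height=5, width=10)

# Stand-in for a watershed segment: a 6 x 2 pixel blob.
mask = np.zeros((10, 10), dtype=bool)
mask[3:5, 2:8] = True
print(aligned_extent(mask, 0.0))
```

The brute-force distance tensor costs O(H·W·N) memory, which is fine for a sketch; a practical version would more likely use a spatial index or a library Voronoi routine.
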

Loss & Training

The total loss sums four terms: the detection loss (standard rotated-box regression plus classification), the angle consistency loss \(\mathcal{L}_{Ang}\), the spatial layout loss \(\mathcal{L}_W\), and a Gaussian overlap loss \(\mathcal{L}_O\) that penalizes overlapping predictions. In the teacher–student framework, the teacher is updated as an Exponential Moving Average (EMA) of the student's weights.
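
For readers unfamiliar with EMA-based self-training, the update step can be sketched as follows. This is a generic illustration rather than the paper's code: parameters are plain floats in dictionaries, the name `ema_update` is hypothetical, and the default momentum of 0.999 is an assumed value, since the summary does not report one.

```python
def ema_update(teacher, student, momentum=0.999):
    """In-place EMA: teacher <- m * teacher + (1 - m) * student, per parameter."""
    for name, s_val in student.items():
        teacher[name] = momentum * teacher[name] + (1.0 - momentum) * s_val
    return teacher

# The teacher drifts toward the student a little on every step.
teacher = {"w": 1.0}
student = {"w": 0.0}
for _ in range(2):
    ema_update(teacher, student, momentum=0.9)
print(teacher["w"])
```

Because the teacher changes slowly, its pseudo-labels are more stable than the student's own predictions, which is the usual motivation for this design.
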

Key Experimental Results

Main Results

Method	Supervision type	mAP (10% sparse · 10% partial)	mAP (20% · 20%)	mAP (30% · 20%)
H2RBox-v2	Weak supervision (HBox)	30.6	42.7	49.2
MCL	Semi-supervised (RBox)	31.7	44.5	47.8
PWOOD	Partial weak supervision (RBox)	38.0	51.9	55.2
RSST	Sparse supervision (RBox)	43.4	52.3	56.6
SPWOOD (RBox)	Sparse + Weak	48.5	57.8	60.3
SPWOOD (HBox)	Sparse + Weak	45.5	54.0	56.5
SPWOOD (R:H:P=1:1:1)	Mixed	42.4	53.0	54.8

Ablation Study

Configuration	mAP (10%·10%)	Notes
SPWOOD (full)	48.5	All components
w/o Angle Learning	~43	Inaccurate angles under weak annotation
w/o Spatial Layout	~44	Poor scale recovery for point annotations
w/o Teacher–Student	~40	Unlabeled instances unexploited

Key Findings

SPWOOD (RBox) outperforms all baselines at every annotation ratio, with gains of 3.7 to 5.5 mAP over the strongest baseline (RSST).
The mixed annotation setting (R:H:P=1:1:1) trails SPWOOD (RBox) by 4.8 to 6.1 mAP and stays within 2 mAP of RSST, which is trained on sparse RBox labels (42.4 vs. 43.4, 53.0 vs. 52.3, 54.8 vs. 56.6).
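Angle learning contributes most significantly in weak annotation scenarios: removing it lowers the 10%·10% result from 48.5 to roughly 43 mAP.
Spatial layout learning becomes increasingly critical under extremely sparse settings: without it, the 10%·10% result falls to roughly 44 mAP.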
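
The margins behind these findings can be recomputed directly from the main results table. The snippet below only restates the table's numbers; the variable names are illustrative.

```python
# mAP per setting, in table order: 10%·10%, 20%·20%, 30%·20%.
baselines = {
    "H2RBox-v2": [30.6, 42.7, 49.2],
    "MCL": [31.7, 44.5, 47.8],
    "PWOOD": [38.0, 51.9, 55.2],
    "RSST": [43.4, 52.3, 56.6],
}
spwood_rbox = [48.5, 57.8, 60.3]
spwood_mixed = [42.4, 53.0, 54.8]

# Strongest baseline in each column, then SPWOOD (RBox)'s margin over it.
best_baseline = [max(col) for col in zip(*baselines.values())]
gain_over_best = [round(s - b, 1) for s, b in zip(spwood_rbox, best_baseline)]

# How far the mixed R:H:P = 1:1:1 setting falls below SPWOOD (RBox).
mixed_gap = [round(r - m, 1) for r, m in zip(spwood_rbox, spwood_mixed)]

print(gain_over_best)  # [5.1, 5.5, 3.7]
print(mixed_gap)       # [6.1, 4.8, 5.5]
```
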

Highlights & Insights

Unified framework for heterogeneous annotation types: SPWOOD integrates three annotation sources (RBox, HBox, Point), each providing information of different quality and granularity.
Effective exploitation of geometric consistency: Rotation angle is self-supervised via augmentation-induced angle constraints, requiring no explicit angle labels.

Limitations & Future Work

The Voronoi–watershed approach may degrade in highly dense object scenes.
The angle learning module assumes that augmentation transformations are known, limiting applicability to scenarios with unknown viewpoint variations.
Evaluation is conducted exclusively on the DOTA remote sensing benchmark; generalization to natural image oriented detection remains unexplored.

vs. Point2RBox: Point2RBox recovers rotated boxes from point annotations only and does not address sparse annotation.
vs. PWOOD: PWOOD handles partial weak supervision but not sparsity, since it assumes every instance carries at least a weak annotation.

Rating

Novelty: ⭐⭐⭐⭐ First unified treatment of sparse + weak annotation for oriented object detection.
Experimental Thoroughness: ⭐⭐⭐⭐ Comprehensive comparisons across multiple annotation ratios and baselines.
Writing Quality: ⭐⭐⭐ Method descriptions are clear, though the notation is dense.
Value: ⭐⭐⭐⭐ Directly applicable to low-cost remote sensing detection pipelines.