✂️ Segmentation

💬 ACL2026 · 4 paper notes

AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation

This paper proposes AnchorSeg, which reformulates reasoning segmentation as a structured conditional generation process based on language-grounded query banks. It explicitly decouples spatial localization from semantic reasoning through anchor queries and pairs this with a Token-Mask Cyclic Consistency training objective, achieving state-of-the-art results on ReasonSeg (67.7% gIoU, 68.1% cIoU).
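
For readers unfamiliar with the two reported metrics, here is a minimal sketch assuming the convention commonly used for ReasonSeg: gIoU averages the per-image IoU, while cIoU divides the cumulative intersection by the cumulative union over the whole dataset. The function name `iou_metrics`, the flat 0/1 mask representation, and the choice to score an empty prediction against an empty target as 1.0 are assumptions made for illustration, not details taken from the paper.

```python
def iou_metrics(preds, targets):
    """Return (gIoU, cIoU) for paired binary masks.

    Each mask is a flat list of 0/1 values (hypothetical representation).
    """
    per_image = []
    total_inter = 0
    total_union = 0
    for pred, target in zip(preds, targets):
        inter = sum(1 for p, t in zip(pred, target) if p and t)
        union = sum(1 for p, t in zip(pred, target) if p or t)
        # Assumption: two empty masks count as a perfect match.
        per_image.append(inter / union if union else 1.0)
        total_inter += inter
        total_union += union
    giou = sum(per_image) / len(per_image)  # mean of per-image IoUs
    ciou = total_inter / total_union if total_union else 1.0  # dataset-level ratio
    return giou, ciou
```

Because cIoU pools pixels across images, large objects weigh more heavily in it than in gIoU, which is one reason the two numbers are usually reported together.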

BoundRL: Efficient Structured Text Segmentation through Reinforced Boundary Generation

BoundRL redefines structured text segmentation as a boundary generation task: the model generates only each segment's start tokens rather than the complete text, which reduces output tokens by 90% and eliminates hallucination risk. Combined with a dual-objective reward function and a selective perturbation strategy for RLVR training, a 1.7B model surpasses Claude-4 Sonnet's few-shot performance.
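
The boundary idea can be illustrated with a minimal sketch: if a model emits only the opening snippet of each segment, the segments themselves can be recovered by locating those snippets in the source and slicing. The function `split_by_boundaries` and its skip-on-miss behavior are hypothetical, written for illustration; they are not BoundRL's actual decoding procedure.

```python
def split_by_boundaries(text, starts):
    """Cut `text` where each generated start snippet occurs.

    Snippets are searched left to right so segments stay in document order.
    """
    offsets = []
    cursor = 0
    for snippet in starts:
        pos = text.find(snippet, cursor)
        if pos == -1:
            # Assumption: a snippet absent from the source is ignored.
            continue
        offsets.append(pos)
        cursor = pos + len(snippet)
    if not offsets or offsets[0] != 0:
        offsets.insert(0, 0)  # keep any leading text as its own segment
    offsets.append(len(text))
    return [text[a:b] for a, b in zip(offsets, offsets[1:])]


doc = "Intro text. Methods are here. Results follow."
segments = split_by_boundaries(doc, ["Intro", "Methods", "Results"])
```

Every segment is a slice of the input, so concatenating the segments reproduces the source exactly. That property is what makes this style of output resistant to hallucinated text.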

Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech

This paper proposes Hierarchical Policy Optimization (HPO), which post-trains LLM-based simultaneous speech translation models with a hierarchical reward design: latency optimization is suppressed whenever translation quality falls below a threshold. It achieves a +7 COMET improvement in translation quality at 1.5-second latency.
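
A quality-gated reward of this kind might look like the following minimal sketch. The function name, the threshold, the latency cap, and the weighting are all hypothetical values chosen for illustration; the paper's actual reward formulation may differ.

```python
def hierarchical_reward(quality, latency_s,
                        quality_threshold=0.8,
                        max_latency_s=3.0,
                        latency_weight=0.2):
    """Reward quality first; reward low latency only once quality is adequate.

    `quality` is assumed to be a score in [0, 1]; `latency_s` is in seconds.
    """
    if quality < quality_threshold:
        # Below the gate: the latency term is suppressed entirely.
        return quality
    # Above the gate: add a non-negative bonus that grows as latency shrinks.
    latency_bonus = latency_weight * max(0.0, 1.0 - latency_s / max_latency_s)
    return quality + latency_bonus
```

With this shape, a policy cannot raise its reward by cutting latency while its translations are still poor, since the latency bonus only becomes available after the quality gate is passed.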

TemporalVLM: Video LLMs for Temporal Reasoning in Long Videos

This paper proposes TemporalVLM, which extracts local fine-grained temporal features through a time-aware segment encoder (an overlapping sliding Video Q-Former plus a fusion module), then aggregates global long-range dependencies with a BiLSTM. This is the first work to introduce LSTMs into Video LLMs, and it outperforms prior methods on four tasks: dense video captioning, temporal grounding, highlight detection, and action segmentation.
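
The overlapping sliding-window step can be sketched as below, assuming the window slides over frame indices. The function `sliding_segments` and its `window` and `stride` parameters are hypothetical names for illustration; the encoder applied to each segment and the BiLSTM aggregation are not shown.

```python
def sliding_segments(num_frames, window, stride):
    """Return (start, end) frame ranges; ranges overlap when stride < window."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if num_frames <= 0:
        return []
    segments = []
    start = 0
    while True:
        end = min(start + window, num_frames)  # clip the final segment
        segments.append((start, end))
        if end >= num_frames:
            break
        start += stride
    return segments


segments = sliding_segments(num_frames=10, window=4, stride=2)
```

Here each segment shares two frames with its neighbor, so events that straddle a segment boundary are still seen whole by at least one local encoder pass.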