🎮 Reinforcement Learning

🎞️ ECCV2024 · 3 paper notes

AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale: This paper proposes AdaGlimpse, which uses Soft Actor-Critic (SAC) reinforcement learning to select glimpses of arbitrary position and scale from a continuous action space. Combined with a ViT encoder equipped with elastic positional encoding, it supports multi-task active visual exploration (reconstruction, classification, and segmentation). Using only 6% of the image pixels, it outperforms state-of-the-art methods that use 18%.
Octopus: Embodied Vision-Language Programmer from Environmental Feedback: This paper proposes Octopus, an embodied vision-language programming model that bridges high-level planning and low-level manipulation by generating executable code. It introduces a Reinforcement Learning with Environmental Feedback (RLEF) training scheme to enhance decision-making quality.
Visual Grounding for Object-Level Generalization in Reinforcement Learning: This paper leverages the visual grounding capability of a vision-language model (MineCLIP) to generate confidence maps of target objects. VLM knowledge is transferred to reinforcement learning through two pathways, reward design and task representation, enabling zero-shot generalization to unseen objects and instructions.