🎮 Reinforcement Learning

📷 CVPR2025 · 4 paper notes

📌 Same area in other venues: 📷 CVPR2026 (25) · 🔬 ICLR2026 (400) · 💬 ACL2026 (46) · 🧪 ICML2026 (110) · 🤖 AAAI2026 (58) · 🧠 NeurIPS2025 (143)

CALF: Communication-Aware Learning Framework for Distributed Reinforcement Learning: This paper proposes CALF, a framework that injects configurable network delay, jitter, and packet loss models into RL training. Training under these conditions reduces policy performance degradation by approximately 3-4x when policies are deployed on real distributed edge devices, indicating that network conditions are an important but overlooked dimension of the sim-to-real gap.
Gazing at Rewards: Eye Movements as a Lens into Human and AI Decision-Making in Hybrid Visual Foraging: This paper proposes the Visual Forager (VF) model, which simulates human eye-movement strategies in hybrid visual search tasks using target feature modulation, target value modulation, and a ViT-based Actor-Critic decision-making network. VF achieves a normalized score of 72.6% (compared to 87.4% for humans) and a saccade amplitude within 0.01° of the human value (4.06° vs. 4.05°). The study also reveals, for the first time, how target value and prevalence jointly influence human search decisions.
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill: This paper proposes the GROVE framework, which constructs a generalized reward function from two complementary sources: LLMs that generate physical constraints and VLMs that evaluate motion semantics. A lightweight Pose2CLIP mapper skips rendering and projects poses directly into the semantic space. GROVE achieves open-vocabulary physical skill learning, with 8.4x faster training and a 22.2% improvement in motion naturalness compared to existing methods.
SkillMimic: Learning Basketball Interaction Skills from Demonstrations: This paper proposes SkillMimic, a purely data-driven framework that learns diverse basketball interaction skills from motion capture data using a unified human-object interaction (HOI) imitation reward, notably a novel contact graph reward. A high-level controller then composes these skills to complete complex long-horizon tasks such as continuous scoring.
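
To make the CALF entry above more concrete, here is a minimal sketch of what injecting delay, jitter, and packet loss into a training loop could look like. It is not CALF's actual API: the class `ImpairedChannel`, its parameters, and the choice to measure latency in whole environment steps are all assumptions made for illustration.

```python
import random


class ImpairedChannel:
    """Hypothetical channel model (not from the CALF paper).

    Messages are delayed by a fixed latency plus random jitter, both
    measured in environment steps, and are dropped with a fixed probability.
    """

    def __init__(self, base_delay=2, jitter=1, loss_prob=0.1, seed=0):
        self.base_delay = base_delay  # fixed latency, in steps
        self.jitter = jitter          # extra latency drawn from [0, jitter]
        self.loss_prob = loss_prob    # probability that a message is lost
        self._rng = random.Random(seed)
        self._in_flight = []          # (arrival_step, message) pairs
        self._now = 0

    def send(self, message):
        """Queue a message, or silently drop it to simulate packet loss."""
        if self._rng.random() < self.loss_prob:
            return
        arrival = self._now + self.base_delay + self._rng.randint(0, self.jitter)
        self._in_flight.append((arrival, message))

    def step(self):
        """Advance time by one step and return the messages that have arrived."""
        self._now += 1
        arrived = [m for t, m in self._in_flight if t <= self._now]
        self._in_flight = [(t, m) for t, m in self._in_flight if t > self._now]
        return arrived


# Usage: the policy only sees the most recent observation that got through,
# so it may act on stale information or on nothing at all.
channel = ImpairedChannel(base_delay=2, jitter=1, loss_prob=0.2, seed=42)
last_seen = None
for t in range(10):
    channel.send({"step": t})      # sensor sends an observation each step
    for obs in channel.step():     # controller receives whatever arrived
        last_seen = obs
    print(t, last_seen)
```

In a real setup this channel would sit between the environment and the policy (for example, inside an environment wrapper), so that the policy is trained on the same stale or missing observations it would face on deployed edge devices.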