🎮 Reinforcement Learning

📷 CVPR2025 · 4 paper notes

CALF: Communication-Aware Learning Framework for Distributed Reinforcement Learning

This paper proposes the CALF framework, which injects configurable models of network delay, jitter, and packet loss into RL training. Policies trained this way degrade approximately 3-4 times less when deployed on real distributed edge devices, which shows that network conditions are an important but overlooked dimension of the sim-to-real gap.
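To make the idea of injecting delay, jitter, and packet loss concrete, here is a minimal sketch of a simulated channel that sits between an environment and a policy. It is not CALF's implementation: the class name `LossyChannel`, its parameters, and the choice of a fixed base delay plus uniform jitter with independent packet loss are all assumptions made for illustration.

```python
import random


class LossyChannel:
    """Hypothetical channel model (not CALF's API).

    Each message is dropped with probability `loss_prob`; otherwise it
    arrives after `base_delay` steps plus a uniform jitter of 0..`jitter` steps.
    """

    def __init__(self, base_delay=2, jitter=1, loss_prob=0.1, seed=0):
        self.base_delay = base_delay
        self.jitter = jitter
        self.loss_prob = loss_prob
        self._rng = random.Random(seed)
        self._in_flight = []   # list of (arrival_step, send_step, payload)
        self._step = 0
        self._latest = None    # (send_step, payload) of the freshest delivery

    def send(self, payload):
        """Submit a message at the current step; it may be lost."""
        if self._rng.random() < self.loss_prob:
            return  # packet loss: the message never arrives
        delay = self.base_delay + self._rng.randint(0, self.jitter)
        self._in_flight.append((self._step + delay, self._step, payload))

    def tick(self):
        """Advance one step and return the freshest delivered payload, or None."""
        self._step += 1
        arrived = [m for m in self._in_flight if m[0] <= self._step]
        self._in_flight = [m for m in self._in_flight if m[0] > self._step]
        for _, sent_at, payload in arrived:
            # Jitter can reorder messages, so keep only the most recently sent one.
            if self._latest is None or sent_at >= self._latest[0]:
                self._latest = (sent_at, payload)
        return None if self._latest is None else self._latest[1]


if __name__ == "__main__":
    channel = LossyChannel(base_delay=2, jitter=1, loss_prob=0.2, seed=42)
    for t in range(8):
        channel.send({"obs": t})          # the environment emits an observation
        stale = channel.tick()            # the policy sees a delayed, possibly stale one
        print(t, stale)
```

In a training loop, the policy would act on the value returned by `tick()` rather than on the true current observation, so it learns under the same staleness it would face after deployment.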

Gazing at Rewards: Eye Movements as a Lens into Human and AI Decision-Making in Hybrid Visual Foraging

This paper proposes the Visual Forager (VF) model, which simulates human eye-movement strategies in hybrid visual search tasks by combining target feature modulation, target value modulation, and a ViT-based Actor-Critic decision-making network. VF achieves a normalized score of 72.6% (compared to 87.4% for humans), and its saccade amplitude differs from that of humans by only 0.01° (4.06° vs. 4.05°). The work reveals for the first time how target value and prevalence jointly influence human search decisions.

GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill

This paper proposes the GROVE framework, which constructs a generalized reward function from two complementary sources: LLMs generate physical constraints, and VLMs evaluate motion semantics. A lightweight Pose2CLIP mapper skips rendering and projects poses directly into the semantic space. With this design, GROVE achieves open-vocabulary physical skill learning, with 8.4x faster training and a 22.2% improvement in motion naturalness compared to existing methods.
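The sketch below shows one way a constraint term and a semantic term could be combined into a single reward. It is an assumption-laden illustration, not GROVE's actual formulation: the function names, the exponential penalty on constraint violations, the multiplicative combination, and the use of plain lists as stand-ins for Pose2CLIP and text embeddings are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def constraint_reward(violations, scale=1.0):
    """Map constraint violations (positive = violated) to a score in (0, 1].

    Assumption: the LLM-generated constraints have already been evaluated
    into signed violation magnitudes by some external checker.
    """
    total = sum(max(0.0, v) for v in violations)
    return math.exp(-scale * total)


def combined_reward(pose_embedding, text_embedding, violations, scale=1.0):
    """Hypothetical combination: semantic score times constraint score."""
    # Rescale cosine similarity from [-1, 1] to [0, 1] so the product stays non-negative.
    semantic = 0.5 * (cosine_similarity(pose_embedding, text_embedding) + 1.0)
    return semantic * constraint_reward(violations, scale)


if __name__ == "__main__":
    # Stand-in embeddings; in a real pipeline these would come from a
    # pose-to-embedding mapper and a text encoder.
    pose_vec = [0.2, 0.9, 0.1]
    text_vec = [0.1, 1.0, 0.0]
    print(combined_reward(pose_vec, text_vec, violations=[0.0, 0.3]))
```

The product form means a motion that matches the instruction semantically but violates a physical constraint is still penalized, which is one plausible reading of "complementary". An additive form would be an equally valid assumption.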

SkillMimic: Learning Basketball Interaction Skills from Demonstrations

This paper proposes SkillMimic, a purely data-driven framework that learns diverse basketball interaction skills from motion capture data using a unified human-object interaction (HOI) imitation reward, notably a novel contact graph reward. A high-level controller then composes these skills to complete complex long-horizon tasks such as continuous scoring.
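As a rough illustration of what a contact graph reward might look like, the sketch below compares binary contact edges (for example hand-ball or ball-ground) between a simulated frame and a reference frame. The edge layout, the weights, and the exponential form are assumptions for illustration and may not match the paper's exact definition.

```python
import math


def contact_graph_reward(sim_edges, ref_edges, weights=None):
    """Hypothetical contact graph reward in (0, 1].

    `sim_edges` and `ref_edges` are equal-length sequences of booleans, one
    per graph edge, marking whether that pair of nodes is in contact.
    Each mismatched edge adds its weight to the error; the reward is
    exp(-error), so a perfect match scores 1.0.
    """
    if len(sim_edges) != len(ref_edges):
        raise ValueError("sim_edges and ref_edges must have the same length")
    if weights is None:
        weights = [1.0] * len(ref_edges)
    error = sum(
        w * abs(int(s) - int(r))
        for w, s, r in zip(weights, sim_edges, ref_edges)
    )
    return math.exp(-error)


if __name__ == "__main__":
    # Assumed edge order: (hands, ball), (ball, ground), (body, ball)
    reference = [True, False, False]   # reference frame: ball held in the hands
    simulated = [False, True, False]   # simulated frame: ball dropped to the ground
    print(contact_graph_reward(simulated, reference))
```

A term like this penalizes motions that look kinematically close to the reference but make the wrong contacts, such as a dribble in which the hand never touches the ball.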