
🤖 Robotics & Embodied AI

🧪 ICML2025 · 20 paper notes

📌 Same area in other venues: 📷 CVPR2026 (146) · 🔬 ICLR2026 (162) · 💬 ACL2026 (11) · 🧪 ICML2026 (53) · 🤖 AAAI2026 (30) · 🧠 NeurIPS2025 (75)

🔥 Top topics: Robotics ×5 · Reinforcement Learning ×4 · Agents ×3 · Model Compression ×2

Action-Constrained Imitation Learning: Formulates a new problem of "Action-Constrained Imitation Learning (ACIL)" where a constrained agent learns from an unconstrained expert; proposes DTWIL, which generates alternative constrained trajectories via MPC and DTW distance to eliminate occupancy measure mismatch, outperforming baselines significantly on various robotic tasks.
Beyond CVaR: Leveraging Static Spectral Risk Measures for Enhanced Decision-Making in Distributional Reinforcement Learning: Proposes the first algorithm to optimize general static Spectral Risk Measures (SRM) within the distributional RL framework, moving beyond existing methods limited to simple CVaR. By leveraging reward distributions, it achieves closed-form outer optimization and temporal decomposition of auxiliary risk measures, outperforming existing risk-sensitive DRL models across diverse risk settings.
BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly: The BiAssemble framework is proposed to decompose the geometric assembly task into three steps (pick-up -> alignment -> assembly) by learning collaboration-aware point-level affordances. It outperforms existing affordance and imitation learning methods in fractured object reassembly tasks and is validated on a real-world benchmark.
Closed-loop Long-horizon Robotic Planning via Equilibrium Sequence Modeling: Models the self-refinement planning process of LLMs as a fixed-point problem (deep equilibrium model) to achieve end-to-end supervised training via implicit differentiation without additional verifiers or RL, and designs nested equilibrium solvers for closed-loop, long-horizon robot planning.
CommVQ: Commutative Vector Quantization for KV Cache Compression: This paper proposes CommVQ, which compresses the KV cache using Additive Vector Quantization (AVQ). By designing a codebook that commutes with RoPE and training it via the EM algorithm, CommVQ achieves near-lossless accuracy at 2-bit quantization and retains usable accuracy at 1-bit, enabling LLaMA-3.1 8B to support a 128K context length on a single RTX 4090 GPU.
Efficient Robotic Policy Learning via Latent Space Backward Planning: Proposes Latent Space Backward Planning (LBP), which recursively predicts intermediate subgoals starting from the final goal to sequentially approach the current state. This significantly improves planning efficiency while maintaining task alignment, achieving a new state of the art (SOTA) in both LIBERO-LONG simulation and real-robot long-horizon tasks.
Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples: Flow of Reasoning (FoR) is proposed to model multi-step LLM reasoning as a Markov flow on a DAG. By fine-tuning LLMs with the trajectory balance objective of GFlowNets, the model can sample multiple high-quality and diverse reasoning paths with probabilities proportional to rewards, using only a minimal number of training examples (e.g., 15).
FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making: The FOUNDER framework is proposed to align the multimodal task representations of Foundation Models (FMs) to the state space of World Models (WMs) by learning a mapping function. In combination with a temporal distance predictor, it generates reward signals to achieve open-ended multi-task embodied decision-making without environment rewards.
Geometric Contact Flows: Contactomorphisms for Dynamics and Control: Proposes Geometric Contact Flows (GCF), which leverage Riemannian and contact geometry as inductive biases. Using contactomorphisms, GCF maps latent contact Hamiltonian dynamics with desired properties (such as stability and energy conservation) to the target dynamics, while utilizing ensemble uncertainty to drive geodesics for robust generalization and obstacle avoidance.
Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning: Proposes Annealed Q-learning (AQ-L), which smoothly transitions from the Bellman optimality operator to the Bellman operator by annealing the parameter \(\tau\) of the expectile loss from close to 1 down to 0.5. In continuous action spaces, this both accelerates early learning and suppresses late-stage overestimation bias. When integrated with TD3/SAC, it significantly outperforms baselines on various locomotion and robotic manipulation tasks.
Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning: This paper proposes the Graph-Assisted Stitching (GAS) framework, which replaces explicit high-level policy learning with graph-search-based subgoal selection. By constructing a graph through clustering in the Temporal Distance Representation (TDR) space and performing shortest-path planning, GAS enables highly efficient cross-trajectory stitching in offline HRL. On the most challenging task, antmaze-giant-stitch, it raises the score from the previous SOTA of 1.0 to 88.3.
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models: Proposes Hi Robot, a hierarchical VLM system: a high-level VLM reasons about complex user instructions/feedback to generate atomic commands, while a low-level VLA (\(\pi_0\)) executes actions. Combined with a synthetic data generation scheme, it achieves open-ended instruction following capabilities far surpassing GPT-4o and flat VLAs across three types of robotic platforms.
Learning Dynamics under Environmental Constraints via Measurement-Induced Bundle Structures: A geometric framework is proposed to unify measurement uncertainty, system constraints, and dynamics learning by leveraging the fiber bundle structures naturally induced by the measurement process. By defining a measurement-aware control barrier function (mCBF) on the fiber bundle and combining it with Neural ODEs for learning continuous-time dynamics, the method achieves a 96.3% success rate and a 99.3% constraint satisfaction rate across three robotic control tasks.
Learning to Stop: Deep Learning for Mean Field Optimal Stopping: Formalizes and computationally solves, for the first time, the Mean Field Optimal Stopping (MFOS) problem in discrete time with a finite state space. The paper proves that MFOS approximates Multi-Agent Optimal Stopping (MAOS) at an \(O(1/N)\) rate and introduces two deep learning algorithms, the Direct Approach (DA) and the Dynamic Programming Principle (DPP) approach, evaluated on six scenarios with dimensions up to 300.
Maximum Total Correlation Reinforcement Learning: This paper proposes maximizing trajectory Total Correlation as an inductive bias for RL, which encourages the policy to generate simple, compressible trajectories. This significantly enhances zero-shot robustness against observational noise, action noise, and dynamics changes without sacrificing task performance.
Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism: This paper proposes the Adaptive Intervention Mechanism (AIM), which learns a proxy Q-function to simulate human intervention decisions, allowing the robot to proactively request expert assistance. Compared to the uncertainty-based baseline Thrifty-DAgger, AIM reduces human takeover costs and improves learning efficiency by 40%.
SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models: The SENSEI framework is proposed: it uses a VLM to compare pairs of environment images by how "interesting" they are, distills a semantic intrinsic reward from these comparisons, and combines it with novelty rewards driven by ensemble uncertainty to achieve task-free semantic exploration via a world model, significantly accelerating downstream task learning.
Sketch-Plan-Generalize: Learning and Planning with Neuro-Symbolic Programmatic Representations for Inductive Spatial Concepts: Proposes SPG (Sketch-Plan-Generalize), a neuro-symbolic agent framework that decomposes inductive concept learning into a three-stage pipeline: concept signature inference (Sketch), MCTS-based grounded action sequence search (Plan), and LLM-driven programmatic inductive generalization (Generalize). It significantly outperforms pure LLM and pure neural methods in learning composable, generalizable spatial abstract concepts from a few demonstrations.
STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization: The STAR framework is proposed to address the codebook collapse problem of VQ-VAEs through Rotation-Augmented Residual Skill Quantization (RaRSQ), while modeling dependencies between skills via a Causal Skill Transformer (CST). It achieves an overall success rate of 93.6% on the LIBERO benchmark, outperforming the previous SOTA method QueST by approximately 12%.
X-Hacking: The Threat of Misguided AutoML: Reveals a new security threat in the Explainable AI (XAI) domain termed "X-hacking": by leveraging the pipeline search capabilities of AutoML, adversaries can find explanatory results within the Rashomon set of models that support predetermined conclusions, with Bayesian optimization running approximately 3 times faster than random search.
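
As a rough illustration of the Annealed Q-learning (AQ-L) entry above, the sketch below shows an expectile loss whose parameter \(\tau\) is annealed from close to 1 down to 0.5. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `expectile_loss` and `annealed_tau`, the linear schedule, and the starting value 0.99 are all hypothetical choices made for illustration. The TD error is assumed to be `u = target - Q`, so a \(\tau\) near 1 penalizes underestimation more heavily (pulling estimates toward the larger targets), while \(\tau = 0.5\) reduces to an ordinary scaled squared error.

```python
def expectile_loss(td_errors, tau):
    """Mean asymmetric squared loss |tau - 1[u < 0]| * u^2 over TD errors u.

    Assumes u = target - Q, so positive errors mean Q underestimates the target.
    """
    total = 0.0
    for u in td_errors:
        # Positive errors get weight tau, negative errors get weight 1 - tau.
        weight = tau if u >= 0 else 1.0 - tau
        total += weight * u * u
    return total / len(td_errors)


def annealed_tau(step, total_steps, tau_start=0.99, tau_end=0.5):
    """Hypothetical linear schedule from tau_start down to tau_end."""
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return tau_start + frac * (tau_end - tau_start)


if __name__ == "__main__":
    errors = [2.0, -1.0]  # toy TD errors, not real data
    for step in (0, 50, 100):
        tau = annealed_tau(step, total_steps=100)
        print(step, round(tau, 3), round(expectile_loss(errors, tau), 3))
```

Early in training the loss is dominated by the positive TD error; as \(\tau\) approaches 0.5 both signs are weighted equally, which corresponds to the transition the summary describes.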