Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning¶
Conference: NeurIPS 2025 | arXiv: 2504.14305 | Code: Project Page | Area: Robotics | Keywords: Humanoid Robot, Adversarial Learning, Motion Imitation, Whole-Body Control, Sim-to-Real
TL;DR¶
ALMI proposes an upper-lower body adversarial training framework: the lower-body policy learns robust locomotion under upper-body motion perturbations, while the upper-body policy learns precise motion imitation under lower-body locomotion perturbations. Through iterative adversarial training converging to a Nash equilibrium, the framework enables stable whole-body coordinated control on the Unitree H1-2 real robot.
Background & Motivation¶
State of the Field¶
Background: Existing methods employ monolithic RL policies to simultaneously control all joints, using motion tracking error as the reward signal.
Limitations of Prior Work: Monolithic policies ignore the functional distinction between the upper and lower body; the high DoF count (21 joints) makes training difficult; and prioritizing tracking accuracy over balance leads to frequent falls on real hardware.
Key Challenge: Large upper-body motions destabilize balance, while rapid lower-body locomotion degrades upper-body tracking accuracy — a naturally adversarial relationship.
Goal: Enable the upper and lower bodies to independently learn their respective tasks while ensuring whole-body coordination.
Key Insight: Model the upper and lower bodies as two players in a zero-sum game.
Core Idea: Adversarial training encourages the lower body to learn to maintain balance regardless of upper-body motions, and the upper body to achieve precise tracking regardless of lower-body locomotion.
Method¶
Overall Architecture¶
Two coupled zero-sum Markov games trained in alternation: (1) when learning \(\pi^l\), the upper body acts as the adversary; (2) when learning \(\pi^u\), the lower body acts as the adversary.
Key Designs¶
- Zero-Sum Game Formulation:
  - Mechanism: \(\max_{\pi^l} \min_{\pi^u} V_\rho^l\) and \(\max_{\pi^u} \min_{\pi^l} V_\rho^u\)
  - Theoretical guarantee: Theorem 3.1 proves convergence to an \(\epsilon\)-approximate Nash equilibrium
- Command-Space Adversary (Simplified Implementation):
  - Instead of directly optimizing the opponent's parameters, adversarial commands are sampled (more extreme motions / higher velocities)
  - An Arm Curriculum progressively increases adversarial difficulty
- PPO Training: 3 rounds of adversarial iteration, 4096 parallel environments, approximately 17 hours
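The alternating scheme above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the 5-dimensional command vector, and the linear curriculum scaling are all assumptions made for the sketch.

```python
import random

def sample_adversarial_command(curriculum_level, max_level=3, dim=5):
    """Command-space adversary: instead of optimizing opponent parameters,
    sample increasingly extreme commands as the curriculum advances.
    Ranges and dimensionality are illustrative, not from the paper."""
    scale = curriculum_level / max_level
    return [random.uniform(-scale, scale) for _ in range(dim)]

def alternating_adversarial_training(train_lower, train_upper, rounds=3):
    """Sketch of the alternating zero-sum scheme: each round, the lower-body
    policy trains against adversarial upper-body behavior, then the
    upper-body policy trains against the updated lower-body policy."""
    lower_policy, upper_policy = None, None
    for rnd in range(1, rounds + 1):
        # Lower body learns robust locomotion under upper-body perturbations.
        lower_policy = train_lower(opponent=upper_policy, curriculum=rnd)
        # Upper body learns precise tracking under locomotion perturbations.
        upper_policy = train_upper(opponent=lower_policy, curriculum=rnd)
    return lower_policy, upper_policy
```

In this reading, each `train_*` call is one full PPO training phase; three rounds correspond to the paper's three adversarial iterations.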
Loss & Training¶
Both the upper- and lower-body policies are optimized with PPO; the reward combines motion-tracking terms (velocity and joint-position tracking) with regularization terms, rather than a single end-to-end supervised loss.
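A common shape for such a reward is exponential tracking bonuses minus a regularization penalty. The sketch below is a generic illustration of that shape; the specific terms and weights are assumptions, not the paper's actual reward.

```python
import math

def tracking_reward(vel_err, joint_err, action_rate,
                    w_vel=1.0, w_jnt=1.0, w_reg=0.1):
    """Illustrative RL reward: exponential kernels reward small velocity
    and joint-position tracking errors, while an action-rate penalty
    regularizes the policy. All weights here are hypothetical."""
    return (w_vel * math.exp(-vel_err ** 2)      # velocity tracking term
            + w_jnt * math.exp(-joint_err ** 2)  # joint tracking term
            - w_reg * action_rate ** 2)          # smoothness regularizer
```

With zero error and zero action rate the reward saturates at `w_vel + w_jnt`, and it decays smoothly as tracking degrades, which keeps the gradient signal dense.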
Key Experimental Results¶
Main Results (CMU MoCap, 1122 motion clips)¶
| Method | \(E_{vel}\)↓ | \(E_{jpe}^{upper}\)↓ | Survival↑ |
|---|---|---|---|
| Exbody (monolithic) | 0.238 | 0.356 | 89.1% |
| ALMI (monolithic) | 0.139 | 0.576 | 99.9% |
| ALMI | 0.114 | 0.193 | 100% |
Ablation Study¶
| Configuration | Survival (Hard)↑ |
|---|---|
| ALMI (full) | 97.2% |
| w/o curriculum | 96.1% |
| w/o adv. (round 1) | 93.1% |
Key Findings¶
- Each round of adversarial iteration improves robustness
- Successful real-robot deployment
- ALMI-X dataset: 80K+ whole-body control trajectories with language descriptions
Highlights & Insights¶
- Modeling the functional distinction between upper and lower bodies as an adversarial game is the core innovation
- Command-space adversary is a practically essential simplification
- A rare combination of theoretical guarantees and real-world deployment
Limitations & Future Work¶
- Upper-body control is limited to joint position tracking, constraining expressiveness
- Real-robot experiments are relatively limited, lacking quantitative evaluation
Related Work & Insights¶
- vs. Exbody/Exbody2: Monolithic policies that do not distinguish upper and lower body functions
- vs. Decoupled control methods: Also decouple the two bodies but lack adversarial training to ensure coordination
Rating¶
- Novelty: ⭐⭐⭐⭐ Insightful adversarial game formulation
- Experimental Thoroughness: ⭐⭐⭐⭐ Simulation + real robot + ablation + dataset
- Writing Quality: ⭐⭐⭐⭐ Clear methodology presentation
- Value: ⭐⭐⭐⭐ Practical value for humanoid robot control