
Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning

Conference: NeurIPS 2025 | arXiv: 2504.14305 | Code: Project Page | Area: Robotics | Keywords: Humanoid Robot, Adversarial Learning, Motion Imitation, Whole-Body Control, Sim-to-Real

TL;DR

ALMI proposes an upper/lower-body adversarial training framework: the lower-body policy learns robust locomotion under upper-body motion perturbations, while the upper-body policy learns precise motion imitation under lower-body locomotion perturbations. Iterating this adversarial training toward a Nash equilibrium yields stable, coordinated whole-body control on a real Unitree H1-2 robot.

Background & Motivation

State of the Field

Background: Existing methods employ monolithic RL policies to simultaneously control all joints, using motion tracking error as the reward signal.

Limitations of Prior Work: Monolithic policies neglect the functional distinction between upper and lower bodies; the high DoF count (21) makes training difficult; prioritizing tracking accuracy over balance leads to frequent falls on real hardware.

Key Challenge: Large upper-body motions destabilize balance, while rapid lower-body locomotion degrades upper-body tracking accuracy — a naturally adversarial relationship.

Goal: Enable the upper and lower bodies to independently learn their respective tasks while ensuring whole-body coordination.

Key Insight: Model the upper and lower bodies as two players in a zero-sum game.

Core Idea: Adversarial training encourages the lower body to learn to maintain balance regardless of upper-body motions, and the upper body to achieve precise tracking regardless of lower-body locomotion.

Method

Overall Architecture

Two coupled zero-sum Markov games trained in alternation: (1) when learning \(\pi^l\), the upper body acts as the adversary; (2) when learning \(\pi^u\), the lower body acts as the adversary.
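
To make the alternation concrete, here is a minimal Python sketch of the training schedule. `Policy`, `train_lower`, and `train_upper` are hypothetical stand-ins rather than the authors' API, and each `update` call abbreviates a full PPO optimization phase against a frozen opponent.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Stand-in for a PPO policy network (lower- or upper-body)."""
    name: str
    version: int = 0

    def update(self) -> "Policy":
        # Placeholder for one full PPO optimization phase
        # against a frozen opponent.
        self.version += 1
        return self


def train_lower(pi_l: Policy, frozen_upper: Policy) -> Policy:
    # Game 1: the lower body maximizes its locomotion value V^l while
    # the fixed upper-body policy acts as the adversary.
    return pi_l.update()


def train_upper(pi_u: Policy, frozen_lower: Policy) -> Policy:
    # Game 2: the upper body maximizes its tracking value V^u while
    # the fixed lower-body policy acts as the adversary.
    return pi_u.update()


pi_l, pi_u = Policy("lower"), Policy("upper")
for adversarial_round in range(3):  # the paper reports 3 rounds
    pi_l = train_lower(pi_l, frozen_upper=pi_u)
    pi_u = train_upper(pi_u, frozen_lower=pi_l)
```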

Key Designs

  1. Zero-Sum Game Formulation:
     • Mechanism: \(\max_{\pi^l} \min_{\pi^u} V_\rho^l\) and \(\max_{\pi^u} \min_{\pi^l} V_\rho^u\)
     • Theoretical guarantee: Theorem 3.1 proves convergence to an \(\epsilon\)-approximate Nash equilibrium
  2. Command-Space Adversary (Simplified Implementation):
     • Instead of directly optimizing the opponent's parameters, adversarial commands are sampled (more extreme motions / higher velocities); see the sampling sketch after this list
     • An Arm Curriculum progressively increases adversarial difficulty
  3. PPO Training: 3 rounds of adversarial iteration, 4096 parallel environments, approximately 17 hours
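
The sketch below illustrates what a command-space adversary with an arm curriculum could look like. The function name, command ranges, and joint count are illustrative assumptions, not the paper's values; only the batch size (4096 parallel environments) and the 3 rounds come from the summary above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def sample_adversarial_commands(difficulty: float, batch: int = 4096,
                                n_arm_joints: int = 9):
    """Sample perturbing commands rather than optimizing adversary weights.

    `difficulty` in [0, 1] plays the role of the arm curriculum: larger
    values widen the command ranges toward more extreme arm motions and
    higher locomotion velocities. All ranges and the joint count are
    illustrative assumptions.
    """
    arm_amp = 0.2 + 0.8 * difficulty   # rad, upper-body target amplitude
    vel_max = 0.5 + 1.0 * difficulty   # m/s, commanded base velocity
    arm_targets = rng.uniform(-arm_amp, arm_amp,
                              size=(batch, n_arm_joints))
    base_vel_cmd = rng.uniform(-vel_max, vel_max,
                               size=(batch, 2))  # (vx, vy)
    return arm_targets, base_vel_cmd


# Curriculum: difficulty ramps up across the 3 adversarial rounds.
for r in range(3):
    arm_cmd, vel_cmd = sample_adversarial_commands(difficulty=(r + 1) / 3)
```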

Loss & Training

Both policies are trained with PPO; each objective combines a task reward (velocity-command tracking for the lower body, joint-position tracking for the upper body) with regularization terms.
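
A hedged sketch of such a per-step reward, assuming the exponential tracking kernel common in legged RL; the kernel width and the regularization weight are illustrative, not the paper's values.

```python
import numpy as np


def tracking_term(target: np.ndarray, actual: np.ndarray,
                  sigma: float = 0.25) -> float:
    """Exponential tracking kernel: exp(-||target - actual||^2 / sigma)."""
    return float(np.exp(-np.sum((target - actual) ** 2) / sigma))


def step_reward(cmd_vel, base_vel, q_ref_upper, q_upper,
                action, prev_action) -> float:
    # Task terms mirror the reported metrics: velocity tracking
    # (cf. E_vel) and upper-body joint-position tracking (cf. E_jpe).
    r_task = (tracking_term(cmd_vel, base_vel)
              + tracking_term(q_ref_upper, q_upper))
    # Regularization, e.g. an action-rate smoothness penalty;
    # the 0.01 weight is illustrative.
    r_reg = -0.01 * float(np.sum((action - prev_action) ** 2))
    return r_task + r_reg
```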

Key Experimental Results

Main Results (CMU MoCap, 1122 motion clips)

Method | \(E_{vel}\) ↓ | \(E_{jpe}^{upper}\) ↓ | Survival ↑
Exbody (monolithic) | 0.238 | 0.356 | 89.1%
ALMI (monolithic) | 0.139 | 0.576 | 99.9%
ALMI | 0.114 | 0.193 | 100%
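
The exact metric definitions are not given in this summary; a plausible reading is a mean per-step error over rollouts, sketched below under that assumption.

```python
import numpy as np


def e_vel(cmd_vel: np.ndarray, base_vel: np.ndarray) -> float:
    """Mean base-velocity tracking error over a rollout; (T, 2) arrays."""
    return float(np.mean(np.linalg.norm(cmd_vel - base_vel, axis=-1)))


def e_jpe_upper(q_ref: np.ndarray, q: np.ndarray) -> float:
    """Mean upper-body joint-position error over a rollout; (T, J) arrays."""
    return float(np.mean(np.linalg.norm(q_ref - q, axis=-1)))
```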

Ablation Study

Configuration | Survival (Hard) ↑
ALMI (full) | 97.2%
w/o curriculum | 96.1%
w/o adv. (round 1) | 93.1%

Key Findings

  • Each round of adversarial iteration improves robustness
  • Successful real-robot deployment
  • ALMI-X dataset: 80K+ whole-body control trajectories with language descriptions

Highlights & Insights

  • Modeling the functional distinction between upper and lower bodies as an adversarial game is the core innovation
  • The command-space adversary is a simplification that proves essential in practice
  • A rare combination of theoretical guarantees and real-world deployment

Limitations & Future Work

  • Upper-body control is limited to joint-position tracking, which constrains expressiveness
  • Real-robot experiments are relatively limited and lack quantitative evaluation

Comparison to Related Work

  • vs. Exbody/Exbody2: monolithic policies that do not distinguish upper- and lower-body functions
  • vs. decoupled control methods: these also decouple the two body parts but lack adversarial training to ensure coordination

Rating

  • Novelty: ⭐⭐⭐⭐ Insightful adversarial game formulation
  • Experimental Thoroughness: ⭐⭐⭐⭐ Simulation + real robot + ablation + dataset
  • Writing Quality: ⭐⭐⭐⭐ Clear methodology presentation
  • Value: ⭐⭐⭐⭐ Practical value for humanoid robot control