
Exo-Plore: Exploring Exoskeleton Control Space through Human-Aligned Simulation

Conference: ICLR 2026
arXiv: 2601.22550
Code: Project Page
Area: Medical Imaging
Keywords: exoskeleton optimization, neuromechanical simulation, deep reinforcement learning, human-in-the-loop, surrogate optimization

TL;DR

This paper proposes the Exo-plore framework, which combines neuromechanical simulation with deep reinforcement learning to optimize hip exoskeleton control parameters without requiring human subject experiments, and generalizes to pathological gait scenarios.

Background & Motivation

Exoskeletons have demonstrated great potential for enhancing human mobility, yet delivering appropriate assistance to individual users remains a significant challenge. The current state-of-the-art approach—Human-in-the-Loop Optimization (HILO)—requires participants to walk for hours while wearing an exoskeleton to iteratively optimize control parameters. This creates a paradox: the populations most in need of exoskeleton assistance (e.g., individuals with mobility impairments) are precisely those least capable of tolerating such intensive optimization protocols.

Furthermore, humans actively adapt to the external forces imposed by an exoskeleton by modifying their gait patterns and muscle coordination strategies, so predictions that assume a fixed gait are systematically inaccurate. Existing neuromechanical simulation methods either track motion capture data to cope with large observation and action spaces, or depend on hand-crafted, biologically inspired controllers with limited generalizability. No unified framework yet achieves both (i) fitting observed human adaptation behavior and (ii) predicting responses under unobserved assistance conditions.

Core Problem

How can simulation accurately reproduce human adaptive responses to exoskeleton assistive forces without conducting real human experiments, enabling efficient optimization of exoskeleton control parameters? In particular, how can this capability generalize to pathological gait scenarios to provide personalized assistance for individuals with mobility impairments?

Method

Overall Architecture

Exo-plore consists of two core components: a Gait Data Generator and an Exoskeleton Optimizer.

1. Exoskeleton Controller

The hip exoskeleton employs delayed-feedback control, with the assistive torque defined as:

\[\tau_{\text{exo}}(t) = \kappa \cdot u(t - \Delta t)\]

where \(u(t) = \sin(\theta_r) - \sin(\theta_l)\) is a control signal based on the difference between left and right hip joint angles, \(\kappa\) is the gain (equivalent stiffness parameter), and \(\Delta t\) is the time delay. The optimization objective is to find the optimal \((\kappa, \Delta t)\) that minimizes the metabolic Cost of Transport (CoT).
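The delayed-feedback law above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name, the fixed control step `step`, and the zero-filled start-up buffer are all assumptions made for the example.

```python
import math
from collections import deque


class DelayedFeedbackController:
    """Sketch of tau_exo(t) = kappa * u(t - delay), with u = sin(theta_r) - sin(theta_l)."""

    def __init__(self, kappa, delay, step):
        self.kappa = kappa
        # Number of control steps approximating the delay (assumes a fixed step size).
        self.n_delay = round(delay / step)
        # Zero-filled buffer: the output stays at zero until the delay has elapsed.
        self.buffer = deque([0.0] * self.n_delay)

    def torque(self, theta_r, theta_l):
        # Current control signal from the right and left hip angles (radians).
        u_now = math.sin(theta_r) - math.sin(theta_l)
        self.buffer.append(u_now)
        # The oldest entry is the signal from n_delay steps ago.
        u_delayed = self.buffer.popleft()
        return self.kappa * u_delayed
```

Calling `torque` once per control step returns the signal recorded `n_delay` steps earlier, scaled by the gain, so the pair \((\kappa, \Delta t)\) maps directly onto the two constructor arguments `kappa` and `delay`.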

2. Gait Data Generator

The human controller comprises three modules:

  • PoseNet: Computes PD target joint positions \(\mathbf{q}_d\), trained via Deep RL
  • PD Controller: Generates joint torques that reduce the error between target and current positions, with the exoskeleton's assistive torque subtracted
  • Muscle Coordination Network (MCN): Maps target joint torques to muscle activations \(\mathbf{a}\), trained via supervised learning
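The PD step in this pipeline can be illustrated as follows. This is a hedged sketch only: the gains `kp` and `kd`, the per-joint list representation, and both function names are hypothetical, and the summary does not state the exact form of the PD law used in the paper.

```python
def pd_torque(q_target, q, q_dot, kp, kd):
    """PD law per joint: pull the pose toward the target and damp joint velocity."""
    return [kp * (qd - qi) - kd * vi for qd, qi, vi in zip(q_target, q, q_dot)]


def muscle_torque_target(tau_pd, tau_exo):
    """Torque left for the muscles once the exoskeleton contribution is subtracted."""
    return [tp - te for tp, te in zip(tau_pd, tau_exo)]
```

In this reading, the output of `muscle_torque_target` is what a network such as the MCN would then map to muscle activations.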

The total reward function is designed as:

\[r_{\text{total}} = w_{\text{gait}} \cdot r_{\text{gait}} + w_{\text{arm}} \cdot r_{\text{arm}} + w_{\text{energy}} \cdot r_{\text{energy}} + w_{\text{HEI}} \cdot r_{\text{HEI}}\]

where \(r_{\text{gait}}\) encourages following target gait patterns, \(r_{\text{arm}}\) penalizes unnatural arm motion, \(r_{\text{energy}}\) regularizes energy consumption, and \(r_{\text{HEI}}\) models human–exoskeleton interaction.

The MCN training loss incorporates a novel Intra-Muscle Regularizer (IMR), which enforces coherent activation patterns among line muscles belonging to the same anatomical muscle group.

3. Sim-to-Real Matching

Two key designs align simulation results with real human experimental data:

(a) Metabolic Energy Model Calibration: The rate of metabolic energy expenditure is modeled as \(\frac{d}{dt}\text{MEE} = \sum_i m_i^\alpha a_i^\beta\). Algorithms 1 and 2 in the paper search for the parameters \((\alpha, \beta)\) under which the simulated Preferred Walking Speed (PWS) matches real human data, yielding \((\alpha, \beta) = (1.5, 1.0)\).
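As a small worked illustration of the metabolic model, the rate can be evaluated per time step and summed over a trajectory. The sketch below makes several assumptions: the summary does not define \(m_i\), so it is treated as a per-muscle coefficient supplied by the caller; the rectangle-rule integration and both function names are illustrative; and the calibration search of Algorithms 1 and 2 is not reproduced.

```python
def metabolic_rate(m, a, alpha=1.5, beta=1.0):
    """Instantaneous rate d/dt MEE = sum_i m_i**alpha * a_i**beta."""
    return sum(mi ** alpha * ai ** beta for mi, ai in zip(m, a))


def metabolic_energy(rates, step):
    """Rectangle-rule integral of the rate over a trajectory sampled every `step` seconds."""
    return sum(rates) * step
```

The default exponents are the calibrated values \((1.5, 1.0)\) reported above; changing them shows how strongly the estimate depends on the calibration.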

(b) Human–Exoskeleton Interaction (HEI) Reward: Designed based on the resistance minimization hypothesis, reflecting the behavioral principle that humans are more sensitive to losses than gains (Loss Aversion):

\[r_{\text{HEI}} = 1 + \frac{1}{\kappa} \sum_{k \in \{L,R\}} \min(0, P_k)\]

When the exoskeleton imposes resistive power on the human body (\(P_k < 0\)), \(r_{\text{HEI}}\) drops below 1, driving the policy to actively adjust kinematics to reduce resistance, thereby reproducing the adaptive behavior observed in real human experiments.
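The HEI reward translates almost directly into code. A minimal sketch, assuming the left and right interaction powers are already available as scalars and that \(\kappa > 0\) (the formula divides by the gain); the function name is illustrative.

```python
def hei_reward(power_left, power_right, kappa):
    """r_HEI = 1 + (1 / kappa) * sum over legs of min(0, P_k)."""
    # Only resistive (negative) power contributes; assistive power is ignored.
    resistive = min(0.0, power_left) + min(0.0, power_right)
    return 1.0 + resistive / kappa
```

For example, `hei_reward(5.0, 3.0, 8.0)` returns `1.0` because neither leg experiences resistance, while `hei_reward(-2.0, 3.0, 8.0)` returns `0.75`: the penalty is one-sided, which is how the loss-aversion asymmetry enters the reward.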

4. Exoskeleton Optimizer

An MLP surrogate network replaces Gaussian Processes to fully exploit the abundance of simulation data:

  • Latin Hypercube Sampling (LHS) is used to sample the control parameter space, avoiding aliasing effects of grid sampling
  • The surrogate network loss incorporates Huber Loss (for outlier robustness), gradient penalty (to smooth the CoT landscape), and L1/L2 regularization
  • SLSQP and trust-region gradient optimization are applied to identify the optimal control parameters
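Two ingredients of the optimizer, Latin Hypercube Sampling and the Huber loss, can be sketched with the standard library alone. This is an illustration under assumptions: the function names, the seed handling, and the Huber threshold `delta` are hypothetical, and the surrogate MLP, gradient penalty, and SLSQP step are not shown.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points; each dimension is cut into n_samples strata, one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        width = (high - low) / n_samples
        # One uniformly placed point inside each stratum of this dimension.
        columns.append([low + (s + rng.random()) * width for s in strata])
    # Transpose per-dimension columns into a list of sample tuples.
    return list(zip(*columns))


def huber(residual, delta=1.0):
    """Quadratic near zero, linear in the tails, so outliers weigh less than under squared error."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)
```

A call such as `latin_hypercube(5, [(0.0, 10.0), (0.0, 0.5)])` would draw five \((\kappa, \Delta t)\) candidates; the bounds here are placeholders, not the ranges used in the paper. Unlike a regular grid, every stratum of every dimension is hit exactly once, so no parameter value is repeated across samples.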

Key Experimental Results

Unassisted Gait Validation

  • Joint kinematics (ankle, knee, hip) qualitatively match real human experimental data (Boo et al., 2025)
  • Muscle activation patterns resemble human EMG signals without explicit constraints
  • The walking speed–CoT curve is consistent with trends in Browning et al. (2006), with accurate PWS prediction

Assisted Gait Validation

  • Under control parameters \((\kappa, \Delta t) = (8\,\text{Nm}, 0.25\,\text{s})\), the scaling trends of assistive torque/power with walking speed are consistent with Lim et al. (2019b)
  • HEI reward vs. no HEI: at 4 km/h, increasing the delay from 0.05 s to 0.25 s raises assistive power by 1.88× in real human experiments; in simulation, the HEI reward condition yields 1.73× (correlation coefficient 0.83), while the no-HEI condition yields only 0.67× (correlation coefficient 0.69)
  • The maximum metabolic reduction rate under the HEI reward condition is the closest to that measured in real human experiments

Control Parameter Optimization

  • Healthy population: the optimal delay \(\Delta t\) decreases monotonically with increasing walking speed
  • Pathological gait: among 5 pathological gait types (equinus, waddling, crouch, calcaneal, foot drop), 4 exhibit a strong linear relationship between optimal gain \(\kappa\) and pathology severity
  • Foot drop fails to converge stably due to excessive gait variability caused by frequent toe–ground collisions

Highlights & Insights

  • Fills a critical gap: The first work to unify neuromechanical simulation and Deep RL for both fitting and predicting exoskeleton-assisted conditions, enabling genuine optimization without human subject experiments
  • Clever HEI reward design: Draws on loss aversion from behavioral economics to model human adaptive behavior via the resistance minimization hypothesis—simple yet effective
  • Rigorous sim-to-real matching: Validation covers kinematics, assistive torque/power scaling, muscle activation patterns, and ground reaction forces
  • Pathological gait generalization: Demonstrates a linear relationship between pathology severity and optimal assistance, with direct clinical implications
  • Practical surrogate network: MLP + LHS + gradient penalty outperforms traditional Bayesian Optimization in data-rich simulation settings and offers better scalability

Limitations & Future Work

  • Lack of real human validation: Control parameters optimized in simulation have not been validated on real human subjects, particularly patient populations
  • Simplified reward model: The HEI reward is based on a single assumption and may not capture the full complexity of human adaptive behavior
  • No personalization: The framework does not model individual-specific motor control characteristics
  • Muscle dynamics approximation: Rigid tendons and simplified muscle models may fail to capture individual differences
  • Foot drop failure: 1 out of 5 pathological gait types cannot be successfully optimized, revealing limitations of the framework under high-variability scenarios
  • Simplified foot model: A box-shaped rigid foot leads to overestimated step frequency at low walking speeds

| Method | Characteristics | Limitations |
|---|---|---|
| HILO (Zhang et al., 2017; Slade et al., 2024) | Iterative optimization via real human experiments | Requires hours of walking; infeasible for mobility-impaired patients; <30 iterations |
| Luo et al. (2024) | Deep RL + exoskeleton, published in Nature | Relies on imitation policy, limiting adaptation to unseen conditions; no correlation validation against real human data |
| Generative GaitNet (Park et al., 2022) | Deep RL gait generation | Does not account for exoskeleton assistance or pathological gait |
| Exo-plore (Ours) | Unified fitting + prediction framework, HEI reward, surrogate optimization | No real human validation; simplified muscle model |

  • Loss Aversion in robotics: Introducing behavioral economics concepts to model human–robot interaction rewards represents a cross-disciplinary approach worth adopting in other HRI scenarios (e.g., assistive robots, prosthetic control)
  • Surrogate networks vs. GP: In data-rich simulation settings, MLP surrogate networks with gradient penalty are more efficient and scalable than traditional Bayesian Optimization—a transferable insight for other simulation-based optimization problems
  • Pathological gait linear relationship: If validated in real human experiments, this linearity could greatly simplify clinical exoskeleton parameter configuration, allowing optimal parameters to be rapidly estimated from pathology severity alone

Rating

  • Novelty: 8/10 — First work to apply a sim-to-real-matched neuromechanical simulation framework to exoskeleton control optimization; HEI reward design is novel
  • Experimental Thoroughness: 8/10 — Multi-dimensional validation and ablation studies are thorough, but real human experimental validation is absent
  • Writing Quality: 9/10 — Structure is clear, methods are described in detail, and algorithmic pseudocode is well-formatted
  • Value: 8/10 — Significant contribution to the exoskeleton assistance field; pathological gait generalization holds strong clinical promise