Fall Risk and Gait Analysis using World-Spaced 3D Human Mesh Recovery¶
Conference: CVPR 2026
arXiv: 2604.11961
Code: None
Area: 3D Vision
Keywords: gait analysis, fall risk, human mesh recovery, older adults, monocular video
TL;DR¶
This paper proposes a gait analysis pipeline based on GVHMR (world-grounded 3D human mesh recovery) that extracts spatiotemporal gait parameters from monocular video of older adults performing the Timed Up and Go (TUG) test, validating the correlation between video-derived metrics and wearable sensor measurements as well as their association with fall risk.
Background & Motivation¶
Background: Gait assessment is a critical clinical indicator of fall risk and overall health in older adults; however, standard clinical practice is largely limited to stopwatch-measured gait speed.
Limitations of Prior Work: Comprehensive gait evaluation is constrained by limited access to technology and specialized training. Inertial sensors, optical marker systems, and multi-camera markerless motion capture require dedicated infrastructure, restricting their deployment beyond controlled clinical or research environments.
Key Challenge: Although biomechanical correlates of fall risk are well established, existing measurement approaches cannot be scaled for deployment in uncontrolled community settings. Existing 2D keypoint methods cannot recover depth information or decouple camera viewpoint from body pose.
Goal: To leverage world-grounded HMR for extracting spatiotemporal gait parameters in absolute metric units from monocular camera video, enabling accessible gait analysis in community settings.
Key Insight: GVHMR reconstructs participants' true trajectories in a gravity-aligned world coordinate system, enabling the extraction of gait parameters in absolute metric units.
Core Idea: Replace 2D skeleton-based methods with GVHMR to achieve end-to-end extraction of world-space gait parameters from monocular video.
Method¶
Overall Architecture¶
The pipeline consists of: (1) recording the TUG test with a GoPro camera; (2) applying GVHMR to recover world-coordinate 3D human trajectories and SMPL-X parameters from video; (3) using signal processing and peak detection to automatically segment TUG sub-tasks; (4) extracting spatiotemporal gait parameters; and (5) performing statistical analysis (correlation analysis and linear mixed-effects modeling).
Key Designs¶
- GVHMR World-Coordinate Trajectory Extraction:
  - Function: Recovers absolute metric-unit 3D human motion trajectories from monocular video.
  - Mechanism: GVHMR predicts local body pose, shape parameters, and orientation and translation in a gravity-aligned world coordinate system. World-space 3D joint positions \(\{J^t \in \mathbb{R}^{24 \times 3}\}_{t=0}^{T}\) are regressed from the SMPL-X kinematic model.
  - Design Motivation: Camera-relative methods conflate body and camera motion, making it impossible to extract spatial parameters such as stride length.
- Automatic TUG Sub-task Segmentation:
  - Function: Automatically identifies TUG sub-tasks including sit-to-stand transitions, walking, and turning.
  - Mechanism: A composite signal \(\text{STS} = 1.0 \cdot \dot{y}_{hip} + 0.7 \cdot \dot{z}_{shoulder} + 0.5 \cdot \dot{\theta}_{trunk}\) is designed to detect sit-to-stand transitions, while turning events are detected via velocity extrema of the hip-line signal \(x_{R,hip} - x_{L,hip}\).
  - Design Motivation: The duration of each TUG sub-task has distinct clinical associations with fall risk.
- Statistical Validation Framework:
  - Function: Establishes the validity and clinical relevance of video-derived metrics.
  - Mechanism: Spearman correlation analysis compares step timing between video and insole sensor measurements; linear mixed-effects (LME) models assess the predictive capacity of fall risk factors (STEADI score, fear of falling) on gait parameters, with random effects controlling for within-participant variability.
  - Design Motivation: Each participant completed three TUG trials, making observations non-independent and necessitating LME modeling.
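The two segmentation signals above can be sketched in a few lines of NumPy/SciPy. This is an illustrative reconstruction, not the authors' code: the paper specifies only the composite-signal weights and the hip-line definition, so the derivative scheme, the peak-detection settings, and the synthetic stand-up used for the check below are all assumptions.

```python
# Illustrative sketch of the TUG sub-task segmentation signals.
# Weights (1.0, 0.7, 0.5) and the hip-line x_R - x_L come from the paper;
# everything else (gradients, find_peaks settings) is assumed.
import numpy as np
from scipy.signal import find_peaks

def sts_signal(hip_y, shoulder_z, trunk_angle, fps=30.0):
    """Composite sit-to-stand signal: 1.0*dy_hip + 0.7*dz_shoulder + 0.5*dtheta_trunk."""
    dt = 1.0 / fps
    return (1.0 * np.gradient(hip_y, dt)
            + 0.7 * np.gradient(shoulder_z, dt)
            + 0.5 * np.gradient(trunk_angle, dt))

def turning_events(x_right_hip, x_left_hip, fps=30.0, min_gap_s=1.0):
    """Turns appear as velocity extrema of the hip-line signal x_R - x_L."""
    hip_line = x_right_hip - x_left_hip
    vel = np.abs(np.gradient(hip_line, 1.0 / fps))
    peaks, _ = find_peaks(vel, distance=int(min_gap_s * fps))
    return peaks

# Synthetic check: a single simulated stand-up (hip rises ~0.45 m around t=2 s)
# should produce an STS peak near the middle of the 4 s clip.
t = np.linspace(0, 4, 120)                              # 4 s at ~30 fps
hip_y = 0.45 + 0.45 / (1 + np.exp(-5 * (t - 2)))        # sigmoidal hip height (m)
sts = sts_signal(hip_y, np.zeros_like(t), np.zeros_like(t))
print(int(np.argmax(sts)))                              # peak frame, near t = 2 s
```

In a real pipeline, both signals would be computed from the world-space joint trajectories \(J^t\) returned by GVHMR rather than from synthetic curves.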
Loss & Training¶
This is an application paper that employs a pretrained GVHMR model; no model training is involved. Gaussian smoothing (\(\sigma=3\), 19-point symmetric filter) is applied for denoising.
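The smoothing step can be reproduced with SciPy's `gaussian_filter1d`: with \(\sigma=3\) and `truncate=3.0`, the kernel radius is \(\lfloor 3 \cdot 3 + 0.5 \rfloor = 9\), giving exactly the 19-point symmetric filter mentioned above. The sinusoid-plus-noise signal below is a stand-in for a joint trajectory, not data from the paper.

```python
# Denoising sketch: sigma=3, truncate=3.0 -> 19-point symmetric Gaussian kernel.
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 2 * np.pi, 300)) + 0.1 * rng.standard_normal(300)
smoothed = gaussian_filter1d(signal, sigma=3, truncate=3.0, mode="nearest")

# Smoothing should suppress frame-to-frame noise while preserving the shape.
print(np.var(np.diff(smoothed)) < np.var(np.diff(signal)))
```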
Key Experimental Results¶
Main Results¶
| Metric | Fixed Effect | Estimate (95% CI) | p-value |
|---|---|---|---|
| STS Duration | STEADI Score | 1.23 (0.45, 2.01) | 0.002 |
| Stride Length | STEADI Score | -1.36 (-2.03, -0.68) | <0.001 |
| Stride Length Variability | STEADI Score | -19.62 (-30.44, -8.80) | <0.001 |
| Stride Length | FES-I Score | -1.04 (-1.65, -0.43) | 0.001 |
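An LME fit of the kind reported above can be sketched with `statsmodels` on synthetic data: a fixed effect of STEADI score on stride length with a per-participant random intercept over three trials. The column names, units, and simulated effect size are illustrative only, not the paper's data.

```python
# Sketch of the linear mixed-effects analysis on synthetic data:
# stride ~ steadi with a random intercept per participant (3 trials each).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_participants, n_trials = 52, 3
pid = np.repeat(np.arange(n_participants), n_trials)
steadi = np.repeat(rng.integers(0, 10, n_participants), n_trials)
intercept = rng.normal(0, 5, n_participants)[pid]       # random intercepts
stride = 120 - 1.36 * steadi + intercept + rng.normal(0, 3, pid.size)  # cm

df = pd.DataFrame({"participant": pid, "steadi": steadi, "stride": stride})
model = smf.mixedlm("stride ~ steadi", df, groups=df["participant"]).fit()
print(round(model.params["steadi"], 2))   # slope estimate, near the simulated -1.36
```

Grouping by participant is what absorbs the non-independence of the three repeated TUG trials; an ordinary least-squares fit would understate the standard errors here.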
Ablation Study¶
| Validation Analysis | Result | Note |
|---|---|---|
| Step Timing Correlation | ρ=0.673, p<0.001 | Moderate agreement between video and insole sensor |
| Stride Length ICC | 0.81 | High inter-participant consistency |
| Stride Length Model R² | 0.85 | Strong model fit |
| STS Model R² | Low | High within-participant variability |
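The step-timing agreement check is a plain Spearman rank correlation, available in SciPy. The arrays below are synthetic stand-ins for per-step timings, constructed so that video systematically reads low relative to the insole sensor, mimicking the bias noted in the findings.

```python
# Spearman agreement sketch between two step-timing sources (synthetic data).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
insole = np.sort(rng.uniform(0.4, 0.8, 100))        # insole step durations (s)
video = 0.9 * insole + rng.normal(0, 0.05, 100)     # biased-low video estimates

rho, p = spearmanr(video, insole)
print(round(rho, 3), p < 0.001)
```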
Key Findings¶
- STEADI score significantly predicts sit-to-stand duration and stride length parameters, but not turning duration.
- Stride length and its variability are more stable indicators with stronger associations with fall risk than sit-to-stand duration (ICC=0.81 vs. low ICC).
- Video-derived step timing is systematically lower than the corresponding insole sensor measurements, though the two track each other consistently.
Highlights & Insights¶
- Applying GVHMR to clinical gait analysis is a practically valuable contribution: deployment in community centers requires only a GoPro camera and a chair.
- Stride length variability as a proxy indicator for fall risk carries strong clinical significance, consistent with the existing literature.
Limitations & Future Work¶
- Systematic underestimation of step timing from video may be related to differences in sampling rate (30 fps vs. 60 fps).
- Turning segmentation accuracy is affected by individual variation in turning strategies.
- The sample size is limited (52 participants), all of whom are older adults.
- Future work could evaluate the efficacy of GVHMR-derived metrics for prospective fall prediction.
Related Work & Insights¶
- vs. 2D Skeleton Methods: 2D methods cannot recover depth or decouple camera motion; GVHMR reconstructs absolute trajectories in a world coordinate system.
- vs. Multi-camera Systems: This paper requires only a monocular camera, substantially lowering the barrier to deployment.
Rating¶
- Novelty: ⭐⭐⭐ GVHMR is an existing method; the contribution is primarily applicative.
- Experimental Thoroughness: ⭐⭐⭐⭐ Includes sensor-based validation and statistical modeling.
- Writing Quality: ⭐⭐⭐⭐ Method and statistical descriptions are clear.
- Value: ⭐⭐⭐⭐ Has practical applicability for community-based health assessment.