ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization¶
Conference: ICML2025
arXiv: 2505.10250
Code: GitHub - ADHMR
Area: LLM Alignment
Keywords: Human Mesh Recovery, Diffusion Models, DPO, Preference Optimization, HMR-Scorer
TL;DR¶
This work brings Direct Preference Optimization (DPO) to diffusion-based Human Mesh Recovery (HMR). An HMR-Scorer is trained to evaluate prediction quality and is then used to build a preference dataset of winner/loser pairs. The base diffusion model is fine-tuned on these pairs via DPO, which improves HMR performance on in-the-wild images without requiring 3D annotations.
Background & Motivation¶
Limitations of Probabilistic HMR¶
Deterministic HMR produces a single prediction, whereas probabilistic methods (diffusion models) generate multiple candidates but lack alignment:
1. Mismatch between 3D mesh predictions and 2D image cues.
2. Poor performance on in-the-wild images.
Root Causes¶
Diffusion training objectives focus on distribution matching rather than precise alignment. End-to-end diffusion models typically avoid using reprojection losses during early denoising steps. Furthermore, pseudo-3D annotations are inherently noisy.
Suitability of DPO¶
Traditional joint-wise or pixel-wise losses are highly prone to overfitting to noisy labels. DPO focuses on relative rather than absolute quality, offering greater robustness.
Method¶
Overall Architecture¶
- Training the HMR-Scorer: A reward model designed to evaluate the alignment quality between the mesh prediction and the input image.
- Constructing the Preference Dataset: Ranking candidates generated by the base model using the HMR-Scorer.
- DPO Fine-Tuning: Fine-tuning the base diffusion-based HMR model using the preference dataset.
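The second stage, turning scored candidates into a winner/loser pair, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: `build_preference_pair`, `score_fn`, and `min_gap` are hypothetical names, and pairing the highest-scoring candidate with the lowest-scoring one is a simple rule assumed here for clarity.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_preference_pair(
    candidates: Sequence[T],
    score_fn: Callable[[T], float],
    min_gap: float = 0.0,
) -> Optional[Tuple[T, T]]:
    """Return (winner, loser) for one image, or None if no clear pair exists.

    `score_fn` stands in for a trained scorer (higher means better aligned).
    `min_gap` is a hypothetical threshold that drops ambiguous pairs.
    """
    if len(candidates) < 2:
        return None
    # Score every candidate once and remember its index.
    scored = [(score_fn(c), i) for i, c in enumerate(candidates)]
    best_score, best_i = max(scored)
    worst_score, worst_i = min(scored)
    # Skip pairs whose scores are too close to express a preference.
    if best_score - worst_score <= min_gap:
        return None
    return candidates[best_i], candidates[worst_i]
```

Because only the scorer's ranking is used, this step needs no ground-truth 3D annotation for the image being processed.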
Key Designs¶
- HMR-Scorer features: multi-scale image features, both global and local.
- Local features are pixel-aligned: they are sampled from the image feature map at the 2D projections of the predicted human keypoints.
- HMR-Scorer training objective: predict a reconstruction quality score that combines PVE, MPJPE, and PA-MPJPE.
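To make the scorer's regression target concrete, the sketch below computes a mean per-point error (MPJPE when applied to joints, PVE when applied to mesh vertices) and folds three error values into one score. The combination rule is an assumption: the source only says the score is a comprehensive metric of PVE, MPJPE, and PA-MPJPE, so `quality_score`, its `weights`, and its `scale` are hypothetical. PA-MPJPE additionally requires a Procrustes alignment, which is not implemented here and is passed in as a number.

```python
import math
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


def mean_point_error(pred: Sequence[Point3D], gt: Sequence[Point3D]) -> float:
    """Mean Euclidean distance between corresponding 3D points.

    Applied to joints this is MPJPE; applied to mesh vertices it is PVE.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def quality_score(
    pve: float,
    mpjpe: float,
    pa_mpjpe: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # assumed weights
    scale: float = 100.0,  # assumed error scale
) -> float:
    """Hypothetical score in (0, 1]: higher means lower weighted error."""
    w_pve, w_mpjpe, w_pa = weights
    err = (w_pve * pve + w_mpjpe * mpjpe + w_pa * pa_mpjpe) / sum(weights)
    return math.exp(-err / scale)
```

Any monotonically decreasing mapping from error to score would serve the same purpose, since the downstream preference pairs depend only on the ranking of candidates.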
DPO for Diffusion HMR¶
- Adapting Diffusion-DPO from Wallace et al. (2024).
- Preference pairs: High-score predictions (winner) vs. low-score predictions (loser).
- Enhanced robustness to noisy labels compared to pointwise losses.
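For reference, the Diffusion-DPO objective of Wallace et al. compares how much the fine-tuned model improves its denoising error over a frozen reference model on the winner versus the loser. The sketch below evaluates that loss for a single pair from four scalar errors. It assumes the timestep weighting is folded into `beta`, and it does not show how ADHMR adapts the formulation to mesh outputs, which the source does not detail; the function and argument names are illustrative.

```python
import math


def diffusion_dpo_loss(
    err_w_policy: float,
    err_w_ref: float,
    err_l_policy: float,
    err_l_ref: float,
    beta: float = 1.0,
) -> float:
    """Diffusion-DPO loss for one (winner, loser) pair.

    Each `err_*` is a squared noise-prediction error at a sampled timestep:
    `w` = winner, `l` = loser, `policy` = model being fine-tuned,
    `ref` = frozen reference model. `beta` absorbs the timestep weighting.
    """
    # Negative when the policy improves on the winner more than on the loser.
    diff = (err_w_policy - err_w_ref) - (err_l_policy - err_l_ref)
    z = -beta * diff
    # -log(sigmoid(z)) written as a numerically stable softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

The loss equals `log(2)` when the policy matches the reference and drops below it once the policy denoises the winner better, relative to the reference, than it denoises the loser. Only this relative margin enters the loss, which is why the absolute accuracy of either sample matters less than their ordering.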
Byproduct: Data Cleaning¶
The HMR-Scorer can also be utilized to filter out low-quality pseudo-labeled training samples, retaining only high-scoring ones. Experiments demonstrate that performance improves even with a reduced dataset.
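Score-based filtering of this kind might look like the sketch below. It assumes every pseudo-labeled sample already has a scalar scorer output where higher is better; `keep_top_fraction` is a hypothetical helper, not a function from the ADHMR code base.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def keep_top_fraction(
    samples: Sequence[T], scores: Sequence[float], keep_fraction: float
) -> List[T]:
    """Keep the highest-scoring `keep_fraction` of samples, in original order."""
    if len(samples) != len(scores):
        raise ValueError("samples and scores must have the same length")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not samples:
        return []
    # Round to the nearest count, but always keep at least one sample.
    n_keep = max(1, int(len(samples) * keep_fraction + 0.5))
    # Indices sorted from best to worst score.
    ranked = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [samples[i] for i in kept]
```

Calling `keep_top_fraction(samples, scores, 0.8)` corresponds to the Top-80% setting, and `0.6` to the Top-60% setting.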
Key Experimental Results¶
Comparison with SOTA Probabilistic HMR¶
| Method | MPJPE ↓ | PA-MPJPE ↓ | PVE ↓ |
|---|---|---|---|
| ProHMR | 59.8 | 41.2 | 72.4 |
| ScoreHypo | 52.3 | 36.1 | 63.8 |
| ADHMR | 48.7 | 33.5 | 59.2 |
Data Cleaning Performance of HMR-Scorer¶
| Training Data | Result vs. Baseline | MPJPE Improvement over Baseline | Change in Data Volume |
|---|---|---|---|
| 100% Original Data | Baseline | - | - |
| HMR-Scorer Top-80% | Outperforms Baseline | +2.3 | -20% |
| HMR-Scorer Top-60% | Outperforms Baseline | +1.8 | -40% |
Key Findings¶
- DPO improves in-the-wild performance more effectively than traditional 3D losses.
- The multi-scale features of the HMR-Scorer are highly sensitive to subtle misalignments.
- Data cleaning enables the model to achieve superior performance using less data.
- Fine-tuning on in-the-wild images is achievable without requiring 3D annotations.
Highlights & Insights¶
- Successfully transfers the LLM alignment concept of DPO to the CV task of HMR.
- The HMR-Scorer serves as an effective byproduct that can replace manual data inspection.
- Robustness to pseudo-annotation noise is a key advantage of DPO in computer vision.
- Enables highly practical, in-the-wild fine-tuning without requiring any 3D annotations.
Limitations & Future Work¶
- The performance upper bound of the HMR-Scorer is constrained by the training data distribution.
- DPO requires generating multiple candidates to construct preference pairs, increasing computational overhead.
- The method is only validated in single-person scenarios, and multi-person HMR remains to be explored.
- Capability under extreme occlusion or rare poses has not been explicitly evaluated.
Related Work & Insights¶
- Difference from ScoreHypo: ScoreHypo uses an auxiliary network to select candidates at test time, whereas ADHMR directly improves generation quality.
- Relationship with Diffusion-DPO: Adapts its mathematical formulation but customizes it for structured output (mesh instead of image).
- Insights: The DPO paradigm can be generalized to other structured prediction tasks such as depth estimation and pose estimation.
Rating¶
- Novelty: 4.5/5 — Applying DPO to HMR and introducing the HMR-Scorer byproduct
- Experimental Thoroughness: 4.5/5 — Comprehensive comparisons and data cleaning experiments
- Writing Quality: 4.0/5
- Value: 4.5/5 — Significantly advances probabilistic 3D computer vision
Supplementary Technical Analysis¶
Multi-Scale Feature Design of HMR-Scorer¶
Global features capture overall pose plausibility, while local features capture fine-grained alignment by sampling via projected keypoints. Both are concatenated and fed into the scoring MLP.
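The local branch can be illustrated with plain bilinear sampling of a feature map at projected keypoint locations, followed by concatenation with the global feature. This is a sketch under several assumptions: the feature map is a nested `[H][W][C]` list, keypoints are given in feature-map pixel coordinates, and `bilinear_sample` and `scorer_input` are hypothetical names. The backbone and the scoring MLP are omitted.

```python
from typing import List, Sequence, Tuple

FeatureMap = Sequence[Sequence[Sequence[float]]]  # indexed as [row][col][channel]


def bilinear_sample(fmap: FeatureMap, x: float, y: float) -> List[float]:
    """Bilinearly interpolate the feature vector at (x, y), clamped to bounds."""
    h, w = len(fmap), len(fmap[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0  # fractional offsets inside the cell
    return [
        (1 - ay) * ((1 - ax) * fmap[y0][x0][c] + ax * fmap[y0][x1][c])
        + ay * ((1 - ax) * fmap[y1][x0][c] + ax * fmap[y1][x1][c])
        for c in range(len(fmap[0][0]))
    ]


def scorer_input(
    global_feat: Sequence[float],
    fmap: FeatureMap,
    keypoints_2d: Sequence[Tuple[float, float]],
) -> List[float]:
    """Concatenate the global feature with one local feature per keypoint."""
    feats = list(global_feat)
    for x, y in keypoints_2d:
        feats.extend(bilinear_sample(fmap, x, y))
    return feats
```

Because the sampling locations come from the predicted mesh, a misaligned prediction reads features from the wrong image regions, which gives the scorer a direct signal about 2D alignment.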
Advantages of DPO over Traditional Losses¶
Traditional joint-wise losses are prone to overfitting to noisy labels, whereas DPO focuses on relative quality (ensuring the winner outperforms the loser), making it more robust against imperfect pseudo-annotations.
Unexpected Findings in Data Cleaning¶
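After the HMR-Scorer filters out the lowest-scoring 20% of pseudo-labeled samples, the model's MPJPE improves by 2.3 even though it trains on only 80% of the data. Keeping only the top 60% still outperforms the baseline, but with a smaller gain of 1.8. This suggests that, with noisy pseudo-labels, data quality can matter more than quantity, although more aggressive filtering does not keep adding benefit.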