ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization

Conference: ICML2025
arXiv: 2505.10250
Code: GitHub - ADHMR
Area: LLM Alignment
Keywords: Human Mesh Recovery, Diffusion Models, DPO, Preference Optimization, HMR-Scorer

TL;DR

This work brings DPO into diffusion-based Human Mesh Recovery (HMR). An HMR-Scorer is trained to evaluate prediction quality and is used to rank the base model's candidates into a preference dataset of winner/loser pairs. The base diffusion model is then fine-tuned with DPO on these pairs, which improves HMR performance on in-the-wild images without requiring 3D annotations.

Background & Motivation

Limitations of Probabilistic HMR

Deterministic HMR produces a single prediction, whereas probabilistic methods (e.g., diffusion models) generate multiple candidates but lack alignment:
1. The 3D mesh predictions do not match the 2D image cues.
2. Performance on in-the-wild images is poor.

Root Causes

Diffusion training objectives target distribution matching rather than precise alignment with the input image. End-to-end diffusion models also typically avoid reprojection losses during the early (high-noise) denoising steps, where predictions are still coarse. Furthermore, the pseudo-3D annotations available for training are inherently noisy.

Suitability of DPO

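Traditional joint-wise or pixel-wise losses regress directly toward the labels and therefore readily overfit to label noise. DPO only requires the preferred prediction to rank above the rejected one, so it depends on relative rather than absolute quality and is more robust to noisy labels.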

Method

Overall Architecture

Training the HMR-Scorer: a reward model that evaluates how well a mesh prediction aligns with the input image.
Constructing the preference dataset: candidates generated by the base model are ranked with the HMR-Scorer.
DPO fine-tuning: the base diffusion-based HMR model is fine-tuned on the preference dataset.
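The preference-dataset step can be sketched in Python as below. This is a minimal illustration under assumptions: candidates are flat parameter vectors, `scorer` stands in for the HMR-Scorer, and the one-pair-per-image rule (best vs. worst candidate) and the `min_margin` filter are hypothetical choices, not necessarily the paper's pairing strategy.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PreferencePair:
    image_id: str
    winner: Sequence[float]  # candidate mesh parameters scored higher
    loser: Sequence[float]   # candidate mesh parameters scored lower
    margin: float            # score(winner) - score(loser)


def build_preference_pairs(
    image_ids: List[str],
    candidates: List[List[Sequence[float]]],
    scorer: Callable[[str, Sequence[float]], float],
    min_margin: float = 0.0,
) -> List[PreferencePair]:
    """Turn K sampled candidates per image into one (winner, loser) pair.

    Hypothetical rule: the best-scoring candidate wins, the worst-scoring one
    loses, and pairs whose score gap does not exceed min_margin are discarded.
    """
    pairs: List[PreferencePair] = []
    for image_id, cands in zip(image_ids, candidates):
        if len(cands) < 2:
            continue  # a pair needs at least two candidates
        scores = [scorer(image_id, c) for c in cands]
        best = max(range(len(cands)), key=scores.__getitem__)
        worst = min(range(len(cands)), key=scores.__getitem__)
        margin = scores[best] - scores[worst]
        if margin > min_margin:
            pairs.append(PreferencePair(image_id, cands[best], cands[worst], margin))
    return pairs
```

Pairing the extremes maximizes the score gap, which plausibly makes each preference label less likely to be wrong; forming more pairs per image would trade label confidence for data volume.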

Key Designs of the HMR-Scorer

Multi-scale image feature extraction (global and local features).
Pixel-aligned features sampled at the projected 2D locations of the predicted human keypoints.
Training objective: predict a reconstruction quality score that combines PVE, MPJPE, and PA-MPJPE.
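To make the scorer's training target concrete, the sketch below computes MPJPE and PVE as mean per-point Euclidean errors and folds them, together with a precomputed PA-MPJPE, into a single score. The weighting, the exponential mapping, and the `scale_mm` constant are hypothetical; the summary above only says the target combines the three metrics.

```python
import math
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


def mean_point_error(pred: Sequence[Point3], gt: Sequence[Point3]) -> float:
    """Mean Euclidean distance between corresponding 3D points.

    On joints this is MPJPE; on mesh vertices it is PVE. PA-MPJPE is the same
    quantity computed after Procrustes-aligning pred to gt (omitted here).
    """
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def quality_score(
    mpjpe: float,
    pa_mpjpe: float,
    pve: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
    scale_mm: float = 100.0,
) -> float:
    """Hypothetical regression target in (0, 1]: 1.0 for a perfect prediction,
    decaying exponentially with the weighted mean error (in mm)."""
    w_mpjpe, w_pa, w_pve = weights
    mean_err = (w_mpjpe * mpjpe + w_pa * pa_mpjpe + w_pve * pve) / sum(weights)
    return math.exp(-mean_err / scale_mm)
```

Mapping errors into a bounded score keeps the regression target on a fixed scale regardless of how large individual errors get, which is one reasonable design for a scorer head; a plain negative weighted sum would also preserve the ranking.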

DPO for Diffusion HMR

Adapts Diffusion-DPO (Wallace et al., 2024).
Preference pairs: high-scoring predictions (winners) vs. low-scoring predictions (losers).
More robust to noisy labels than pointwise losses.
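For reference, the per-pair Diffusion-DPO objective of Wallace et al. (2024) can be written as a function of four squared denoising errors, as in the sketch below. It uses the ε-prediction form from that paper; how ADHMR parameterizes its denoiser over mesh parameters, and which β it uses, are not stated here, so treat `beta` and the argument names as illustrative.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def diffusion_dpo_loss(
    err_w_model: float,
    err_w_ref: float,
    err_l_model: float,
    err_l_ref: float,
    beta: float,
) -> float:
    """Per-pair Diffusion-DPO loss in the style of Wallace et al. (2024).

    Each err_* is a squared denoising error ||eps - eps_pred(x_t, t)||^2 on the
    noised winner (w) or loser (l), from the trainable model or the frozen
    reference model; model and reference see the same x_t and t. The timestep
    weighting of the original objective is folded into beta.
    """
    # Negative when the trainable model improves on the reference more for the
    # winner than for the loser, which is the direction DPO rewards.
    advantage = (err_w_model - err_w_ref) - (err_l_model - err_l_ref)
    return -log_sigmoid(-beta * advantage)
```

When the trainable model equals the reference, the advantage is zero and the loss is log 2; it falls toward zero as the model denoises winners better, and losers worse, than the reference does.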

Byproduct: Data Cleaning

The HMR-Scorer can also filter out low-quality pseudo-labeled training samples, keeping only high-scoring ones. In the experiments, performance improves even though the filtered dataset is smaller.
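The data-cleaning step amounts to ranking pseudo-labeled samples by HMR-Scorer output and keeping a top fraction. A minimal sketch, assuming the scores have already been computed per sample; `keep_top_fraction` is a hypothetical helper, and rounding the kept count to the nearest integer is an arbitrary choice.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def keep_top_fraction(samples: Sequence[T], scores: Sequence[float], fraction: float) -> List[T]:
    """Keep the highest-scoring `fraction` of samples (0.8 keeps the top 80%).

    Retained samples are returned in their original order.
    """
    if len(samples) != len(scores):
        raise ValueError("samples and scores must have the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(fraction * len(samples))) if samples else 0
    ranked = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])  # restore original ordering
    return [samples[i] for i in kept]
```

For example, `keep_top_fraction(dataset, scores, 0.8)` corresponds to the "Top-80%" setting in the data-cleaning results.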

Key Experimental Results

Comparison with SOTA Probabilistic HMR

Method	MPJPE (mm) ↓	PA-MPJPE (mm) ↓	PVE (mm) ↓
ProHMR	59.8	41.2	72.4
ScoreHypo	52.3	36.1	63.8
ADHMR	48.7	33.5	59.2

Data Cleaning Performance of HMR-Scorer

Training Data	Result vs. Baseline	MPJPE Reduction (mm)	Change in Data Volume
100% original data	Baseline	-	-
HMR-Scorer Top-80%	Outperforms baseline	2.3	-20%
HMR-Scorer Top-60%	Outperforms baseline	1.8	-40%

Key Findings

DPO improves in-the-wild performance more effectively than traditional 3D losses.
The multi-scale features of the HMR-Scorer are highly sensitive to subtle misalignments.
Data cleaning enables the model to achieve superior performance using less data.
Fine-tuning on in-the-wild images is achievable without requiring 3D annotations.

Highlights & Insights

Successfully transfers the LLM alignment concept of DPO to the CV task of HMR.
The HMR-Scorer serves as an effective byproduct that can replace manual data inspection.
Robustness to pseudo-annotation noise is a key advantage of DPO in computer vision.
Enables highly practical, in-the-wild fine-tuning without requiring any 3D annotations.

Limitations & Future Work

The performance upper bound of the HMR-Scorer is constrained by the training data distribution.
DPO requires generating multiple candidates to construct preference pairs, increasing computational overhead.
The method is only validated in single-person scenarios, and multi-person HMR remains to be explored.
Performance under extreme occlusion or rare poses has not been explicitly evaluated.

Related Work & Insights

Difference from ScoreHypo: ScoreHypo uses an auxiliary network to select among candidates at test time, whereas ADHMR improves the generator itself so that its samples are better aligned.
Relationship with Diffusion-DPO: ADHMR adapts the Diffusion-DPO formulation but applies it to structured outputs (human meshes instead of images).
Insights: The same DPO paradigm could plausibly be extended to other structured prediction tasks such as depth estimation and pose estimation.

Rating

Novelty: 4.5/5 — Applying DPO to HMR and introducing the HMR-Scorer byproduct
Experimental Thoroughness: 4.5/5 — Comprehensive comparisons and data cleaning experiments
Writing Quality: 4.0/5
Value: 4.5/5 — Significantly advances probabilistic 3D computer vision

Supplementary Technical Analysis

Multi-Scale Feature Design of HMR-Scorer

Global features capture overall pose plausibility, while local features, sampled at the projected keypoint locations, capture fine-grained alignment between mesh and image. The two are concatenated and fed into the scoring MLP.
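The sketch below shows the local-feature path for a single feature map: project camera-space 3D joints with a pinhole model, bilinearly sample the feature map at those pixels, and concatenate the results with a global feature vector. It assumes the feature map is indexed in the same pixel coordinates as the projection (a real backbone would need rescaling by its stride) and that camera intrinsics are known; a multi-scale variant would repeat the sampling per feature level. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Joint3 = Tuple[float, float, float]
FeatureMap = Sequence[Sequence[Sequence[float]]]  # H x W x C


def project(joint: Joint3, focal: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-space point (x, y, z), z > 0, to pixel (u, v)."""
    x, y, z = joint
    return focal * x / z + cx, focal * y / z + cy


def bilinear_sample(fmap: FeatureMap, u: float, v: float) -> List[float]:
    """Bilinearly interpolate a C-dim feature at (u, v); u is the column, v the row.
    Coordinates outside the grid are clamped to its border."""
    h, w = len(fmap), len(fmap[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0
    out = []
    for c in range(len(fmap[0][0])):
        top = (1 - ax) * fmap[y0][x0][c] + ax * fmap[y0][x1][c]
        bottom = (1 - ax) * fmap[y1][x0][c] + ax * fmap[y1][x1][c]
        out.append((1 - ay) * top + ay * bottom)
    return out


def scorer_input(
    global_feat: Sequence[float],
    fmap: FeatureMap,
    joints: Sequence[Joint3],
    focal: float,
    cx: float,
    cy: float,
) -> List[float]:
    """Concatenate a global image feature with features sampled at each projected joint."""
    local: List[float] = []
    for joint in joints:
        u, v = project(joint, focal, cx, cy)
        local.extend(bilinear_sample(fmap, u, v))
    return list(global_feat) + local
```

Because the sampled features move with the projected joints, a mesh that is slightly misplaced in 3D reads features from the wrong image locations, which is what lets the local path respond to small misalignments.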

Advantages of DPO over Traditional Losses

Traditional joint-wise losses are prone to overfitting to noisy labels, whereas DPO focuses on relative quality (ensuring the winner outperforms the loser), making it more robust against imperfect pseudo-annotations.
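One narrow way to see why relative supervision tolerates some label errors: if every quality estimate is distorted by the same increasing affine map, the absolute targets change but the winner/loser pairs do not. The toy check below illustrates only this shared-distortion case; independent per-sample noise can still flip preferences, so it is not a proof of DPO's robustness in general. The helper names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def preference_pairs(scores: Sequence[float]) -> List[Tuple[int, int]]:
    """All (winner, loser) index pairs implied by quality scores; ties are skipped."""
    pairs: List[Tuple[int, int]] = []
    for i, j in combinations(range(len(scores)), 2):
        if scores[i] > scores[j]:
            pairs.append((i, j))
        elif scores[j] > scores[i]:
            pairs.append((j, i))
    return pairs


def shared_distortion(scores: Sequence[float], gain: float, bias: float) -> List[float]:
    """Apply one increasing affine map (gain > 0) to every score, e.g. a systematic
    calibration error common to all pseudo-labels."""
    if gain <= 0:
        raise ValueError("gain must be positive to preserve the ordering")
    return [gain * s + bias for s in scores]


clean = [0.2, 0.9, 0.5]
noisy = shared_distortion(clean, gain=0.5, bias=0.3)
# The absolute values a pointwise loss would regress toward have moved...
print([round(a - b, 3) for a, b in zip(noisy, clean)])
# ...but the preference pairs used by DPO are identical.
print(preference_pairs(clean) == preference_pairs(noisy))
```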

Unexpected Findings in Data Cleaning

After the HMR-Scorer filters out the lowest-scoring 20% of pseudo-labeled data, MPJPE improves (decreases) by 2.3 mm despite training on only 80% of the data. This suggests that, in this setting, data quality matters more than quantity.