ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization

Conference: ICML 2025
arXiv: 2505.10250
Code: GitHub - ADHMR
Area: LLM Alignment
Keywords: Human Mesh Recovery, Diffusion Models, DPO, Preference Optimization, HMR-Scorer

TL;DR

This work brings Direct Preference Optimization (DPO) to diffusion-based Human Mesh Recovery (HMR). It trains an HMR-Scorer to evaluate prediction quality, uses it to construct a preference dataset of winner/loser pairs, and fine-tunes the base diffusion model with DPO. This improves HMR performance on in-the-wild images without requiring 3D annotations.

Background & Motivation

Limitations of Probabilistic HMR

Deterministic HMR produces a single prediction, whereas probabilistic methods (e.g., diffusion models) generate multiple candidates but lack alignment:
  1. The 3D mesh predictions often mismatch the 2D image cues.
  2. Performance on in-the-wild images is poor.

Root Causes

Diffusion training objectives focus on distribution matching rather than precise alignment. End-to-end diffusion models typically avoid using reprojection losses during early denoising steps. Furthermore, pseudo-3D annotations are inherently noisy.

Suitability of DPO

Traditional joint-wise or pixel-wise losses are highly prone to overfitting to noisy labels. DPO focuses on relative rather than absolute quality, offering greater robustness.

Method

Overall Architecture

  1. Training the HMR-Scorer: A reward model designed to evaluate the alignment quality between the mesh prediction and the input image.
  2. Constructing the Preference Dataset: Ranking candidates generated by the base model using the HMR-Scorer.
  3. DPO Fine-Tuning: Fine-tuning the base diffusion-based HMR model using the preference dataset.
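The preference-dataset step can be sketched as follows, assuming the simplest rule: for each image, the highest- and lowest-scoring candidates become the winner and loser. The `PreferencePair` container, the `build_preference_pair` helper, and the optional `min_margin` filter are hypothetical illustrations, not the paper's code; the actual ADHMR pairing strategy may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class PreferencePair:
    """One winner/loser pair for a single input image (hypothetical container)."""
    image_id: str
    winner: Any
    loser: Any
    margin: float  # score(winner) - score(loser)


def build_preference_pair(
    image_id: str,
    candidates: Sequence[Any],
    score_fn: Callable[[Any], float],
    min_margin: float = 0.0,
) -> Optional[PreferencePair]:
    """Rank candidate meshes with a scorer and pair the best with the worst.

    Assumption: best-vs-worst pairing plus a margin filter; returns None when
    there are too few candidates or their scores are too close to be informative.
    """
    if len(candidates) < 2:
        return None
    scores = [score_fn(c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    worst = min(range(len(scores)), key=scores.__getitem__)
    margin = scores[best] - scores[worst]
    if margin <= min_margin:
        return None
    return PreferencePair(image_id, candidates[best], candidates[worst], margin)


# Toy usage: string "candidates" and a lookup table standing in for the HMR-Scorer.
toy_scores = {"mesh_a": 0.2, "mesh_b": 0.9, "mesh_c": 0.5}
pair = build_preference_pair("img_000", list(toy_scores), toy_scores.__getitem__)
print(pair.winner, pair.loser)  # mesh_b mesh_a
```

Pairing only the extremes keeps one pair per image with the largest score gap; generating several pairs per image is an equally plausible variant.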

Key Designs

  • Multi-scale image feature extraction (global and local).
  • Sampling pixel alignment features via projected human keypoints.
  • Training objective: Predict reconstruction quality scores (a composite metric combining PVE, MPJPE, and PA-MPJPE).
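To make the scorer's training target concrete, here is a NumPy sketch of MPJPE, PA-MPJPE (MPJPE after a Procrustes similarity alignment), and PVE. The notes describe the target only as a composite of these three errors, so the equal-weight `composite_error` and its `weights` parameter are assumptions for illustration, as are all function names.

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error: average Euclidean distance over (J, 3) joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Align pred to gt with the best similarity transform (scale, rotation, translation)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    u, s, vt = np.linalg.svd(p.T @ g)  # SVD of the 3x3 cross-covariance
    d = np.ones(3)
    if np.linalg.det(vt.T @ u.T) < 0:  # flip the last axis to avoid a reflection
        d[-1] = -1.0
    rot = vt.T @ np.diag(d) @ u.T
    scale = float((s * d).sum() / (p ** 2).sum())
    return scale * p @ rot.T + mu_g


def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """MPJPE after Procrustes alignment: ignores global scale, rotation, translation."""
    return mpjpe(procrustes_align(pred, gt), gt)


def pve(pred_verts: np.ndarray, gt_verts: np.ndarray) -> float:
    """Per-vertex error: the same distance as MPJPE, over (V, 3) mesh vertices."""
    return mpjpe(pred_verts, gt_verts)


def composite_error(pred_joints, gt_joints, pred_verts, gt_verts,
                    weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Hypothetical scorer target: weighted sum of PVE, MPJPE, PA-MPJPE (lower is better)."""
    w_pve, w_mpjpe, w_pa = weights
    return (w_pve * pve(pred_verts, gt_verts)
            + w_mpjpe * mpjpe(pred_joints, gt_joints)
            + w_pa * pa_mpjpe(pred_joints, gt_joints))
```

Because PA-MPJPE factors out global scale, rotation, and translation, it isolates articulated-pose error, while MPJPE and PVE also penalize global misplacement; a composite therefore rewards both.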

DPO for Diffusion HMR

  • Adapting Diffusion-DPO from Wallace et al. (2024).
  • Preference pairs: High-score predictions (winner) vs. low-score predictions (loser).
  • Enhanced robustness to noisy labels compared to pointwise losses.
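For reference, the Diffusion-DPO objective of Wallace et al. (2024) compares how much the fine-tuned model improves over the frozen reference model when denoising the winner versus the loser. This NumPy sketch assumes an epsilon-prediction denoiser, folds the constant β·T·ω(λ_t) into a single `beta`, and averages squared errors over all non-batch axes (which only rescales `beta` relative to a sum). How ADHMR parameterizes the noised mesh or pose variables is not specified in these notes, and the function names are hypothetical.

```python
import numpy as np


def per_sample_sq_err(eps: np.ndarray, eps_pred: np.ndarray) -> np.ndarray:
    """Mean squared denoising error per sample, averaged over all non-batch axes."""
    diff = (eps - eps_pred).reshape(eps.shape[0], -1)
    return (diff ** 2).mean(axis=1)


def diffusion_dpo_loss(eps_w, pred_w, ref_w, eps_l, pred_l, ref_l, beta=1.0) -> float:
    """Diffusion-DPO loss for a batch of winner (w) / loser (l) pairs.

    eps_*  : true noise added to the winner/loser sample at timestep t
    pred_* : noise predicted by the model being fine-tuned
    ref_*  : noise predicted by the frozen reference (base) model
    """
    # Error change vs. the reference; negative means the fine-tuned model is better.
    w_term = per_sample_sq_err(eps_w, pred_w) - per_sample_sq_err(eps_w, ref_w)
    l_term = per_sample_sq_err(eps_l, pred_l) - per_sample_sq_err(eps_l, ref_l)
    # -log sigmoid(-beta * (w - l)) == softplus(beta * (w - l)), computed stably.
    return float(np.logaddexp(0.0, beta * (w_term - l_term)).mean())
```

When the fine-tuned and reference predictions coincide, every pair contributes log 2; the loss falls below that only when, relative to the reference, the model improves more on the winner than on the loser. Only this relative comparison enters the loss, which is why it is less sensitive to noisy absolute targets than pointwise losses.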

Byproduct: Data Cleaning

The HMR-Scorer can also be utilized to filter out low-quality pseudo-labeled training samples, retaining only high-scoring ones. Experiments demonstrate that performance improves even with a reduced dataset.
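The cleaning step amounts to ranking pseudo-labeled samples by their HMR-Scorer score and keeping a top fraction. A minimal sketch, where `keep_top_fraction` is a hypothetical helper and the scores are assumed to be precomputed:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def keep_top_fraction(samples: Sequence[T], scores: Sequence[float], fraction: float) -> List[T]:
    """Keep the highest-scoring `fraction` of samples, preserving their original order."""
    if len(samples) != len(scores):
        raise ValueError("samples and scores must have the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(samples) * fraction))  # e.g. fraction=0.8 for a "top-80%" split
    ranked = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])  # restore dataset order among the survivors
    return [samples[i] for i in kept]


print(keep_top_fraction(list("abcde"), [0.1, 0.9, 0.5, 0.3, 0.7], 0.6))  # ['b', 'c', 'e']
```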

Key Experimental Results

Comparison with SOTA Probabilistic HMR

Method      MPJPE (mm) ↓   PA-MPJPE (mm) ↓   PVE (mm) ↓
ProHMR      59.8           41.2              72.4
ScoreHypo   52.3           36.1              63.8
ADHMR       48.7           33.5              59.2

Data Cleaning Performance of HMR-Scorer

Training Data         Result vs. Baseline    MPJPE Improvement   Data Volume
100% original data    Baseline               -                   -
HMR-Scorer top 80%    Outperforms baseline   +2.3                -20%
HMR-Scorer top 60%    Outperforms baseline   +1.8                -40%

Key Findings

  1. DPO improves in-the-wild performance more effectively than traditional 3D losses.
  2. The multi-scale features of the HMR-Scorer are highly sensitive to subtle misalignments.
  3. Data cleaning enables the model to achieve superior performance using less data.
  4. Fine-tuning on in-the-wild images is achievable without requiring 3D annotations.

Highlights & Insights

  1. Successfully transfers the LLM alignment concept of DPO to the CV task of HMR.
  2. The HMR-Scorer serves as an effective byproduct that can replace manual data inspection.
  3. Robustness to pseudo-annotation noise is a key advantage of DPO in computer vision.
  4. Enables highly practical, in-the-wild fine-tuning without requiring any 3D annotations.

Limitations & Future Work

  1. The performance upper bound of the HMR-Scorer is constrained by the training data distribution.
  2. DPO requires generating multiple candidates to construct preference pairs, increasing computational overhead.
  3. The method is only validated in single-person scenarios, and multi-person HMR remains to be explored.
  4. Capability under extreme occlusion or rare poses has not been explicitly evaluated.

Related Work & Insights

  • Difference from ScoreHypo: ScoreHypo uses an auxiliary network to select candidates at test time, whereas ADHMR directly improves generation quality.
  • Relationship with Diffusion-DPO: ADHMR adapts its mathematical formulation but customizes it for structured output (a mesh rather than an image).
  • Insights: The DPO paradigm can be generalized to other structured prediction tasks such as depth estimation and pose estimation.

Rating

  • Novelty: 4.5/5 — Applying DPO to HMR and introducing the HMR-Scorer byproduct
  • Experimental Thoroughness: 4.5/5 — Comprehensive comparisons and data cleaning experiments
  • Writing Quality: 4.0/5
  • Value: 4.5/5 — Significantly advances probabilistic 3D computer vision

Supplementary Technical Analysis

Multi-Scale Feature Design of HMR-Scorer

Global features capture overall pose plausibility, while local features capture fine-grained alignment by sampling via projected keypoints. Both are concatenated and fed into the scoring MLP.
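The feature pipeline described above can be sketched in NumPy as follows. Everything concrete here is an assumption for illustration: a single backbone feature map stands in for the multi-scale features, global features are mean-pooled, keypoints are projected with a pinhole camera (`focal`, `center`) and divided by a feature `stride`, local features come from bilinear sampling, and a two-layer MLP with given weights produces the score. None of these names come from the ADHMR code.

```python
import numpy as np


def project_pinhole(joints_3d: np.ndarray, focal: float, center: np.ndarray) -> np.ndarray:
    """Project (J, 3) camera-space joints (Z > 0) to (J, 2) pixel coords (x, y)."""
    xy = joints_3d[:, :2] / joints_3d[:, 2:3]
    return xy * focal + center


def bilinear_sample(feat: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Bilinearly sample a (C, H, W) feature map at (J, 2) (x, y) points -> (J, C)."""
    _, h, w = feat.shape
    x = np.clip(pts[:, 0], 0, w - 1)
    y = np.clip(pts[:, 1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx       # (C, J)
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx    # (C, J)
    return (top * (1 - wy) + bottom * wy).T


def score_mesh(feat, joints_3d, focal, center, stride, w1, b1, w2, b2) -> float:
    """Hypothetical scorer: concat global + keypoint-local features, then a 2-layer MLP."""
    global_feat = feat.mean(axis=(1, 2))                      # (C,) pooled image context
    # Approximate image px -> feature px mapping (ignores half-pixel offsets).
    kp = project_pinhole(joints_3d, focal, center) / stride
    local_feat = bilinear_sample(feat, kp).reshape(-1)        # (J * C,) alignment cues
    x = np.concatenate([global_feat, local_feat])
    hidden = np.maximum(0.0, w1 @ x + b1)                     # ReLU
    return float(w2 @ hidden + b2)
```

Sampling at projected keypoints ties the local features to where the predicted mesh actually lands in the image, so a small misalignment changes the sampled values, while the pooled global vector stays the same for every candidate of that image.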

Advantages of DPO over Traditional Losses

Traditional joint-wise losses are prone to overfitting to noisy labels, whereas DPO focuses on relative quality (it only asks the model to favor the winner over the loser), making it more robust to imperfect pseudo-annotations.

Unexpected Findings in Data Cleaning

After the HMR-Scorer filtered out the lowest-scoring 20% of the pseudo-labeled data, MPJPE improved by 2.3 even though the model was trained on only 80% of the data. This suggests that, for noisy pseudo-labels, data quality matters more than quantity.