
Contact-Aware Refinement of Human Pose Pseudo-Ground Truth via Bioimpedance Sensing

Conference: ICCV 2025 arXiv: 2512.04862 Code: biotuch.is.tue.mpg.de Area: Human Understanding / 3D Human Pose Estimation Keywords: self-contact detection, bioimpedance sensing, 3D human pose, SMPL-X, multimodal fusion

TL;DR

This paper proposes BioTUCH, a framework that detects self-contact events via wrist-to-wrist bioimpedance sensing and performs contact-aware 3D arm pose refinement in conjunction with a visual pose estimator, achieving an average improvement of 11.7% in reconstruction accuracy.

Background & Motivation

Limitations of Prior Work

3D human pose estimation performs poorly in self-contact scenarios (e.g., hand-to-face, clasped hands), primarily for two reasons:

Depth ambiguity: Contact along the camera optical axis is difficult to disambiguate in monocular RGB, causing hands to appear to "float" in front of surfaces they should be touching.

Scarcity of training data: Occlusion and depth ambiguity degrade pseudo-ground-truth (pGT) quality in self-contact scenarios, while multi-view capture systems are prohibitively expensive.

Existing methods such as TUCH and SCP attempt to model self-contact but struggle to reliably distinguish "proximity" from "actual touch" using visual signals alone. This paper takes an alternative approach by introducing bioimpedance sensing — a low-cost, non-invasive wearable signal — to directly measure skin-to-skin contact ground truth, forming a complementary vision-plus-sensing solution.

Method

Overall Architecture

BioTUCH consists of two stages: (1) detecting the onset and offset of self-contact events using bioimpedance signals; and (2) refining arm pose for frames where contact is detected, driving the SMPL-X mesh to produce physically plausible contact configurations.

Key Designs

  1. Bioimpedance Sensing and Self-Contact Detection:

    • Electrode wristbands are worn on both wrists to form a wrist-to-wrist impedance measurement circuit.
    • Skin-to-skin contact creates a parallel circuit path, causing a sharp drop in impedance.
    • Signal processing pipeline: resampling → median filtering (100 ms window) → differentiation → adaptive thresholding.
    • Contact onset: the differentiated signal falls below an adaptive threshold (set at ~1/3 of the mean of the three most negative derivative values).
    • Contact offset: impedance recovers to 98% of the pre-contact value.
    • High specificity (0.992) is prioritized over sensitivity (0.858) to reduce false positives that would interfere with optimization.
    • A miniaturized sensor (2 cm × 1.8 cm × 1.1 cm, ~USD 20) is also designed, concealable under clothing.
  2. Contact-Aware Arm Pose Optimization:

    • Based on the SMPL-X parametric model \(M(\theta, \beta, \psi)\).
    • Only arm joint parameters (shoulder, elbow, wrist) are optimized via masked gradient updates: \(\boldsymbol{\theta}_{i+1} = \boldsymbol{\theta}_i - \eta \nabla_{\boldsymbol{\theta}} \mathcal{L} \odot \mathbf{M}_a\)
    • Contact region identification: distances between hand vertices \(\mathbf{v}\) and target upper-body vertices \(\mathbf{u}\) are computed with the z-axis down-weighted, so a 1 cm error along the depth axis is weighted like a 0.25 cm error in the x/y plane.
    • Selection of which arm to optimize: the weighted distances for the two hands are compared; both arms are optimized simultaneously when the two distances differ by no more than 50%.
  3. Multi-Term Loss Function Design:

    • Total loss: \(\mathcal{L} = \mathcal{L}_{2D} + \lambda_{contact} \mathcal{L}_{contact}\)
    • \(\mathcal{L}_{2D}\): 2D joint reprojection error for the arms.
    • \(\mathcal{L}_{contact} = \mathcal{L}_{consistency} + \mathcal{L}_{interpenetration} + \mathcal{L}_{proximity}\)
    • Proximity loss with camera-axis-adaptive weighting: \(\mathcal{L}_{proximity} = \sum_{h \in \mathcal{H}} \sum_{(\mathbf{v}_i, \mathbf{u}_i) \in \mathcal{P}_h} \sum_{d \in \{x,y,z\}} \omega_d |v_i^d - u_i^d|\)
    • The depth-direction weight is 4× that of the image plane, reflecting that depth errors are typically 3–4× larger than in-plane errors.
    • Contact is considered achieved and optimization halted when all axis-wise distances are ≤ 5 mm.
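The detection pipeline above (median filtering, differentiation, adaptive thresholding on the drop, 98% recovery for the offset) can be sketched as follows. This is a minimal illustrative reconstruction, not the authors' code: the function name, the absence of an explicit resampling step, and the exact parameter defaults are assumptions.

```python
import numpy as np
from scipy.signal import medfilt

def detect_contact(impedance, fs=100.0, offset_ratio=0.98):
    """Detect self-contact onset/offset events in a wrist-to-wrist
    impedance trace (assumed already resampled to a uniform rate fs)."""
    # Median filter with a ~100 ms window (kernel length must be odd).
    win = int(fs * 0.1) | 1
    smooth = medfilt(impedance, kernel_size=win)

    # Differentiate: skin-to-skin contact causes a sharp impedance drop.
    diff = np.diff(smooth)

    # Adaptive onset threshold: ~1/3 of the mean of the three most
    # negative derivative values (the steepest drops in the signal).
    onset_thr = np.sort(diff)[:3].mean() / 3.0

    events = []
    onset = None
    for t in range(1, len(smooth)):
        if onset is None and diff[t - 1] < onset_thr:
            onset = t
            pre_contact = smooth[t - 1]   # impedance just before the drop
        elif onset is not None and smooth[t] >= offset_ratio * pre_contact:
            events.append((onset, t))     # offset: recovery to 98% of baseline
            onset = None
    return events
```

With a synthetic trace that drops from a high baseline during a contact interval, the function returns the (onset, offset) frame indices of that interval.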
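The masked gradient update and the camera-axis-adaptive proximity term can be illustrated with a toy sketch. This is not the paper's SMPL-X optimization: the forward model here is a plain translation of a single hand point, and `refine`, its learning rate, and the 6-entry parameter vector are illustrative assumptions. What it does preserve is the structure: per-axis weights with z at 4× the image plane, a gradient step masked to arm parameters only, and the 5 mm per-axis stopping criterion.

```python
import numpy as np

# Per-axis weights: depth (z) weighted 4x the image plane,
# mirroring the paper's camera-axis-adaptive proximity term.
W_AXIS = np.array([1.0, 1.0, 4.0])

def proximity_loss(v, u):
    """Weighted L1 distance between matched hand/body vertex pairs."""
    return np.sum(W_AXIS * np.abs(v - u))

def masked_step(theta, grad, arm_mask, lr):
    """Gradient step confined to arm parameters: theta <- theta - lr * grad ⊙ M_a."""
    return theta - lr * grad * arm_mask

def refine(theta, base_v, u, arm_mask, lr=1e-3, max_iters=2000, tol=5e-3):
    """Toy refinement: theta's first 3 entries translate the hand vertex.
    Stops when every axis-wise distance is <= 5 mm, as in the paper."""
    for _ in range(max_iters):
        v = base_v + theta[:3]
        r = v - u
        if np.all(np.abs(r) <= tol):
            break
        grad = np.zeros_like(theta)
        grad[:3] = W_AXIS * np.sign(r)   # dL/dv of the weighted L1 term
        theta = masked_step(theta, grad, arm_mask, lr)
    return theta
```

Parameters outside the mask (here the last three entries of `theta`, standing in for non-arm joints) are left untouched by every update, which is the point of the \(\odot \mathbf{M}_a\) term.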

Loss & Training

  • The pipeline is a post-processing optimization procedure rather than end-to-end training.
  • Shape parameters \(\beta\) are averaged over the entire sequence to ensure consistency.
  • A One Euro filter is applied in post-processing to improve temporal consistency.
  • The contact loss is prioritized as the primary driving term; the 2D loss serves as an auxiliary constraint.

Key Experimental Results

Main Results

| Method | PA-V2V↓ (mm) | Shoulder (mm) | Elbow (mm) | Wrist (mm) | Detection Rate↑ (%) | Contact Dist.↓ (mm) |
|---|---|---|---|---|---|---|
| Multi-HMR | 57.46 | 23.49 | 29.86 | 65.37 | 41.28 | 87.48 |
| +BioTUCH | 50.21 | 23.95 | 30.99 | 56.29 | 78.34 | 71.22 |
| AiOS | 72.24 | 21.84 | 39.83 | 77.29 | 45.87 | 99.02 |
| +BioTUCH | 62.79 | 22.42 | 40.65 | 65.61 | 78.48 | 79.16 |
| TUCH | 70.55 | 28.24 | 41.03 | 58.71 | 59.46 | 96.31 |
| +BioTUCH | 63.99 | 28.27 | 40.59 | 52.91 | 84.60 | 86.26 |

Ablation Study

| Configuration | PA-V2V↓ (mm) | Detection Rate↑ (%) | Notes |
|---|---|---|---|
| Multi-HMR baseline | 57.46 | 41.28 | Baseline |
| +2D Loss only | 77.41 | 42.16 | Reprojection-only constraint causes degradation |
| +Contact Loss only | 50.74 | 79.35 | Contact loss contributes most |
| +BioTUCH (full) | 50.21 | 78.34 | Both losses are complementary |

Key Findings

  • PA-V2V error improves by an average of 11.7%; contact detection rate increases by an average of 31.6 percentage points.
  • Wrist joints show the most significant improvement (average reduction of 8.85 mm); slight degradation at the shoulder and elbow is expected since only contact-relevant joints are optimized.
  • The contact loss is the core contributing component, validating that contact information from bioimpedance cannot be inferred from 2D keypoints alone.
  • The sensor remains effective in natural in-the-wild scenarios and can be concealed under clothing.

Highlights & Insights

  • Novel cross-modal complementarity: a low-cost wearable sensor addresses the fundamental depth ambiguity inherent to visual systems.
  • Practical miniaturized sensor design: ~USD 20, 2 cm form factor, 3-hour battery life — suitable for scalable contact data collection.
  • Arm-only optimization is an elegant design choice — since self-contact is predominantly hand-initiated, local optimization is both efficient and avoids introducing errors into the full-body estimate.
  • The asymmetric z-axis weighting (4×) directly reflects an intrinsic limitation of monocular visual systems.

Limitations & Future Work

  • Contact region identification depends on the quality of the initial mesh; large initial errors may lead to incorrect region localization.
  • Finger joint accuracy affects the optimization stopping criterion — e.g., finger interpenetration may cause premature termination.
  • Contacts involving fully occluded gestures (e.g., hands behind the back) cannot be spatially localized from visual cues alone.
  • The current approach performs only binary contact detection and does not exploit contact location or area information latent in the impedance signal.
  • The dataset covers only 3 subjects, though bioimpedance sensing has been validated across diverse populations.
Related Work Comparison

  • TUCH / SMPLify-XMC: detect contact via geometric thresholding or manual annotation, but struggle to distinguish proximity from actual touch.
  • Electronic skin sensing: highly accurate but requires whole-body coverage, limiting scalability.
  • The "sensor + vision" fusion paradigm proposed here generalizes naturally to other human behavior understanding tasks.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ — Unique cross-modal fusion concept; first use of bioimpedance for general self-contact detection and pose optimization.
  • Experimental Thoroughness: ⭐⭐⭐⭐ — Three baseline methods, quantitative + qualitative + in-the-wild evaluation, though dataset scale is limited.
  • Writing Quality: ⭐⭐⭐⭐⭐ — Problem motivation is clearly articulated, method is described in detail, and figures are intuitive.
  • Value: ⭐⭐⭐⭐ — The miniaturized sensor design is highly practical and enables scalable collection of high-quality training data.