PD²GS: Part-Level Decoupling and Continuous Deformation of Articulated Objects via Gaussian Splatting¶
Conference: ICLR 2026 arXiv: 2506.09663 Code: Available Area: 3D Vision / Articulated Object Modeling Keywords: articulated objects, 3D Gaussian Splatting, part segmentation, continuous deformation, SAM
TL;DR¶
PD²GS learns a shared canonical Gaussian field and models each interaction state as a continuous deformation of it. Combined with coarse-to-fine motion-trajectory clustering and SAM-guided boundary refinement, this enables part-level decoupling, reconstruction, and continuous control of articulated objects without any manual supervision.
Background & Motivation¶
Background: 3D modeling of articulated objects (doors, drawers, laptops) is critical for robotics, AR/VR, and digital twins. Recent works such as PARIS and GAPartNet employ NeRF/3DGS for self-supervised modeling, yet are largely limited to single-joint, two-state settings.
Limitations of Prior Work: (1) Two-state methods support only discrete pairwise comparisons and cannot model continuous motion; (2) they require prior knowledge of the number of parts or strict geometric constraints; (3) multi-part decoupling relies on Marching Cubes explicit meshes, leading to severe error accumulation.
Key Challenge: How can a continuous part-level motion model be learned from limited discrete interaction-state observations?
Goal: Self-supervised learning from multi-view, multi-state images to achieve (1) part-aware reconstruction, (2) part-level continuous control, and (3) accurate kinematic modeling.
Key Insight: The key insight is that each interaction state can be modeled as a continuous deformation of a shared canonical Gaussian field, where motion within a part is consistent while motion across parts differs.
Core Idea: A latent-code-conditioned deformation network drives continuous deformation of the canonical Gaussian field; automatic part decoupling is achieved through motion trajectory clustering followed by SAM-guided boundary refinement.
Method¶
Overall Architecture¶
The input consists of multi-view images of an articulated object captured across \(K\) interaction states. The pipeline comprises: (1) construction of a canonical Gaussian field with a latent-code-conditioned deformation MLP; (2) coarse-grained part clustering based on motion trajectories; (3) SAM-guided boundary refinement; and (4) kinematic analysis and continuous control.
Key Designs¶
- Deformable Gaussian Splatting:
- Function: Unifies discrete interaction states as continuous deformations of a shared canonical field.
- Mechanism: Each state \(k\) is associated with a latent code \(\alpha_k \in \mathbb{R}^D\); an MLP \(f_{def}\) predicts per-Gaussian displacements \((\Delta\mu_i, \Delta q_i, \Delta s_i) = f_{def}(\mu_i, q_i, s_i | \alpha_k)\), applied via addition (position) and quaternion multiplication (rotation).
- Design Motivation: Latent code parameterization enables interpolation to unseen states after training.
- Coarse Motion-Driven Part Segmentation:
- Function: Automatically discovers parts from motion trajectories.
- Mechanism: (a) The maximum displacement of each Gaussian across \(K\) states is computed, and a threshold separates static from dynamic regions; (b) a VLM (BLIP/Gemini) estimates the number of moving parts from image pairs via majority voting; (c) motion descriptors (normalized direction + displacement magnitude) are constructed and clustered on the unit sphere via K-means.
- Design Motivation: Gaussians belonging to the same rigid part share consistent motion directions; after normalization, their angular distances remain small even when magnitudes differ.
- SAM-Guided Boundary Refinement:
- Function: Refines part boundaries.
- Mechanism: 3D-to-2D projected prompt points are generated for Gaussians near part boundaries; SAM is invoked to produce 2D masks, which are back-projected into 3D to correct part labels. Boundary Gaussians are further split into multiple smaller Gaussians with reassigned labels.
- Design Motivation: Motion clustering yields coarse boundaries, while the visual prior from SAM provides pixel-accurate segmentation.
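The per-state deformation step above can be sketched in PyTorch. This is a minimal illustration, not the authors' implementation: the network width, latent dimension, and the identity-biased quaternion offset are assumptions; the paper only specifies that an MLP conditioned on \(\alpha_k\) predicts \((\Delta\mu_i, \Delta q_i, \Delta s_i)\), applied by addition (position, scale) and quaternion multiplication (rotation).

```python
import torch
import torch.nn as nn

def quat_mul(q1, q2):
    """Hamilton product of (w, x, y, z) quaternions, batched on the last dim."""
    w1, x1, y1, z1 = q1.unbind(-1)
    w2, x2, y2, z2 = q2.unbind(-1)
    return torch.stack([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ], dim=-1)

class DeformNet(nn.Module):
    """Latent-code-conditioned deformation MLP (hyperparameters assumed)."""
    def __init__(self, latent_dim=32, hidden=128):
        super().__init__()
        in_dim = 3 + 4 + 3 + latent_dim  # mu, q, s, alpha_k
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 4 + 3),  # delta mu, delta q, delta s
        )

    def forward(self, mu, q, s, alpha_k):
        a = alpha_k.expand(mu.shape[0], -1)          # share the state code
        d = self.mlp(torch.cat([mu, q, s, a], dim=-1))
        d_mu, d_q, d_s = d.split([3, 4, 3], dim=-1)
        # Bias the rotation offset toward identity, then normalize to a unit
        # quaternion before composing it with the canonical rotation.
        d_q = d_q + torch.tensor([1.0, 0.0, 0.0, 0.0])
        d_q = d_q / d_q.norm(dim=-1, keepdim=True)
        return mu + d_mu, quat_mul(d_q, q), s + d_s
```

Because the state enters only through \(\alpha_k\), interpolating between trained codes yields deformations for configurations never observed during training.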
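The coarse segmentation stage (static/dynamic threshold, then clustering of normalized motion directions on the unit sphere) can be sketched as follows. This is a toy version: the farthest-point initialization is an assumption for determinism, and in the paper `tau_mot` is a tuned threshold while `n_parts` comes from the VLM majority vote.

```python
import numpy as np

def cluster_parts(max_disp, tau_mot, n_parts, iters=50):
    """max_disp: (N, 3) per-Gaussian maximum displacement across the K states.
    Returns labels: -1 for static Gaussians, 0..n_parts-1 for moving parts."""
    mag = np.linalg.norm(max_disp, axis=-1)
    moving = mag > tau_mot                          # static/dynamic split
    dirs = max_disp[moving] / mag[moving, None]     # unit-sphere descriptors
    # Deterministic farthest-point initialization, then spherical K-means.
    centers = [dirs[0]]
    for _ in range(n_parts - 1):
        sim = np.max(np.stack([dirs @ c for c in centers]), axis=0)
        centers.append(dirs[int(np.argmin(sim))])
    centers = np.stack(centers)
    for _ in range(iters):
        assign = np.argmax(dirs @ centers.T, axis=-1)  # cosine similarity
        for k in range(n_parts):
            if (assign == k).any():
                c = dirs[assign == k].mean(axis=0)
                centers[k] = c / np.linalg.norm(c)     # re-project to sphere
    labels = np.full(len(max_disp), -1)
    labels[moving] = assign
    return labels
```

Normalizing the descriptors is what makes Gaussians on the same rigid part cluster together even when they move by different amounts (e.g. points near vs. far from a hinge).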
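The 3D-to-2D prompt generation for SAM reduces to a pinhole projection of boundary Gaussians into a calibrated view; the resulting pixel coordinates are then passed to SAM as point prompts. A minimal sketch, assuming a standard world-to-camera `[R|t]` convention (the SAM call and mask back-projection are omitted):

```python
import numpy as np

def project_prompts(points_3d, K, R, t, img_hw):
    """points_3d: (N, 3) world-frame boundary Gaussian centers.
    K: (3, 3) intrinsics; R, t: world -> camera extrinsics.
    Returns (M, 2) pixel coordinates of points landing inside the image."""
    cam = points_3d @ R.T + t               # world -> camera frame
    in_front = cam[:, 2] > 1e-6             # drop points behind the camera
    cam = cam[in_front]
    uv = cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]             # perspective divide
    h, w = img_hw
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv[valid]
```

Each surviving pixel becomes a positive point prompt; the returned 2D mask is back-projected along the same camera rays to reassign (and, near the boundary, split) the 3D Gaussians.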
Loss & Training¶
\(\mathcal{L}_{total} = \mathcal{L}_{photo} + \mathcal{L}_{D\text{-}SSIM}\), comprising a photometric reconstruction loss and a structural dissimilarity (D-SSIM) regularization term.
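A hedged sketch of this objective, in the form commonly used for 3DGS training: an L1 photometric term blended with a D-SSIM term. The uniform (rather than Gaussian) SSIM window and the weight `lam` are simplifying assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2, win=11):
    """Simplified SSIM with a uniform averaging window (sketch only)."""
    pad = win // 2
    mu_x = F.avg_pool2d(x, win, 1, pad)
    mu_y = F.avg_pool2d(y, win, 1, pad)
    var_x = F.avg_pool2d(x * x, win, 1, pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, win, 1, pad) - mu_y ** 2
    cov = F.avg_pool2d(x * y, win, 1, pad) - mu_x * mu_y
    s = ((2 * mu_x * mu_y + C1) * (2 * cov + C2)) / (
        (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
    return s.mean()

def total_loss(pred, gt, lam=0.2):
    """L_total = (1 - lam) * L1 + lam * D-SSIM, with D-SSIM = (1 - SSIM) / 2."""
    l1 = (pred - gt).abs().mean()
    d_ssim = (1.0 - ssim(pred, gt)) / 2.0
    return (1 - lam) * l1 + lam * d_ssim
```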
Key Experimental Results¶
Main Results (PartNet-Mobility)¶
| Method | PSNR↑ | SSIM↑ | Part IoU↑ | Joint Error↓ |
|---|---|---|---|---|
| PARIS | Low | Low | ~50% | High |
| CAGE | Medium | Medium | ~55% | Medium |
| PD²GS | Highest | Highest | ~70% | Lowest |
Ablation Study¶
| Configuration | Reconstruction Quality | Segmentation Accuracy | Notes |
|---|---|---|---|
| Full PD²GS | Best | Best | Complete model |
| w/o SAM refinement | Slightly lower | −~10% | Coarse clustering boundaries imprecise |
| w/o VLM counting | Comparable | −~5% | Manually specified \(K\) yields similar results |
| 2 states vs. 4 states | Lower | Lower | More states provide stronger motion constraints |
Key Findings¶
- Continuous control: Interpolating latent codes generates smooth intermediate states, whereas prior methods can only jump between discrete states.
- Multi-part support: Complex objects with multiple independently moving parts, such as multi-drawer cabinets, are successfully handled.
- RS-Art real data: Strong performance on the authors' RS-Art real-to-sim dataset validates generalization to real-world captures.
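The continuous-control finding above follows directly from the latent-code parameterization: intermediate states are obtained by interpolating two learned per-state codes and feeding the result to the deformation MLP. A minimal sketch (function name and shapes are assumptions):

```python
import torch

def interpolate_states(alpha_a, alpha_b, n_steps):
    """Linearly interpolate between two learned state codes.
    alpha_a, alpha_b: (D,) latent codes; returns (n_steps, D), endpoints included.
    Each row is then passed to the deformation network as a state condition."""
    ts = torch.linspace(0.0, 1.0, n_steps).view(-1, 1)
    return (1 - ts) * alpha_a + ts * alpha_b
```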
Highlights & Insights¶
- Elegant latent-code-driven continuous deformation: Discrete state observations are encoded into a continuous motion space, enabling generation of unseen configurations.
- Self-supervised motion-as-segmentation paradigm: Part structure is automatically discovered from motion differences without any manual annotation.
- Novel 3D-to-2D SAM prompting: 2D prompt points are automatically derived from 3D part boundaries, eliminating the need for manually annotated SAM prompts.
Limitations & Future Work¶
- VLM-based part counting can be unreliable; majority voting is required for stability.
- The motion threshold \(\tau_{mot}\) requires manual tuning.
- Only rigid articulated motion is supported; flexible deformations (e.g., cloth) are not addressed.
- The RS-Art dataset remains limited in scale.
Related Work & Insights¶
- vs. PARIS: PARIS supports only single-joint two-state settings, whereas PD²GS enables multi-part, multi-state continuous control.
- vs. dynamic 3DGS (4D-GS, etc.): Dynamic methods do not distinguish part-level motion semantics; PD²GS performs explicit decoupling.
- The framework has direct applicability to robotic object manipulation, as inferred part identities and kinematic parameters can directly inform manipulation planning.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ The combination of canonical field + latent deformation + automatic part discovery is highly original.
- Experimental Thoroughness: ⭐⭐⭐⭐ Covers PartNet-Mobility and RS-Art real data with complete ablations.
- Writing Quality: ⭐⭐⭐⭐ Method description is clear with well-formatted formulations.
- Value: ⭐⭐⭐⭐⭐ A significant advance in continuous modeling of articulated objects.