χ: Symmetry Understanding of 3D Shapes via Chirality Disentanglement¶
Conference: ICCV 2025 · arXiv: 2508.05505 · Code: Project Page · Area: 3D Vision · Keywords: Chirality features, symmetry, shape matching, left-right disambiguation, 2D foundation model distillation
TL;DR¶
This paper proposes an unsupervised chirality feature extraction pipeline that distills left-right chirality information from 2D foundation model features to augment 3D shape vertex descriptors, effectively resolving left-right ambiguity in shape analysis.
Background & Motivation¶
Root Cause¶
Background: Symmetry and chirality are two sides of the same coin: symmetry captures the similarity between two parts, while chirality captures their difference. In shape analysis, many vertex descriptors (e.g., Diff3F) are semantically and geometrically robust but cannot distinguish left-right symmetric parts, leading to:
Left-right ambiguity in shape matching — the left eye may be matched to the right eye
Imprecise part segmentation — symmetric body parts cannot be differentiated
Degraded correspondence quality — especially on models with symmetric structures
Although visual chirality has been studied in 2D image domains, no method exists for extracting chirality-aware vertex descriptors in 3D shape analysis.
Method¶
Overall Architecture¶
- Render textured images \(\{I_j\}_{j=1}^N\) from \(N\) viewpoints for a 3D mesh
- Horizontally flip each image to obtain \(\{\bar{I}_j\}_{j=1}^N\)
- Extract features \(F_{img}\) and \(\bar{F}_{img}\) from the original and flipped images with a frozen SD+DINO encoder
- Project onto the mesh to obtain chirality feature pairs \((\mathcal{F}_v, \bar{\mathcal{F}}_v)\)
- Train a chirality network \(\tilde{g}_\Phi\) to extract chirality features \(\chi, \bar{\chi}\) from the feature pairs
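The five steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the frozen SD+DINO encoder and the mesh rasterization are stood in by random features and precomputed per-vertex pixel coordinates, and all helper names (`extract_features`, `project_to_vertices`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(image):
    # Stand-in for the frozen SD+DINO encoder: (H, W, 3) -> (H, W, D).
    h, w, _ = image.shape
    return rng.standard_normal((h, w, 8))

def project_to_vertices(feat_map, pixel_uv):
    # Sample the per-pixel feature map at each visible vertex's pixel.
    return feat_map[pixel_uv[:, 1], pixel_uv[:, 0]]

H, W, V = 16, 16, 10
image = rng.random((H, W, 3))            # one rendered view I_j
flipped = image[:, ::-1]                 # horizontal flip -> \bar{I}_j

F_img = extract_features(image)
F_bar = extract_features(flipped)

# Projected pixel coordinates of the V visible vertices in this view;
# after the horizontal flip the same vertex lands at the mirrored column.
uv = rng.integers(0, W, size=(V, 2))
uv_flipped = uv.copy()
uv_flipped[:, 0] = W - 1 - uv[:, 0]

F_v = project_to_vertices(F_img, uv)              # \mathcal{F}_v
F_v_bar = project_to_vertices(F_bar, uv_flipped)  # \bar{\mathcal{F}}_v
print(F_v.shape, F_v_bar.shape)  # (10, 8) (10, 8)
```

Each vertex thus receives a paired descriptor from the original and the mirrored rendering, which is what the chirality network \(\tilde{g}_\Phi\) is trained on.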
Key Designs¶
Chirality Feature Definition
From the network output, the first feature dimension is selected and normalized so that \(\chi_v \in [-1, 1]\), yielding a single scalar chirality value per vertex.
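A minimal sketch of that reduction, assuming normalization by the maximum magnitude (the exact normalization is not spelled out in this note, but division by \(\|\chi\|_\infty\) matches the scaling used in the fifty-fifty loss below):

```python
import numpy as np

def chirality_scalar(features):
    # Take the first output dimension and normalize by the max magnitude,
    # so chi_v lies in [-1, 1]. The normalization choice is an assumption.
    chi = features[:, 0]
    return chi / (np.max(np.abs(chi)) + 1e-12)

feats = np.array([[2.0, 0.3], [-1.0, 0.5], [0.5, 0.1]])
chi = chirality_scalar(feats)  # ≈ [1.0, -0.5, 0.25]
```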
Loss & Training¶
Dissimilarity Loss — maximizes the difference between original and flipped chirality features: \(\mathcal{L}_{dis} = -\frac{1}{\sqrt{|V|}}\|\chi - \bar{\chi}\|_2\)
Invertibility Loss — prevents the encoder from learning degenerate solutions: \(\mathcal{L}_{inv} = \frac{1}{\sqrt{|V|}}\|[\mathcal{F}^\top\;\bar{\mathcal{F}}^\top]^\top - h(g([\mathcal{F}^\top\;\bar{\mathcal{F}}^\top]^\top))\|_F\)
Total Variation Loss — enforces spatial smoothness: \(\mathcal{L}_{var} = \frac{1}{|E|}\sum_{(u,v) \in E} \|\chi_u - \chi_v\|_1 + \|\bar{\chi}_u - \bar{\chi}_v\|_1\)
Fifty-Fifty Loss — balances the number of vertices in the left and right halves: \(\mathcal{L}_{fif} = \frac{1}{|V|}(\frac{|\chi^\top\mathbf{1}_{|V|}|}{\|\chi\|_\infty} + \frac{|\bar{\chi}^\top\mathbf{1}_{|V|}|}{\|\bar{\chi}\|_\infty})\)
Total loss: \(\mathcal{L} = \mathcal{L}_{dis} + \lambda_1\mathcal{L}_{inv} + \lambda_2\mathcal{L}_{var} + \lambda_3\mathcal{L}_{fif}\)
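The objective can be sketched numerically as below. This is an illustration only: the invertibility term is omitted since it needs the encoder/decoder pair \(g, h\), and the function signature is hypothetical.

```python
import numpy as np

def chirality_losses(chi, chi_bar, edges, lam_var=1.0, lam_fif=1.0):
    """Sketch of the training objective (invertibility loss omitted)."""
    V = chi.shape[0]
    # Dissimilarity: push original and flipped chirality apart.
    L_dis = -np.linalg.norm(chi - chi_bar) / np.sqrt(V)
    # Total variation: encourage smoothness along mesh edges (u, v).
    u, v = edges[:, 0], edges[:, 1]
    L_var = (np.abs(chi[u] - chi[v]).sum()
             + np.abs(chi_bar[u] - chi_bar[v]).sum()) / len(edges)
    # Fifty-fifty: balance positive and negative chirality halves.
    L_fif = (np.abs(chi.sum()) / np.max(np.abs(chi))
             + np.abs(chi_bar.sum()) / np.max(np.abs(chi_bar))) / V
    return L_dis + lam_var * L_var + lam_fif * L_fif

chi = np.array([1.0, -1.0, 0.5, -0.5])
edges = np.array([[0, 2], [1, 3]])
loss = chirality_losses(chi, -chi, edges)  # ≈ -sqrt(10)/2 + 1
```

Note the signs: the dissimilarity term is negative (it is maximized), while the smoothness and balance terms are minimized, so the optimum is a smooth, sign-balanced field that flips under mirroring.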
Key Experimental Results¶
Left-Right Discrimination Accuracy¶
Main Results¶
| Method | BeCoS | FAUST | SCAPE | SMAL | TOSCA |
|---|---|---|---|---|---|
| Diff3F | 50.87 | 51.21 | 52.53 | 50.91 | 51.48 |
| DINO+SD | 51.16 | 51.05 | 52.55 | 50.80 | 51.42 |
| Liu et al. | 79.98 | 90.45 | 80.84 | 75.71 | 72.88 |
| χ (Ours) | 91.84 | 94.76 | 95.51 | 96.59 | 94.09 |
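One plausible reading of the discrimination metric, sketched here under an assumed protocol (the exact evaluation is not detailed in this note): given ground-truth symmetric vertex pairs, count the fraction assigned opposite chirality signs.

```python
import numpy as np

def lr_accuracy(chi, sym_pairs):
    # Hypothetical metric: fraction of ground-truth symmetric vertex
    # pairs (l, r) whose chirality values have opposite signs.
    l, r = sym_pairs[:, 0], sym_pairs[:, 1]
    return float(np.mean(np.sign(chi[l]) != np.sign(chi[r])))

chi = np.array([0.9, -0.8, 0.7, -0.6, 0.1])
pairs = np.array([[0, 1], [2, 3]])
acc = lr_accuracy(chi, pairs)  # both pairs disambiguated -> 1.0
```

Under this reading, the ~50% scores of raw Diff3F/DINO+SD correspond to chance-level sign agreement on mirrored pairs.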
Cross-Dataset Generalization & Ablation¶
| Training Set | BeCoS-h Test | BeCoS-a Test |
|---|---|---|
| BeCoS | 94.09 | 84.19 |
| BeCoS-h | 90.36 | 91.10 |
Key Findings¶
- Raw Diff3F/DINO+SD features are nearly incapable of distinguishing left from right (~50%, close to random chance)
- The proposed method achieves over 90% left-right discrimination accuracy across all datasets
- Strong cross-dataset and cross-category generalization, effective even on partial and anisotropic shapes
- Combining chirality features with Diff3F effectively alleviates left-right ambiguity in shape matching
Highlights & Insights¶
- Clever use of horizontal flipping — flipping images alters chirality information while preserving other semantic content, enabling the construction of chirality feature pairs
- Unsupervised approach — requires no left-right annotations; chirality is learned purely from geometric structure
- Plug-and-play enhancement — compatible with any existing vertex descriptor
- Knowledge distillation from 2D to 3D — effectively exploits chirality information implicitly encoded in 2D foundation models
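The plug-and-play claim amounts to appending the chirality channel to an existing per-vertex descriptor. A minimal sketch, where the blending `weight` is an assumed knob rather than something from the paper:

```python
import numpy as np

def augment_descriptors(desc, chi, weight=1.0):
    # Append the scalar chirality channel to any per-vertex descriptor
    # (e.g. Diff3F); `weight` trades chirality against semantic cues.
    return np.concatenate([desc, weight * chi[:, None]], axis=1)

desc = np.random.default_rng(1).standard_normal((5, 128))  # e.g. Diff3F
chi = np.array([1.0, -1.0, 0.5, -0.5, 0.0])
aug = augment_descriptors(desc, chi)
print(aug.shape)  # (5, 129)
```

Because the chirality values of mirrored vertices differ in sign, nearest-neighbor matching on the augmented descriptors can no longer confuse left with right even when the semantic channels are identical.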
Limitations & Future Work¶
- Relies on the rendering and texturing pipeline of Diff3F, incurring substantial computational overhead
- Chirality is ill-defined for perfectly symmetric objects (e.g., spheres)
- Careful hyperparameter tuning is required to balance the four loss terms
Related Work & Insights¶
- Visual Chirality: Lin et al.'s work on visual chirality; mirror detection
- Shape Descriptors: Diff3F, DINO-V2, Stable Diffusion features
- Shape Matching: functional maps, SE-ORNet, DPC
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ (pioneering work on chirality extraction for 3D shapes)
- Technical Depth: ⭐⭐⭐⭐ (four carefully designed loss functions)
- Experimental Thoroughness: ⭐⭐⭐⭐ (validation across multiple datasets and tasks)
- Value: ⭐⭐⭐⭐ (directly improves shape matching quality)