DcMatch: Unsupervised Multi-Shape Matching with Dual-Level Consistency

Conference: AAAI 2026
arXiv: 2509.01204
Code: YeTianwei/DcMatch
Area: 3D Shape Matching
Keywords: multi-shape matching, functional maps, cycle consistency, graph attention network, unsupervised learning

TL;DR

This paper proposes DcMatch, an unsupervised multi-shape matching framework. A shape graph attention network captures the underlying manifold structure of the shape collection to construct a more expressive shared universe space, while dual-level cycle consistency constraints are enforced in both the spatial and spectral domains. The method achieves state-of-the-art performance across multiple benchmark datasets.

Background & Motivation

Problem Definition

Multi-Shape Matching aims to establish dense point-to-point correspondences among a set of 3D shapes. Compared to pairwise matching, it presents two additional challenges:

Cycle consistency: Composing maps along any closed path should yield the identity map — a global constraint absent in pairwise matching.

Combinatorial explosion: The number of shape pairs grows combinatorially with the size of the collection, incurring substantial computational cost.
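Both challenges can be made concrete with a toy sketch (illustrative only, not from the paper): correspondences derived from shape-to-universe permutations automatically compose to the identity along a closed path, and the number of pairwise maps grows quadratically while universe maps grow only linearly.

```python
# Toy illustration: cycle consistency and pair-count growth.
import numpy as np
from math import comb

rng = np.random.default_rng(0)
n = 5  # vertices per toy shape

def random_perm(n):
    """Random n x n permutation matrix."""
    P = np.zeros((n, n))
    P[np.arange(n), rng.permutation(n)] = 1.0
    return P

# Pairwise maps built by composing shape-to-universe permutations Pi_i
Pi = [random_perm(n) for _ in range(3)]
P12 = Pi[0] @ Pi[1].T
P23 = Pi[1] @ Pi[2].T
P31 = Pi[2] @ Pi[0].T

# Composing along the closed path S1 -> S2 -> S3 -> S1 yields the identity
cycle = P12 @ P23 @ P31
print(np.allclose(cycle, np.eye(n)))  # True

# Pairwise maps vs. universe maps for a collection of N shapes
N = 100
print(comb(N, 2), "pairwise maps vs.", N, "universe maps")  # 4950 vs. 100
```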

Limitations of Prior Work

Paradigm 1: Permutation Synchronization

  • Computes pairwise correspondences first, then enforces cycle consistency via post-processing.
  • The two-stage optimization often produces spatially non-smooth and noisy results.

Paradigm 2: Universe-Based Methods

  • Introduces a virtual universe shape, reducing multi-shape matching to mapping each shape to the universe.
  • Existing methods (e.g., UDMSM) learn universe embeddings from individual shapes, ignoring the structural relationships within the shape collection.
  • This effectively degenerates multi-shape matching into a set of independent pairwise problems.

Core Motivation

  1. The universe space should be learned from the manifold structure of the entire shape collection, rather than from individual reference shapes.
  2. Spectral-domain cycle consistency of functional maps and spatial-domain global consistency of universe matching should be aligned within a shared universe space.

Method

Overall Architecture

The DcMatch pipeline consists of four modules:

  1. Feature extractor (DiffusionNet) → per-vertex features \(\mathcal{F}\)
  2. Functional map module → bidirectional functional maps \(C_{ij}, C_{ji}\) and pointwise correspondences \(\Pi_{ij}\)
  3. Shape graph attention module → manifold-aware features \(\mathcal{G}\)
  4. Universe predictor → shape-to-universe correspondences \(\Pi_i\)

Key Designs

1. Hybrid Functional Maps: Combining LBO and Elastic Bases

Two sets of basis functions are computed per shape:

  • LBO eigenfunctions \(\Phi_i \in \mathbb{R}^{n_i \times k_{LB}}\) (\(k_{LB}=160\))
  • Elastic thin-shell energy eigenfunctions \(\Psi_i \in \mathbb{R}^{n_i \times k_{Elas}}\) (\(k_{Elas}=40\))

The hybrid basis is \(\widetilde{\Phi}_i = [\Phi_i; \Psi_i]\), with functional maps solved separately on each basis:

  • The diagonal block \(C_{ij}^{11}\) on the LBO basis: minimizing \(E^{LB}_{data}(C) + \lambda_{LB} E^{LB}_{reg}(C)\)
  • The diagonal block \(C_{ij}^{22}\) on the elastic basis: using Hilbert-Schmidt norm regularization

Design Motivation: LBO bases excel at capturing isometric deformations, while elastic bases handle non-isometric deformations; combining both improves robustness.
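The core computation behind a functional map can be sketched in a few lines. This is a minimal illustration with random toy data and a plain least-squares solve, omitting the paper's regularizers and the hybrid-basis block structure; all sizes and variable names here are assumptions for illustration.

```python
# Sketch: a functional map C_ij carries spectral coefficients of descriptor
# functions from shape i's basis to shape j's basis.
import numpy as np

rng = np.random.default_rng(0)
n_i, n_j, k, d = 120, 100, 20, 32  # vertices, basis size, descriptor dim

# Orthonormal toy bases (stand-ins for LBO/elastic eigenfunctions)
Phi_i = np.linalg.qr(rng.standard_normal((n_i, k)))[0]
Phi_j = np.linalg.qr(rng.standard_normal((n_j, k)))[0]

# Per-vertex descriptor functions on each shape
F_i = rng.standard_normal((n_i, d))
F_j = rng.standard_normal((n_j, d))

# Spectral coefficients of the descriptors (k x d)
A = Phi_i.T @ F_i
B = Phi_j.T @ F_j

# Solve min_C ||C A - B||_F^2; transposing gives the least-squares
# system A^T C^T = B^T
C_ij = np.linalg.lstsq(A.T, B.T, rcond=None)[0].T
print(C_ij.shape)  # (20, 20)
```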

2. Shape Graph Attention Module: Capturing Collection Manifold Structure (Core Contribution)

Graph construction: Edges are defined by Top-\(k\) cosine similarity of shape features:

\[\mathcal{E} = \{(i,j) \mid j \in \text{Top-}k(\cos(\mathcal{F}_i, \mathcal{F}_j))\}\]

Graph attention aggregation: A GAT dynamically learns inter-shape attention weights:

\[\alpha_{ij} = \frac{\exp(\mathbf{a}^\top \text{LeakyReLU}(\mathbf{W} [\mathcal{F}_i \| \mathcal{F}_j]))}{\sum_{j' \in \mathcal{N}_i} \exp(\mathbf{a}^\top \text{LeakyReLU}(\mathbf{W} [\mathcal{F}_i \| \mathcal{F}_{j'}]))}\]

Neighbor features are aggregated to yield manifold-aware features:

\[\mathcal{F}_i' = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij} \cdot \mathbf{W}\mathcal{F}_j\right)\]

The final feature concatenates original and aggregated features: \(\mathcal{G}_i = [\mathcal{F}_i' \| \mathcal{F}_i]\).

Two-layer GAT with LayerNorm and Dropout is used. Design Motivation: Unlike UDMSM, which learns universe embeddings from individual shapes, message passing enables each shape's representation to incorporate contextual information from its neighbors, so the universe space is constructed with awareness of the collection's manifold structure.
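The two steps above, Top-\(k\) graph construction and attention-weighted aggregation, can be sketched in plain numpy. This is a single-layer toy version (the paper uses a two-layer GAT with LayerNorm and Dropout); shapes, sizes, the tanh nonlinearity, and the standard-GAT ordering of projection before concatenation are all assumptions for illustration.

```python
# Sketch: shape graph from Top-k cosine similarity, then one GAT-style layer.
import numpy as np

rng = np.random.default_rng(0)
N, d_in, d_out, k = 6, 16, 8, 3  # shapes, feature dims, neighbors per shape

F = rng.standard_normal((N, d_in))      # per-shape (pooled) features
W = rng.standard_normal((d_in, d_out))  # shared projection
a = rng.standard_normal(2 * d_out)      # attention vector

# Top-k neighbors by cosine similarity (self excluded)
Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
sim = Fn @ Fn.T
np.fill_diagonal(sim, -np.inf)
nbrs = np.argsort(-sim, axis=1)[:, :k]  # N x k neighbor indices

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

H = F @ W
G = np.zeros((N, 2 * d_out))
for i in range(N):
    # attention logits over i's neighbors, then softmax
    e = np.array([a @ leaky_relu(np.concatenate([H[i], H[j]])) for j in nbrs[i]])
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()
    # aggregate neighbor features with attention weights
    agg = np.tanh((alpha[:, None] * H[nbrs[i]]).sum(axis=0))
    # concatenate aggregated and own features, [F'_i || F_i] style
    G[i] = np.concatenate([agg, H[i]])

print(G.shape)  # (6, 16)
```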

3. Universe Predictor: Predicting Shape-to-Universe Correspondences from Manifold-Aware Features

Given shape-level features \(\mathcal{G}_i\), a DiffusionNet-based architecture generates assignment matrices \(\Pi_i \in \{0,1\}^{n_i \times c}\) (\(c\) = number of universe points), relaxed to doubly stochastic matrices via Sinkhorn normalization for end-to-end training.

At inference, pairwise correspondences are computed directly by composition: \(\Pi_{ij} = \Pi_i \Pi_j^\top\), which inherently guarantees cycle consistency.
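A minimal numpy sketch of these two mechanics (toy sizes; the paper's predictor is DiffusionNet-based and the temperature, iteration count, and variable names below are assumptions): Sinkhorn normalization turns raw assignment scores into a near doubly stochastic \(\Pi_i\), and pairwise maps come from composition through the universe.

```python
# Sketch: Sinkhorn relaxation of assignment matrices, then composition.
import numpy as np

rng = np.random.default_rng(0)
n, c = 8, 8  # vertices per shape, universe points

def sinkhorn(scores, n_iters=100, tau=0.5):
    """Alternate row/column normalization of exp(scores / tau)."""
    K = np.exp(scores / tau)
    for _ in range(n_iters):
        K = K / K.sum(axis=1, keepdims=True)  # rows sum to 1
        K = K / K.sum(axis=0, keepdims=True)  # columns sum to 1
    return K

Pi_1 = sinkhorn(rng.standard_normal((n, c)))
Pi_2 = sinkhorn(rng.standard_normal((n, c)))

# Soft pairwise correspondence via the shared universe, no post-processing
P12 = Pi_1 @ Pi_2.T
print(P12.shape)  # (8, 8)
```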

4. Dual-Level Cycle Consistency Loss: Aligning Spectral and Spatial Domains (Core Contribution)

Key insight: Shape-to-universe alignment can be achieved via two paths:

  • Spectral path: the functional map coefficient matrix \(\mathcal{A}_i\) maps spectral bases to the shared universe → aligned embedding \(\widetilde{\Phi}_i \mathcal{A}_i\)
  • Spatial path: the universe correspondence matrix \(\Pi_i\) projects directly → aligned embedding \(\Pi_i^\top \widetilde{\Phi}_i\)

Cycle consistency loss enforces agreement between the two paths in universe space:

  • For near-isometric shapes (Frobenius norm): \[\mathcal{L}_{cycle} = \sum_{i,j}^n \|\Pi_i^\top \widetilde{\Phi}_i \mathcal{A}_i - \Pi_j^\top \widetilde{\Phi}_j \mathcal{A}_j\|_F^2\]
  • For non-isometric shapes (cosine similarity): \[\mathcal{L}_{cycle} = \sum_{i,j}^n \left(1 - \cos(\Pi_i^\top \widetilde{\Phi}_i \mathcal{A}_i, \Pi_j^\top \widetilde{\Phi}_j \mathcal{A}_j)\right)\]
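Numerically, both variants reduce to comparing universe-space embeddings. The sketch below uses random toy data and applies the cosine term row-wise per universe point, which is one plausible reading of the matrix cosine; sizes and names are assumptions for illustration.

```python
# Sketch: Frobenius and cosine variants of the cycle consistency loss.
import numpy as np

rng = np.random.default_rng(0)
n, c, k = 10, 10, 6  # vertices, universe points, basis size

# Toy bases, functional-map coefficients, and universe assignments for 2 shapes
Phi = [np.linalg.qr(rng.standard_normal((n, k)))[0] for _ in range(2)]
A = [rng.standard_normal((k, k)) for _ in range(2)]
Pi = []
for _ in range(2):
    P = np.zeros((n, c))
    P[np.arange(n), rng.permutation(c)] = 1.0
    Pi.append(P)

# Universe-space embeddings Pi_i^T Phi_i A_i, one row per universe point
U = [Pi[i].T @ Phi[i] @ A[i] for i in range(2)]

# Frobenius variant (near-isometric shapes)
L_frob = np.sum((U[0] - U[1]) ** 2)

def cos_rows(X, Y):
    num = (X * Y).sum(axis=1)
    den = np.linalg.norm(X, axis=1) * np.linalg.norm(Y, axis=1) + 1e-8
    return num / den

# Cosine variant (non-isometric shapes)
L_cos = np.sum(1.0 - cos_rows(U[0], U[1]))
print(L_frob >= 0, L_cos >= 0)
```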

Theoretical support (Theorem 1): If the total functional map energy across all shape pairs is zero, the composed functional map along any closed path acts as the identity on the subspace spanned by \(\mathcal{A}_i\).

Loss & Training

Total loss: \(\mathcal{L}_{total} = \mathcal{L}_{spectral} + \lambda_{cycle} \mathcal{L}_{cycle}\)

The spectral loss comprises:

  • Bijectivity loss: \(\mathcal{L}_{bij} = \sum \|C_{ij}C_{ji} - \mathbf{I}\|_F^2 + \|C_{ji}C_{ij} - \mathbf{I}\|_F^2\)
  • Orthogonality loss: \(\mathcal{L}_{orth} = \sum \|C_{ij}^\top C_{ij} - \mathbf{I}\|_F^2\)
  • Coupling loss: \(\mathcal{L}_{couple}\), which enforces consistency between the functional maps and the pointwise maps
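The bijectivity term and the standard orthogonality penalty are straightforward to compute. A toy sketch on random near-identity maps (the \(\lambda\) weights and the coupling term are omitted; sizes are assumptions):

```python
# Sketch: bijectivity and orthogonality regularizers for a map pair.
import numpy as np

rng = np.random.default_rng(0)
k = 8
I = np.eye(k)

# Near-identity functional maps, as a stand-in for a trained pair
C_ij = np.eye(k) + 0.1 * rng.standard_normal((k, k))
C_ji = np.eye(k) + 0.1 * rng.standard_normal((k, k))

# Bijectivity: composing forward and backward maps should give the identity
L_bij = np.sum((C_ij @ C_ji - I) ** 2) + np.sum((C_ji @ C_ij - I) ** 2)

# Orthogonality: an area-preserving map has orthogonal C
L_orth = np.sum((C_ij.T @ C_ij - I) ** 2)

print(L_bij >= 0, L_orth >= 0)
```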

Training: Adam optimizer, learning rate 0.001, \(\lambda_{bij}=\lambda_{orth}=\lambda_{couple}=\lambda_{cycle}=1.0\); DiffusionNet extracts 256-dimensional features.

Key Experimental Results

Main Results: Near-Isometric Datasets (Geodesic Error ×100 ↓)

Method        Type      FAUST  FAUST_a  SCAPE  SCAPE_a  SHREC'19
HybridFMaps   Pairwise   1.5     1.8     1.8     1.9      4.5
ULRSSM        Pairwise   1.6     1.9     1.9     1.9      4.8
UDMSM         Multi      1.5    15.3     2.0     4.9     17.8
G-MSM         Multi      1.5    12.7     1.8    28.1      6.8
Ours          Multi      1.4     1.7     1.8     1.8      4.2

Key observation: The proposed method exhibits particularly strong advantages on anisotropic meshes (FAUST_a/SCAPE_a) — while UDMSM and G-MSM suffer several-fold error increases, the proposed method remains nearly unaffected.

Non-Isometric Datasets

Method        SMAL   DT4D-H intra   DT4D-H inter
HybridFMaps    3.4       1.0            3.9
ULRSSM         3.9       0.9            4.1
UDMSM         26.5       2.4           15.8
G-MSM         43.9       7.8           12.0
Ours           2.9       1.0            3.8

Ablation Study (SMAL Dataset)

Configuration                      Geodesic Error ×100
w/o Shape Graph Attention Module    3.7 (↑0.8)
w/o Functional Map Module          26.5 (↑23.6)
w/o Universe Predictor              3.4 (↑0.5)
w/o Cycle Consistency Loss          3.8 (↑0.9)
Full Model                          2.9

Key Findings

  1. Cross-dataset generalization (4.2 on SHREC'19 vs. 6.8 for G-MSM and 17.8 for UDMSM): attributed to manifold-aware universe construction.
  2. Robustness to anisotropic mesh resampling: 1.7 on FAUST_a vs. 15.3 for UDMSM and 12.7 for G-MSM.
  3. Large margin on non-isometric inter-class matching (DT4D-H inter): 3.8 vs. 12.0 for G-MSM.
  4. The functional map module is the most critical component (roughly a ninefold error increase upon removal, 2.9 → 26.5); the cycle consistency loss and graph attention module are also indispensable.

Highlights & Insights

  1. Modeling the shape collection as a graph is more effective than treating shapes in isolation: Message passing aggregates information from neighboring shapes, so the universe embedding is no longer limited to the geometry of a single reference shape.
  2. The dual-level consistency loss design is elegant: It leverages the intrinsic cycle consistency of functional maps to regularize the learning of universe correspondence matrices, with the two alignment paths mutually reinforcing each other.
  3. Adaptive loss selection strategy — Frobenius for near-isometric and cosine for non-isometric deformations — reflects careful adaptation to different deformation types.
  4. The inference-time correspondence computation \(\Pi_{ij} = \Pi_i\Pi_j^\top\) inherently guarantees cycle consistency without any post-processing.

Limitations & Future Work

  1. Universe size must be fixed in advance: set to the number of vertices of a reference shape or the maximum vertex count in the dataset, which may be suboptimal.
  2. Processing the entire shape collection as a graph increases computational overhead: memory consumption of 8.4 GB vs. 2.6–3.5 GB for baselines.
  3. Partial matching scenarios are not considered (e.g., incomplete scans).
  4. The choice of \(k\) in the Top-\(k\) graph (\(k=3\)) is empirical, lacking theoretical justification.

Context & Takeaways
  • Learning-based functional map methods (DiffusionNet → FMNet → ULRSSM → HybridFMaps) represent the dominant technical paradigm in 3D shape matching.
  • G-MSM attempts to model the shape collection manifold via heuristics; this work provides a more principled solution using GAT.
  • Takeaway: For matching and registration tasks requiring global consistency, the combination of Sinkhorn normalization, graph neural networks, and spectral methods constitutes a powerful toolkit.

Rating

  • Novelty: ⭐⭐⭐⭐ (Dual-level consistency and graph-attention-based universe construction are novel, though each individual component has precedent)
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ (6 datasets + detailed ablations + robustness analysis + qualitative visualizations)
  • Writing Quality: ⭐⭐⭐⭐ (Clear structure, intuitive figures)
  • Value: ⭐⭐⭐⭐ (Achieves systematic improvements in multi-shape matching; code is publicly available)