Spectral Informed Mamba for Robust Point Cloud Processing¶
Conference: CVPR 2025
arXiv: 2503.04953
Code: GitHub
Area: 3D Vision/Point Cloud Processing
Keywords: Point Cloud Analysis, Mamba, Graph Laplacian Spectrum, Self-Supervised Learning, Traversing Strategy
TL;DR¶
A point cloud Mamba traversing strategy, SST, is proposed based on the Graph Laplacian Spectrum. It achieves isometry-invariant classification via Surface-Aware Spectral Traversing (SAST), accurate segmentation via Hierarchical Local Traversing (HLT), and addresses the token placement issue of MAE in Mamba via Traversing-Aware Relocation (TAR).
Background & Motivation¶
State Space Models (SSMs) like Mamba have become efficient alternatives to Transformers due to their linear complexity. However, applying Mamba to point clouds faces three key issues:
- 3D grid traversing is unsuitable for point clouds: Existing methods (Point-Mamba, PCM) simply extend 2D grid traversing of images to 3D. However, adjacent patches in a 3D grid are not necessarily adjacent on the object surface, and the traversing order is viewpoint-dependent.
- Traversing strategies do not match the task: Global traversing suitable for classification is not suitable for tasks requiring local precision, such as segmentation.
- Token placement issues in MAE: Mamba is sensitive to token sequence, thus learnable tokens cannot be placed arbitrarily as in Transformers.
Key Insight: The eigenvectors of the Graph Laplacian provide an isometry-invariant parameterization of the surface manifold, which can define a more robust traversing order.
Method¶
Overall Architecture¶
The Spectral Spatial Traversing (SST) method consists of three components: SAST for spectral traversing in classification tasks, HLT for hierarchical traversing in segmentation tasks, and TAR for token relocation in MAE self-supervised pre-training.
Key Designs¶
Key Design 1: Surface-Aware Spectral Traversing (SAST)¶
- Function: Defines an isometry-invariant patch traversing order for classification tasks.
- Mechanism: A patch connection graph is constructed to compute the first \(s\) non-constant smallest eigenvectors of the random walk Laplacian \(L_{rw} = I - D^{-1}W\). Each eigenvector \(v^{(k)}\) defines a traversing order (once in ascending and once in descending order of eigenvalue), which are concatenated after executing \(s \times 2\) traversals in each Mamba block. Sign ambiguity and repeated eigenvalue ordering ambiguity are resolved via normalization.
- Design Motivation: The Laplacian spectrum is isometry-invariant (rotation and translation do not change the spectrum), and low-frequency eigenvectors encode a smooth parameterization of the object surface manifold, capturing surface adjacency better than 3D grid traversing.
Key Design 2: Hierarchical Local Traversing (HLT)¶
- Function: Defines precise traversing for segmentation tasks, considering all \(s\) eigenvectors simultaneously.
- Mechanism: Inspired by normalized cuts, tokens are recursively bisected based on the mean value of the eigenvector values—first divided into two groups by the first eigenvector, then subdivided by the second eigenvector, and so on. Ultimately, each token receives an \(s\)-bit binary code \(b_i = [b_i^{(1)}, ..., b_i^{(s)}]\) and is traversed in lexicographical order.
- Design Motivation: In SAST, each eigenvector is used independently, making it difficult to distinguish specific parts of an object (e.g., left arm vs. right arm). HLT achieves more accurate spatial localization by combining information from all eigenvectors.
Key Design 3: Traversing-Aware Relocation (TAR)¶
- Function: Solves the token placement issue in Mamba MAE pre-training.
- Mechanism: Before the MAE decoder, learnable tokens are restored to their original masked positions (instead of being appended to the end of the sequence as in Transformers), maintaining the spatial adjacency defined by the spectral traversing.
- Design Motivation: Mamba is a direction-sensitive sequence model where token order directly affects state propagation. Placing learnable tokens at the end disrupts spatial continuity.
Loss & Training¶
Self-supervised pre-training employs the Chamfer distance reconstruction loss: \(\mathcal{L}_{rec} = \frac{1}{N_m} \sum_{i=1}^{N_m} \text{Chamfer}(\mathcal{S}_i, \hat{\mathcal{S}}_i)\).
Key Experimental Results¶
Main Results: ModelNet40 Classification¶
| Method | Params | Overall Accuracy (%) |
|---|---|---|
| Point-MAE | 22.1M | 93.2 |
| Point-M2AE | 12.4M | 93.4 |
| Point-Mamba | - | 93.6 |
| SI-Mamba (Ours) | ~12M | 94.0 |
ScanObjectNN Classification (Hardest setting PB_T50_RS)¶
| Method | Accuracy (%) |
|---|---|
| Point-MAE | 85.2 |
| Point-M2AE | 86.4 |
| SI-Mamba | 88.1 |
ShapeNetPart Segmentation¶
| Method | Category mIoU (%) | Instance mIoU (%) |
|---|---|---|
| Point-MAE | 84.2 | 86.1 |
| SI-Mamba (HLT) | 84.8 | 86.5 |
Ablation Study¶
| Traversing Strategy | ModelNet40 Acc. |
|---|---|
| 3D Grid (Baseline) | 93.6 |
| SAST (2 eigenvectors) | 93.8 |
| SAST (4 eigenvectors) | 94.0 |
| HLT (Segmentation) | Optimal segmentation performance |
Key Findings¶
- SAST shows the most prominent advantage on ScanObjectNN (+1.7%), as this dataset contains rotation and translation transformations.
- HLT significantly outperforms SAST on the segmentation task, validating the necessity of task-specific traversing strategies.
- TAR resolves the token placement issue in Mamba MAE pre-training, bringing self-supervised performance close to Transformers.
Highlights & Insights¶
- Elegant integration of spectral graph theory and SSM: Defining traversing order with Laplacian eigenvectors has a solid theoretical foundation.
- Isometry invariance has practical value: Viewpoint-independent traversing strategies are particularly crucial for real-world point clouds (where sensor positions are not fixed).
- Distinction between traversing strategies for classification vs. segmentation: SAST and HLT are designed specifically for global and local tasks, respectively.
Limitations & Future Work¶
- Laplacian eigendecomposition incurs additional computational overhead, although efficient algorithms are available for sparse matrices.
- The number of eigenvectors \(s\) needs to be manually selected.
- Large-scale outdoor point clouds (e.g., autonomous driving scenarios) have not been explored.
- The handling of repeated eigenvalue ambiguity for eigenvectors might not be sufficiently robust.
Related Work & Insights¶
- Point-Mamba, PCM: Pioneering works applying Mamba to point clouds.
- Point-MAE, Point-M2AE: Transformer-based point cloud MAE.
- The concept of spectral traversing can be extended to other SSM tasks that require serialization of irregular data.
Rating¶
⭐⭐⭐⭐ — The combination of spectral graph theory and Mamba is theoretically elegant, and SAST/HLT/TAR each address a clear problem. Consistent improvements across multiple benchmarks validate the effectiveness of the method.