RISurConv: Rotation Invariant Surface Attention-Augmented Convolutions for 3D Point Cloud Classification and Segmentation¶
Conference: ECCV 2024
arXiv: 2408.06110
Code: Yes
Area: 3D Vision
Keywords: Point Cloud Classification, Rotation Invariance, Self-Attention, Surface Attributes, 3D Segmentation
TL;DR¶
RISurConv constructs local triangular surfaces and extracts highly representative Rotation Invariant Surface Properties (RISP) from them. Combined with attention-augmented convolutions, it is reported as the first rotation-invariant point cloud analysis network to surpass non-rotation-invariant methods in accuracy.
Background & Motivation¶
Key Challenge¶
Background: Deep learning on 3D point clouds has focused mostly on translation and point-permutation invariance, while rotation invariance has received less attention. Existing rotation-invariant methods (such as RIConv and RIConv++) achieve rotation invariance through hand-crafted features, but their accuracy falls well below that of non-rotation-invariant methods (such as PointTransformer v2). The main reasons are that global information is lost when rotation-invariant features are generated and that the LRF/LRA (Local Reference Frame/Axis) is unstable.
Goal: Narrow or even eliminate the accuracy gap between rotation-invariant and non-rotation-invariant methods.
Method¶
Overall Architecture¶
- Construct a \(K\)-nearest neighbor (KNN) local point set for each reference point.
- Build two triangular surfaces for each neighbor and extract 14-dimensional Rotation Invariant Surface Properties (RISP).
- Embed RISP using an MLP, and then refine features through two self-attention (SA) layers.
- Utilize five RISurConv layers + a Transformer Encoder + fully connected layers to output classification/segmentation results.
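The first step of the pipeline above, gathering a KNN neighborhood around a reference point, can be sketched in plain Python. This is a minimal brute-force illustration, not the authors' implementation: the function name `knn_indices`, the toy point list, and the choice to exclude the reference point from its own neighborhood are all assumptions made for the example.

```python
import math

def knn_indices(points, ref_index, k):
    """Indices of the k points closest to points[ref_index].

    Assumption: the reference point itself is excluded from its neighborhood.
    """
    ref = points[ref_index]
    others = [i for i in range(len(points)) if i != ref_index]
    # Brute-force search: sort every other point by Euclidean distance.
    others.sort(key=lambda i: math.dist(points[i], ref))
    return others[:k]

# Hypothetical toy cloud: one reference point and four candidates.
points = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 2.0, 0.0),
    (0.0, 0.0, 3.0),
    (5.0, 5.0, 5.0),
]
neighbors = knn_indices(points, ref_index=0, k=3)
```

Brute-force sorting costs O(N log N) per reference point; a real pipeline would more likely use a batched GPU distance computation or a spatial index.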
Key Designs¶
Rotation Invariant Surface Properties (RISP): For each neighboring point \(x_i\), two adjacent neighbors \(x_{i-1}\) and \(x_{i+1}\) are selected to construct two triangular surfaces. The extracted 14-dimensional features include: distance \(L_0\), 5 Euclidean space angles (triangular interior angles and dihedral angles), and 8 tangent space angles (angles between normal vectors and edges). RISP mathematically describes the double triangles and their relations thoroughly, ensuring geometric completeness.
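To see why distances and angles measured on a double triangle are rotation invariant, the sketch below computes four such quantities and checks that they are unchanged after an arbitrary rotation. It is a simplified illustration, not the 14-dimensional RISP: the assumption that both triangles share the edge from the reference point `p` to `x_i`, the particular four features, and all helper names are hypothetical choices for this example.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def angle(a, b):
    # Clamp the cosine to [-1, 1] to guard against rounding error.
    c = dot(a, b) / (norm(a) * norm(b))
    return math.acos(max(-1.0, min(1.0, c)))

def double_triangle_features(p, x_prev, x_i, x_next):
    """Four illustrative invariants of triangles (p, x_prev, x_i) and (p, x_i, x_next)."""
    e, e_prev, e_next = sub(x_i, p), sub(x_prev, p), sub(x_next, p)
    n1 = cross(e_prev, e)  # normal of the first triangle
    n2 = cross(e, e_next)  # normal of the second triangle
    return (
        norm(e),           # distance from reference point to neighbor
        angle(e, e_prev),  # interior angle at p in the first triangle
        angle(e, e_next),  # interior angle at p in the second triangle
        angle(n1, n2),     # angle between the two triangle normals
    )

def rotate(v, axis, theta):
    """Rotate v about axis by theta using Rodrigues' formula."""
    k = tuple(x / norm(axis) for x in axis)
    c, s = math.cos(theta), math.sin(theta)
    kxv, kdv = cross(k, v), dot(k, v)
    return tuple(v[i] * c + kxv[i] * s + k[i] * kdv * (1 - c) for i in range(3))

# Hypothetical points: reference p and three consecutive neighbors.
pts = [(0.2, 0.1, -0.3), (1.0, 0.0, 0.1), (0.3, 1.1, 0.0), (-0.4, 0.2, 0.9)]
original = double_triangle_features(*pts)
rotated = double_triangle_features(*[rotate(q, (1.0, 2.0, 3.0), 0.7) for q in pts])
max_diff = max(abs(a - b) for a, b in zip(original, rotated))
```

Because every feature depends only on lengths and angles between difference vectors, `max_diff` stays at floating-point noise level for any rotation axis and angle.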
RISurConv Operator: Contains two self-attention modules. SA1 refines features among the \(K\) points in a neighborhood, and SA2 refines features globally among the \(N\) representative points. Together, the two modules strengthen the feature representation.
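As a rough illustration of what a self-attention refinement over \(K\) neighbor features does, the following sketch applies scaled dot-product attention to a small list of feature vectors. It is a generic textbook form, not the paper's SA1 or SA2 module: the learned query/key/value projections are replaced by the identity (an assumption for brevity), and the feature values are made up.

```python
import math

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features):
    """Scaled dot-product self-attention with identity Q/K/V projections (assumption)."""
    d = len(features[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in features:
        # Similarity of this point's feature to every feature in the set.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in features]
        weights.append(softmax(scores))
    # Each refined feature is a weighted average of all input features.
    refined = [
        [sum(w * v[j] for w, v in zip(row, features)) for j in range(d)]
        for row in weights
    ]
    return refined, weights

# Hypothetical embedded features for K = 3 neighbors, dimension 4.
neighbor_features = [
    [1.0, 0.0, 0.5, 0.2],
    [0.9, 0.1, 0.4, 0.3],
    [0.0, 1.0, 0.0, 0.8],
]
refined, weights = self_attention(neighbor_features)
```

Each row of `weights` sums to 1, so every refined feature is a convex combination of the inputs. Applying the same routine to \(N\) representative-point features instead of \(K\) neighbor features gives the global analogue.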
Loss & Training¶
Classification is trained with cross-entropy loss, and segmentation with a standard segmentation loss.
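For reference, the cross-entropy loss for a single sample can be written in a few lines. This is the generic softmax cross-entropy, shown as a sketch; the function name and the example logits are illustrative. Applying it per point for segmentation is a common choice, but that is an assumption and is not confirmed by the text above.

```python
import math

def cross_entropy(logits, label):
    """Negative log-probability of the true class under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# Hypothetical 4-class examples.
uniform_loss = cross_entropy([0.0, 0.0, 0.0, 0.0], label=2)     # equals log(4)
confident_loss = cross_entropy([10.0, 0.0, 0.0, 0.0], label=0)  # close to 0
```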
Key Experimental Results¶
Main Results¶
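ModelNet40 Classification Accuracy (Overall Accuracy %). Columns give the train/test rotation setting, where z denotes rotations about the z-axis and SO3 denotes arbitrary 3D rotations: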
| Method | Rotation Invariant | z/z | SO3/SO3 | z/SO3 | Std. |
|---|---|---|---|---|---|
| PointNet++ | ✗ | 89.3 | 85.0 | 28.6 | 33.8 |
| Pt Transformer v2 | ✗ | 94.2 | 88.3 | 51.8 | 23.0 |
| RIConv++ | ✓ | 91.3 | 91.3 | 91.3 | 0.0 |
| RISurConv | ✓ | 96.0 | 96.0 | 96.0 | 0.0 |
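The Std. column appears to be the sample standard deviation of each method's accuracy across the three rotation settings. That reading is an inference from the numbers, not something stated in the table, and the short check below tests it with the standard library.

```python
import statistics

# Accuracies copied from the table, in the order z/z, SO3/SO3, z/SO3.
accuracies = {
    "PointNet++": [89.3, 85.0, 28.6],
    "Pt Transformer v2": [94.2, 88.3, 51.8],
    "RIConv++": [91.3, 91.3, 91.3],
    "RISurConv": [96.0, 96.0, 96.0],
}

# statistics.stdev uses the sample (n - 1) formula.
spread = {name: statistics.stdev(vals) for name, vals in accuracies.items()}

for name, value in spread.items():
    print(f"{name}: {value:.2f}")
```

Under this assumption the computed values are about 33.87 for PointNet++ (listed as 33.8), about 22.97 for Pt Transformer v2 (listed as 23.0), and 0 for both rotation-invariant methods, so the column matches to within the last digit.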
ScanObjectNN Real-World Classification (PB_T50_RS):
| Method | z/z | SO3/SO3 | z/SO3 |
|---|---|---|---|
| RIConv++ | 80.3 | 80.3 | 80.3 |
| RISurConv | 93.1 | 93.1 | 93.1 |
ShapeNet Part Segmentation (mIoU %):
| Method | SO3/SO3 | z/SO3 |
|---|---|---|
| RIConv++ (xyz+nor) | 80.5 | 80.5 |
| RISurConv (xyz+nor) | 81.5 | 81.5 |
Ablation Study¶
| Ablation Item | Accuracy |
|---|---|
| Full Model (A) | 96.0 |
| Remove \(L_0\) (B) | 95.5 |
| Tangent Space Angles Only (C) | 90.9 |
| \(L_0 + \phi\) Only (D) | 88.2 |
| Remove SA1+SA2+TE (E) | 92.8 |
| Remove Transformer Encoder (F) | 94.3 |
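The ablation numbers are easier to compare as drops from the full model. The snippet below does that arithmetic; the dictionary keys are descriptive labels chosen for this example, and the accuracies are copied from the table.

```python
full = 96.0
ablations = {
    "remove L_0": 95.5,
    "tangent space angles only": 90.9,
    "L_0 + phi only": 88.2,
    "remove SA1+SA2+TE": 92.8,
    "remove Transformer Encoder": 94.3,
}

# Accuracy lost relative to the full model, in percentage points.
drops = {name: round(full - acc, 1) for name, acc in ablations.items()}
largest = max(drops, key=drops.get)
smallest = min(drops, key=drops.get)
```

Removing \(L_0\) costs only 0.5 points, while restricting the features to \(L_0 + \phi\) costs 7.8 points, the largest drop in the table.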
Key Findings¶
- This is the first rotation-invariant method reported to surpass non-rotation-invariant methods on ModelNet40 (96.0% vs. 94.2% for PointTransformer v2).
- Outperforms RIConv++ by 12.8 percentage points on ScanObjectNN.
- Angle features are more critical than distance features, with tangent space and Euclidean space angles serving as complementary properties.
- Self-attention modules facilitate a more uniform feature distribution.
Highlights & Insights¶
- Outperforming Non-Rotation-Invariant Methods for the First Time: It is demonstrated that rotation-invariant features do not have to come at the expense of accuracy.
- Local Triangular Surface Construction: Captures local geometric structures more effectively compared to point-wise operations.
- Completeness of RISP: The 14-dimensional features completely describe the double-triangle structure; adding more features does not further improve performance.
- Self-attention enables weight redistribution, enhancing feature efficacy.
Limitations & Future Work¶
- Relatively large parameter count (14M vs. 0.4M for RIConv++, roughly 35× more), which reduces inference speed.
- The quality of normal vector estimation affects classification accuracy (accuracy without normals is 0.4 percentage points lower than with normals).
- There is still room for improvement regarding fine-grained classification.
Related Work & Insights¶
- RIConv only considers local features, leading to diminished accuracy.
- GCAConv employs LRF, but LRF is inherently unstable.
- Insight: Transitioning from point-wise operations to surface-wise representation is a promising path for enhancing 3D feature representation.
Rating¶
- Novelty: ★★★★★ The local triangular surface and RISP feature designs are ingenious, breaking the performance ceiling of rotation invariance for the first time.
- Practicality: ★★★★☆ Rotation invariance is highly important for robotics and autonomous driving scenarios.
- Experimental Quality: ★★★★★ Comprehensive validation on multiple datasets with detailed ablation studies.