Learning Topology-Driven Multi-Subspace Fusion for Grassmannian Deep Networks¶
- Conference: AAAI 2026
- arXiv: 2511.08628
- Code: GitHub
- Area: Video Understanding / Manifold Learning
- Keywords: Grassmannian, manifold learning, subspace fusion, 3D action recognition, Riemannian neural networks
TL;DR¶
This paper proposes GMSF-Net, a topology-driven multi-subspace fusion network on the Grassmann manifold. By introducing adaptive multi-subspace construction and a Fréchet mean-based subspace interaction mechanism, it successfully transfers the multi-channel interaction paradigm from Euclidean space to non-Euclidean geometry, achieving state-of-the-art performance on 3D action recognition, EEG classification, and graph tasks.
Background & Motivation¶
Core Problem¶
The Grassmann manifold is a powerful tool for geometric representation learning, enabling high-dimensional data to be modeled as low-dimensional subspaces. However, existing methods suffer from two critical limitations:
Static single-subspace representation: Methods such as GrNet and GDLNet model input data using a single fixed orthogonal subspace, making it difficult to capture local geometric variations and multimodal distributions, thereby limiting representational capacity.
Lack of subspace interaction: Much of the success of deep learning in Euclidean space is attributable to multi-channel interactions and nonlinear activations (e.g., multi-channel convolutions in LeNet-5), yet multi-subspace interaction on the Grassmann manifold has been largely overlooked.
Key Challenges¶
- How to perform effective subspace interaction on the Grassmann manifold?
- How to design stackable deep architectures to expand model capacity?
- How to guarantee optimization convergence on Riemannian manifolds?
Method¶
Overall Architecture: GMSF-Net¶
GMSF-Net consists of three core modules: the Adaptive Multi-Subspace Encoder (AMSE), the Multi-Subspace Interaction Block, and Riemannian Batch Normalization.
1. Adaptive Multi-Subspace Construction (AdaMSC)¶
This is the core of the AMSE, inspired by the Kolmogorov–Arnold representation theorem. The procedure is as follows:
- Extract frame-level features and compute the covariance matrix \(X \in \mathbb{R}^{n \times n}\), modeling statistical dependencies of features along the temporal dimension.
- Gram–Schmidt orthogonalization: Map \(X\) to a set of low-dimensional orthogonal subspaces \(\mathcal{S} = \{S_1, S_2, \ldots, S_k\}\), where \(S_i^\top S_j = 0\) (\(i \neq j\)) and each \(S_j \in \mathcal{G}(n,1)\).
- Learnable weight selection: Initialize learnable weights \(\mathcal{W}^{(m')}\) for each new subspace \(S'_{m'}\), apply Softmax normalization, and select the top-\(p\) most important atomic subspaces.
- Weighted combination: \(S'_{m'} = [\tilde{w}_{j_1}^{(m')} S_{j_1}, \tilde{w}_{j_2}^{(m')} S_{j_2}, \ldots, \tilde{w}_{j_p}^{(m')} S_{j_p}]\)
This design allows each new subspace to be composed of distinct key atoms, enabling adaptive adjustment to different tasks.
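Below is a minimal PyTorch sketch of the AdaMSC procedure described above. The function name `adamsc` and all shapes are illustrative assumptions; the released code may parameterize this differently.

```python
import torch

def adamsc(X, k, weights, p):
    """Illustrative AdaMSC sketch.

    X       : (n, n) covariance matrix of frame-level features
    k       : number of atomic 1-D subspaces to extract
    weights : (m_new, k) learnable selection weights (module
              parameters in a real model)
    p       : atoms kept per new subspace
    """
    # QR factorization plays the role of Gram-Schmidt orthogonalization:
    # each column of Q spans an atomic subspace S_j in G(n, 1).
    Q, _ = torch.linalg.qr(X)
    atoms = Q[:, :k]                         # (n, k), mutually orthogonal

    new_subspaces = []
    for w_row in weights:
        w = torch.softmax(w_row, dim=0)      # normalize atom importances
        top_w, idx = torch.topk(w, p)        # keep the p strongest atoms
        # Weighted concatenation: S'_m = [w_{j1} S_{j1}, ..., w_{jp} S_{jp}]
        new_subspaces.append(atoms[:, idx] * top_w.unsqueeze(0))  # (n, p)
    return new_subspaces
```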
2. Topology-Driven Convergence Analysis¶
The paper rigorously proves the convergence of multi-subspace construction from a topological perspective:
- The projection metric \(d_p(X_1, X_2) = 2^{-1/2} \|X_1 X_1^T - X_2 X_2^T\|_F\) is used to define distances on the Grassmann manifold (a minimal implementation is sketched after this list).
- Subspaces are iteratively updated via Riemannian gradient descent.
- It is proved that, under the topology induced by the projection metric, the subspace sequence converges to a stable subspace \(S^*\): \(d_p(S'(t), S^*) \to 0\).
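The projection metric itself is straightforward to compute. A minimal sketch, assuming subspaces are stored as matrices with orthonormal columns:

```python
import torch

def projection_distance(S1, S2):
    """d_p(S1, S2) = 2^{-1/2} ||S1 S1^T - S2 S2^T||_F, where S1 and S2
    are (n, q) matrices with orthonormal columns. Comparing projection
    matrices makes the distance invariant to the choice of basis."""
    P1, P2 = S1 @ S1.T, S2 @ S2.T
    return torch.linalg.norm(P1 - P2, ord="fro") / (2 ** 0.5)
```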
3. Multi-Subspace Interaction Block¶
This block comprises two sub-components:
Grassmann Multi-Subspace Representation (GMSR): A shared learnable mapping matrix \(W_c\) (constrained to the Stiefel manifold) is applied to all input subspaces, generating representations under different geometric frames: \(X_{GMSR}^{c,m'} = W_c^T S'_{m'}\). A subsequent QR or SVD re-orthonormalization restores column orthogonality so the outputs remain on the Grassmann manifold (see the sketch below).
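A minimal sketch of the GMSR map, assuming `W_c` is an \((n, d)\) Stiefel-constrained parameter maintained by a Riemannian optimizer (the constraint itself is not shown here):

```python
import torch

def gmsr(subspaces, W_c):
    """Map every subspace through the shared frame W_c, then re-orthonormalize.

    subspaces : list of (n, q) matrices with orthonormal columns
    W_c       : (n, d) matrix with orthonormal columns (Stiefel manifold)
    """
    out = []
    for S in subspaces:
        Y = W_c.T @ S                # (d, q): representation in the new frame
        Q, _ = torch.linalg.qr(Y)    # QR restores column orthonormality
        out.append(Q)
    return out
```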
Grassmann Subspace Interaction (GSI): Multiple subspaces are fused using the Fréchet mean:
- When only two subspaces are fused (\(m'=2\)), the Fréchet mean has a closed-form solution (geodesic midpoint interpolation).
- For more than two subspaces (\(m'>2\)), the Karcher flow algorithm iteratively optimizes in the tangent space; a minimal sketch follows this list.
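A minimal sketch of the Karcher flow, using the standard SVD-based log/exp maps on the Grassmannian; the paper's exact stopping rule and step size may differ:

```python
import torch

def grass_log(Y, X):
    """Log map on G(n, q) at base point Y; both inputs are (n, q) with
    orthonormal columns."""
    YtX = Y.T @ X
    M = (X - Y @ YtX) @ torch.linalg.inv(YtX)          # horizontal lift
    U, s, Vh = torch.linalg.svd(M, full_matrices=False)
    return U @ torch.diag(torch.atan(s)) @ Vh

def grass_exp(Y, H):
    """Exp map on G(n, q) at Y along tangent vector H."""
    U, s, Vh = torch.linalg.svd(H, full_matrices=False)
    Ynew = (Y @ Vh.T @ torch.diag(torch.cos(s)) @ Vh
            + U @ torch.diag(torch.sin(s)) @ Vh)
    Q, _ = torch.linalg.qr(Ynew)                        # numerical cleanup
    return Q

def karcher_mean(subspaces, iters=20, tol=1e-6):
    """Frechet mean by Karcher flow: average the log-mapped subspaces in
    the tangent space at the current estimate, then exp-map back."""
    Y = subspaces[0].clone()
    for _ in range(iters):
        T = sum(grass_log(Y, X) for X in subspaces) / len(subspaces)
        if torch.linalg.norm(T) < tol:                  # converged
            break
        Y = grass_exp(Y, T)
    return Y
```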
4. Optimization Strategy¶
- Riemannian Batch Normalization: Maps statistical features to the SPD manifold and normalizes them to enhance discriminability.
- Mutual Information Regularization: Maximizes information complementarity across different subspaces.
- Total Loss: \(\mathcal{L}_{total} = \mathcal{L}_{CE} + \lambda \cdot \mathcal{L}_C\), where \(\mathcal{L}_C\) is the mutual-information regularization term and \(\lambda\) balances the two objectives (a hedged sketch follows this list).
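To make the objective concrete, here is a sketch of the loss combination. The cross-entropy term follows the formula above, but the decorrelation surrogate used for \(\mathcal{L}_C\) below is an assumption for illustration only, not the paper's exact regularizer:

```python
import torch
import torch.nn.functional as F

def total_loss(logits, labels, subspace_feats, lam=0.1):
    """L_total = L_CE + lambda * L_C.

    subspace_feats : list of per-subspace feature tensors
    The L_C below is a stand-in: it penalizes pairwise similarity between
    subspace features, encouraging complementarity (the paper's actual
    mutual-information regularizer may be defined differently).
    """
    l_ce = F.cross_entropy(logits, labels)
    Z = F.normalize(torch.stack([f.flatten() for f in subspace_feats]), dim=1)
    sim = Z @ Z.T                                 # pairwise cosine similarity
    off_diag = sim - torch.diag(torch.diag(sim))  # ignore self-similarity
    l_c = off_diag.pow(2).mean()
    return l_ce + lam * l_c
```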
Key Experimental Results¶
Main Results I: 3D Action Recognition (FPHA Dataset)¶
| Method | Accuracy (%) | Model Size (MB) | FLOPs (M) |
|---|---|---|---|
| GrNet | 78.79±1.82 | 6.73 | 38.60 |
| SPDNet | 87.65±1.02 | 13.60 | 1595.50 |
| MATT | 87.70±0.68 | 1.83 | 142.07 |
| SPDNetBN | 89.33±0.49 | 13.63 | 1902.97 |
| GDLNet | 87.60±0.69 | 1.83 | 33.69 |
| GMSF-Net-1Block | 90.43±0.74 | 1.20 | 48.42 |
| GMSF-Net-3Blocks | 91.22±0.53 | 1.30 | 81.07 |
GMSF-Net-3Blocks outperforms GrNet by 12.43 percentage points while using a smaller model (1.30 MB vs. 6.73 MB).
Main Results II: EEG Signal Classification (MAMEM-SSVEP-II Dataset)¶
| Method | Accuracy (%) | Model Size (MB) |
|---|---|---|
| EEGNet | 53.72±7.23 | 0.075 |
| SCCNet | 62.11±7.70 | 0.55 |
| SPDNet | 62.30±3.12 | 2.81 |
| GrNet | 61.23±3.56 | 1.95 |
| MATT | 65.19±3.14 | 1.97 |
| GDLNet | 65.52±2.86 | 1.95 |
| GMSF-Net-3Blocks | 66.87±1.46 | 1.94 |
GMSF-Net surpasses GrNet by 5.64 and GDLNet by 1.35 percentage points, with a notably smaller standard deviation (1.46 vs. 2.86), indicating markedly improved stability.
Ablation Study¶
| Configuration | HDM05 | FPHA | SSVEP |
|---|---|---|---|
| Adaptive subspace + interaction | 63.64% | 90.43% | 66.74% |
| Adaptive subspace (no interaction) | 56.49% | 80.68% | 59.83% |
| Random subspace + interaction | 50.29% | 72.47% | 56.01% |
| Fixed subspace + interaction | 53.04% | 83.06% | 66.05% |
The ablation results clearly demonstrate: (1) the adaptive mechanism substantially outperforms random and fixed subspace alternatives; (2) the interaction mechanism is effective only on high-quality subspaces—applying it to random subspaces instead introduces noise.
Highlights & Insights¶
- First introduction of deep Grassmann subspace interaction in Riemannian neural networks, successfully transferring the multi-channel interaction philosophy from Euclidean to non-Euclidean geometry.
- Topology-driven theoretical guarantees: The convergence of adaptive subspace construction is rigorously proved under the projection metric topology, providing a solid theoretical foundation.
- Significant reduction in model overhead: A 1.30 MB model surpasses SPDNetBN (13.63 MB) on FPHA, demonstrating a clear efficiency advantage.
- Learnable subspace selection mechanism (analogous to KAN) makes subspace construction fully data-driven.
Limitations & Future Work¶
- Task scope: Validation is primarily conducted on small-to-medium-scale datasets (HDM05, FPHA, SSVEP); performance on large-scale video understanding tasks remains untested.
- Grassmannian assumption dependency: Performance gains are limited on datasets that do not conform to an ideal subspace structure (e.g., PubMed), indicating strong geometric assumptions about the data.
- Computational complexity: The cost of computing the Fréchet mean via Karcher flow iteration grows with the number of subspaces and their dimensionality.
- Block saturation: Performance gains diminish progressively from 1 Block to 3 Blocks, suggesting limited depth-scalability potential of the stackable architecture.
Related Work & Insights¶
- GrNet (Huang et al., 2018): The first deep network on the Grassmann manifold, introducing layers such as FRMap, OrthMap, and ProjMap, but restricted to a single static subspace.
- GDLNet (Wang et al., 2024): Introduces self-attention on the Grassmann manifold to capture inter-subspace dependencies, yet still constrained by single-subspace modeling.
- SPDNet/SPDNetBN: Deep networks based on the SPD manifold; relatively large models with high computational cost.
- MATT (Pan et al., 2022): A manifold attention network for EEG decoding that operates on the SPD manifold.
Rating¶
| Dimension | Score |
|---|---|
| Novelty | ⭐⭐⭐⭐ |
| Theoretical Depth | ⭐⭐⭐⭐⭐ |
| Experimental Thoroughness | ⭐⭐⭐⭐ |
| Practical Value | ⭐⭐⭐ |
| Writing Quality | ⭐⭐⭐⭐ |
Overall Rating: ⭐⭐⭐⭐ (4/5)
Theoretically rigorous with clearly articulated contributions, the paper successfully transfers multi-channel interaction to the Grassmann manifold. However, the application scenarios remain primarily academic, and large-scale practical utility has yet to be validated.