Learning Topology-Driven Multi-Subspace Fusion for Grassmannian Deep Networks

Conference: AAAI 2026
arXiv: 2511.08628
Code: GitHub
Area: Video Understanding / Manifold Learning
Keywords: Grassmannian, manifold learning, subspace fusion, 3D action recognition, Riemannian neural networks

TL;DR

This paper proposes GMSF-Net, a topology-driven multi-subspace fusion network on the Grassmann manifold. By introducing adaptive multi-subspace construction and a Fréchet mean-based subspace interaction mechanism, it successfully transfers the multi-channel interaction paradigm from Euclidean space to non-Euclidean geometry, achieving state-of-the-art performance on 3D action recognition, EEG classification, and graph tasks.

Background & Motivation

Core Problem

The Grassmann manifold is a powerful tool for geometric representation learning, enabling high-dimensional data to be modeled as low-dimensional subspaces. However, existing methods suffer from two critical limitations:

Static single-subspace representation: Methods such as GrNet and GDLNet model input data using a single fixed orthogonal subspace, making it difficult to capture local geometric variations and multimodal distributions, thereby limiting representational capacity.

Lack of subspace interaction: Much of the success of deep learning in Euclidean space is attributable to multi-channel interactions and nonlinear activations (e.g., multi-channel convolutions in LeNet-5), yet multi-subspace interaction on the Grassmann manifold has been largely overlooked.

Key Challenges

  • How to perform effective subspace interaction on the Grassmann manifold?
  • How to design stackable deep architectures to expand model capacity?
  • How to guarantee optimization convergence on Riemannian manifolds?

Method

Overall Architecture: GMSF-Net

GMSF-Net consists of three core modules: the Adaptive Multi-Subspace Encoder (AMSE), the Multi-Subspace Interaction Block, and Riemannian Batch Normalization.

1. Adaptive Multi-Subspace Construction (AdaMSC)

This is the core of the AMSE, inspired by the Kolmogorov–Arnold representation theorem. The procedure is as follows:

  1. Extract frame-level features and compute the covariance matrix \(X \in \mathbb{R}^{n \times n}\), modeling statistical dependencies of features along the temporal dimension.
  2. Gram–Schmidt orthogonalization: Map \(X\) to a set of low-dimensional orthogonal subspaces \(\mathcal{S} = \{S_1, S_2, \ldots, S_k\}\), where \(S_i^\top S_j = 0\) (\(i \neq j\)) and each \(S_j \in \mathcal{G}(n,1)\).
  3. Learnable weight selection: Initialize learnable weights \(\mathcal{W}^{(m')}\) for each new subspace \(S'_{m'}\), apply Softmax normalization, and select the top-\(p\) most important atomic subspaces.
  4. Weighted combination: \(S'_{m'} = [\tilde{w}_{j_1}^{(m')} S_{j_1}, \tilde{w}_{j_2}^{(m')} S_{j_2}, \ldots, \tilde{w}_{j_p}^{(m')} S_{j_p}]\)

This design allows each new subspace to be composed of distinct key atoms, enabling adaptive adjustment to different tasks.
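
To make the four steps concrete, here is a minimal NumPy sketch, assuming the atomic subspaces are the columns of the Q factor of a QR decomposition (numerically equivalent to Gram–Schmidt) and standing in randomly initialized vectors for the learnable weights \(\mathcal{W}^{(m')}\); the function name and dimensions are illustrative, not the authors' implementation.

```python
import numpy as np

def adaptive_multi_subspace(X, num_new_subspaces=4, top_p=3, seed=None):
    """Sketch of AdaMSC: build new subspaces from weighted atomic ones.

    X : (n, n) covariance matrix of frame-level features.
    Returns a list of (n, top_p) matrices, one per new subspace.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 2: Gram-Schmidt (via QR) maps X to orthonormal atoms S_1..S_n,
    # each column a point of G(n, 1).
    Q, _ = np.linalg.qr(X)
    new_subspaces = []
    for _ in range(num_new_subspaces):
        # Step 3: weights for this new subspace (randomly initialized here;
        # learnable in training), softmax-normalized, then top-p selection.
        w = rng.standard_normal(n)             # stands in for W^(m')
        w_tilde = np.exp(w) / np.exp(w).sum()  # softmax normalization
        idx = np.argsort(w_tilde)[-top_p:]     # top-p most important atoms
        # Step 4: weighted combination of the selected atoms.
        new_subspaces.append(Q[:, idx] * w_tilde[idx])
    return new_subspaces

# Usage: 10 feature dimensions observed over 50 frames.
S_prime = adaptive_multi_subspace(np.cov(np.random.randn(10, 50)), seed=0)
```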

2. Topology-Driven Convergence Analysis

The paper rigorously proves the convergence of multi-subspace construction from a topological perspective:

  • The projection metric \(d_p(X_1, X_2) = 2^{-1/2} \|X_1 X_1^T - X_2 X_2^T\|_F\) is used to define distances on the Grassmann manifold.
  • Subspaces are iteratively updated via Riemannian gradient descent.
  • It is proved that, under the topology induced by the projection metric, the subspace sequence converges to a stable subspace \(S^*\): \(d_p(S'(t), S^*) \to 0\) as \(t \to \infty\).
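
The projection metric itself is simple to evaluate; the sketch below computes \(d_p\) and checks its basis invariance (the dimensions are illustrative):

```python
import numpy as np

def projection_metric(X1, X2):
    """d_p(X1, X2) = 2^{-1/2} ||X1 X1^T - X2 X2^T||_F.

    X1, X2 : (n, p) matrices with orthonormal columns (points on G(n, p)).
    The distance depends only on the spanned subspaces, not on the bases.
    """
    P1 = X1 @ X1.T
    P2 = X2 @ X2.T
    return np.linalg.norm(P1 - P2, "fro") / np.sqrt(2)

# Invariance check: rotating a basis within its span leaves d_p unchanged.
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((8, 3)))
Y, _ = np.linalg.qr(rng.standard_normal((8, 3)))
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # orthogonal 3x3 rotation
assert np.isclose(projection_metric(X, Y), projection_metric(X @ R, Y))
```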

3. Multi-Subspace Interaction Block

This block comprises two sub-components:

Grassmann Multi-Subspace Representation (GMSR): A shared learnable mapping matrix \(W_c\) (constrained to the Stiefel manifold) is applied to all input subspaces, generating representations under different geometric frames: \(X_{GMSR}^{c,m'} = W_c^T S'_{m'}\). QR or SVD decomposition is subsequently applied to maintain orthogonality.
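
A hedged sketch of GMSR under these definitions, assuming a PyTorch tensor \(W_c\) that is kept on the Stiefel manifold elsewhere in training (e.g., by a Riemannian optimizer); only the forward computation is shown:

```python
import torch

def gmsr(subspaces, W_c):
    """Grassmann Multi-Subspace Representation (sketch).

    subspaces : list of (n, p) orthonormal bases S'_{m'}.
    W_c       : (n, d) shared mapping matrix on the Stiefel manifold.
    Returns a list of (d, p) orthonormal bases X_GMSR^{c, m'}.
    """
    outputs = []
    for S in subspaces:
        X = W_c.T @ S                  # shared mapping: W_c^T S'_{m'}
        Q, _ = torch.linalg.qr(X)      # QR re-orthonormalization
        outputs.append(Q)
    return outputs
```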

Grassmann Subspace Interaction (GSI): Multiple subspaces are fused using the Fréchet mean:

  • When \(m'=2\), a closed-form solution exists (geodesic interpolation).
  • When \(m'>2\), the Karcher flow algorithm is used to iteratively optimize in the tangent space.
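
The following NumPy sketch implements both cases using the standard Grassmann logarithm and exponential maps; the tolerance and iteration cap are illustrative choices, and the paper's exact Karcher-flow variant may differ:

```python
import numpy as np

def grass_log(X, Y):
    """Log map on G(n, p): tangent vector at X pointing toward span(Y)."""
    M = (Y - X @ (X.T @ Y)) @ np.linalg.inv(X.T @ Y)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.arctan(s)) @ Vt

def grass_exp(X, H):
    """Exp map on G(n, p): follow the geodesic from X along tangent H."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    Y = X @ Vt.T @ np.diag(np.cos(s)) @ Vt + U @ np.diag(np.sin(s)) @ Vt
    Q, _ = np.linalg.qr(Y)  # re-orthonormalize against numerical drift
    return Q

def geodesic_midpoint(X, Y):
    """m' = 2: closed-form Frechet mean as the geodesic midpoint."""
    return grass_exp(X, 0.5 * grass_log(X, Y))

def karcher_mean(subspaces, iters=50, tol=1e-8):
    """m' > 2: Frechet mean via Karcher flow in the tangent space."""
    mean = subspaces[0]
    for _ in range(iters):
        # Average the log-mapped subspaces in the tangent space at `mean`.
        T = sum(grass_log(mean, S) for S in subspaces) / len(subspaces)
        if np.linalg.norm(T) < tol:
            break
        mean = grass_exp(mean, T)  # retract the update onto the manifold
    return mean
```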

4. Optimization Strategy

  • Riemannian Batch Normalization: Maps statistical features to the SPD manifold and normalizes them to enhance discriminability.
  • Mutual Information Regularization: Maximizes information complementarity across different subspaces.
  • Total Loss: \(\mathcal{L}_{total} = \mathcal{L}_{CE} + \lambda \cdot \mathcal{L}_C\)
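
A hedged PyTorch sketch of the total loss; since the paper's mutual-information term \(\mathcal{L}_C\) is not reproduced here, `complementarity_penalty` below is only a hypothetical stand-in that discourages pairwise similarity between subspace features:

```python
import torch
import torch.nn.functional as F

def complementarity_penalty(feats):
    """Stand-in for L_C: penalize pairwise cosine similarity between
    flattened subspace features (each of shape (batch, ...)), encouraging
    complementary subspaces. NOT the paper's mutual-information term."""
    flat = [f.flatten(start_dim=1) for f in feats]
    loss = 0.0
    for i in range(len(flat)):
        for j in range(i + 1, len(flat)):
            loss = loss + F.cosine_similarity(flat[i], flat[j], dim=1).abs().mean()
    return loss

def total_loss(logits, labels, subspace_feats, lam=0.1):
    """L_total = L_CE + lambda * L_C (sketch; lam is a tunable weight)."""
    return F.cross_entropy(logits, labels) + lam * complementarity_penalty(subspace_feats)
```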

Key Experimental Results

Main Results I: 3D Action Recognition (FPHA Dataset)

| Method | Accuracy (%) | Model Size (MB) | FLOPs (M) |
|---|---|---|---|
| GrNet | 78.79 ± 1.82 | 6.73 | 38.60 |
| SPDNet | 87.65 ± 1.02 | 13.60 | 1595.50 |
| MATT | 87.70 ± 0.68 | 1.83 | 142.07 |
| SPDNetBN | 89.33 ± 0.49 | 13.63 | 1902.97 |
| GDLNet | 87.60 ± 0.69 | 1.83 | 33.69 |
| GMSF-Net-1Block | 90.43 ± 0.74 | 1.20 | 48.42 |
| GMSF-Net-3Blocks | 91.22 ± 0.53 | 1.30 | 81.07 |

GMSF-Net-3Blocks outperforms GrNet by 12.43 percentage points while using a smaller model (1.30 MB vs. 6.73 MB).

Main Results II: EEG Signal Classification (MAMEM-SSVEP-II Dataset)

| Method | Accuracy (%) | Model Size (MB) |
|---|---|---|
| EEGNet | 53.72 ± 7.23 | 0.075 |
| SCCNet | 62.11 ± 7.70 | 0.55 |
| SPDNet | 62.30 ± 3.12 | 2.81 |
| GrNet | 61.23 ± 3.56 | 1.95 |
| MATT | 65.19 ± 3.14 | 1.97 |
| GDLNet | 65.52 ± 2.86 | 1.95 |
| GMSF-Net-3Blocks | 66.87 ± 1.46 | 1.94 |

GMSF-Net surpasses GrNet by 5.64 and GDLNet by 1.35 percentage points, with a notably smaller standard deviation (1.46 vs. 2.86), indicating significantly improved stability.

Ablation Study

| Configuration | HDM05 (%) | FPHA (%) | SSVEP (%) |
|---|---|---|---|
| Adaptive subspace + interaction | 63.64 | 90.43 | 66.74 |
| Adaptive subspace (no interaction) | 56.49 | 80.68 | 59.83 |
| Random subspace + interaction | 50.29 | 72.47 | 56.01 |
| Fixed subspace + interaction | 53.04 | 83.06 | 66.05 |

The ablation results clearly demonstrate: (1) the adaptive mechanism substantially outperforms random and fixed subspace alternatives; (2) the interaction mechanism is effective only on high-quality subspaces—applying it to random subspaces instead introduces noise.

Highlights & Insights

  1. First introduction of deep Grassmann subspace interaction in Riemannian neural networks, successfully transferring the multi-channel interaction philosophy from Euclidean to non-Euclidean geometry.
  2. Topology-driven theoretical guarantees: The convergence of adaptive subspace construction is rigorously proved under the projection metric topology, providing a solid theoretical foundation.
  3. Significant reduction in model overhead: A 1.30 MB model surpasses SPDNetBN (13.63 MB) on FPHA, demonstrating a clear efficiency advantage.
  4. Learnable subspace selection mechanism (analogous to KAN) makes subspace construction fully data-driven.

Limitations & Future Work

  1. Task scope: Validation is primarily conducted on small-to-medium-scale datasets (HDM05, FPHA, SSVEP); performance on large-scale video understanding tasks remains untested.
  2. Grassmannian assumption dependency: Performance gains are limited on datasets that do not conform to an ideal subspace structure (e.g., PubMed), indicating strong geometric assumptions about the data.
  3. Computational complexity: The cost of computing the Fréchet mean via Karcher flow iteration grows with the number of subspaces and their dimensionality.
  4. Block saturation: Performance gains diminish progressively from 1 block to 3 blocks, suggesting the depth scalability of the stackable architecture is limited.

Related Work

  • GrNet (Huang et al., 2018): The first deep network on the Grassmann manifold, introducing layers such as FRMap, OrthMap, and ProjMap, but restricted to a single static subspace.
  • GDLNet (Wang et al., 2024): Introduces self-attention on the Grassmann manifold to capture inter-subspace dependencies, yet still constrained by single-subspace modeling.
  • SPDNet/SPDNetBN: Deep networks based on the SPD manifold; relatively large models with high computational cost.
  • MATT (Pan et al., 2022): A manifold attention-based method.

Rating

Dimension Score
Novelty ⭐⭐⭐⭐
Theoretical Depth ⭐⭐⭐⭐⭐
Experimental Thoroughness ⭐⭐⭐⭐
Value ⭐⭐⭐
Writing Quality ⭐⭐⭐⭐

Overall Rating: ⭐⭐⭐⭐ (4/5)

Theoretically rigorous with clearly articulated contributions, the paper successfully transfers multi-channel interaction to the Grassmann manifold. However, the application scenarios remain primarily academic, and large-scale practical utility has yet to be validated.