🕸️ Graph Learning

📷 CVPR2025 · 7 paper notes

📌 Same area in other venues: 📷 CVPR2026 (8) · 🔬 ICLR2026 (118) · 💬 ACL2026 (24) · 🧪 ICML2026 (35) · 🤖 AAAI2026 (37) · 🧠 NeurIPS2025 (54)

Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models: This paper reinterprets multi-head attention as a graph convolutional filter subspace and linearly combines the pre-trained attention maps through a very small set of learned subspace combination coefficients (\(H \times H\) matrices). Because softmax weights are non-negative and sum to one, each head's output is confined to the convex hull of the value vectors; the learned combination lifts this constraint and expands the feature space, improving various PEFT methods in a plug-and-play manner at near-zero parameter cost.
DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition: This paper proposes DVHGNN, a vision backbone that uses multi-scale dilated hypergraphs to capture high-order correlations among image patches. It extracts multi-scale hyperedges through clustering and Dilated Hypergraph Construction (DHGC), and applies dynamic hypergraph convolution for adaptive feature exchange. DVHGNN achieves 83.1% top-1 accuracy on ImageNet-1K with 30.2M parameters, outperforming ViG-S by 1.0% and ViHGNN-S by 0.6%.
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges: This paper proposes HgVT, which embeds a hierarchical bipartite hypergraph structure into ViTs. By processing primary image patch vertices and virtual vertices separately, constructing a dynamic cosine-similarity adjacency, and using a three-layer attention mechanism based on a hyperedge communication pool, HgVT captures high-order semantic relations among patches without clustering. On ImageNet-1K, HgVT-Ti achieves 76.2% accuracy with 7.7M parameters (outperforming ViHGNN-Ti by 1.9%) and reaches 73.23% mAP@10 in image retrieval.
Knowledge Bridger: Towards Training-Free Missing Modality Completion: This paper proposes Knowledge Bridger, a training-free framework for missing modality completion. By leveraging Large Multimodal Models (LMMs) to automatically mine multimodal knowledge and construct a knowledge graph, it guides the generation and ranking of missing modalities, surpassing existing methods in both general and medical OOD scenarios.
NN-Former: Rethinking Graph Structure in Neural Architecture Representation: This paper proposes NN-Former, a hybrid GNN-Transformer architecture predictor, and shows that existing methods overlook the topological information carried by "sibling nodes" (nodes that share a parent or a child). By introducing Adjacency-Sibling Multihead Attention (ASMA) and a Bidirectional Graph Isomorphism FFN (BGIFFN), it achieves a Kendall's Tau of \(0.877\)/\(0.890\) on NAS-Bench-101/201 and reduces the MAPE of latency prediction by 48-64%.
Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing: This paper proposes the VISA framework, which debiases video scene graph generation from two perspectives. On the visual side, Memory-Guided Sequence Modeling (MGSM) reduces feature variance; on the semantic side, an Iterative Relation Generator (IRG) introduces hierarchical context and reduces dependence on biased priors. Together they significantly improve performance on tail categories on datasets such as Action Genome.
Universal Scene Graph Generation: This paper proposes the Universal Scene Graph (USG) representation and its parser USG-Par, which generates a unified scene graph from arbitrary combinations of modalities (images, text, video, 3D) using a cross-modal object associator and text-centric scene contrastive learning, capturing both modality-invariant and modality-specific scene semantics.
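
As a rough illustration of the Coeff-Tuning entry above, the sketch below mixes a stack of frozen attention maps with an \(H \times H\) coefficient matrix and shows that a negative coefficient takes the result outside the set of valid softmax outputs. It is a minimal NumPy sketch, not the paper's implementation: the function name `combine_attention_maps`, the identity initialization, and the toy attention maps are all assumptions made for illustration.

```python
import numpy as np


def combine_attention_maps(attn: np.ndarray, coeff: np.ndarray) -> np.ndarray:
    """Mix H attention maps with an H x H coefficient matrix.

    attn:  (H, N, N) frozen attention maps, each row non-negative and summing to 1.
    coeff: (H, H) coefficients; row g holds the weights used to build new head g.
    Returns an (H, N, N) array with new_attn[g] = sum_h coeff[g, h] * attn[h].
    """
    return np.einsum("gh,hnm->gnm", coeff, attn)


# Toy setup (hypothetical values): H = 4 heads over N = 5 tokens.
H, N = 4, 5
attn = np.full((H, N, N), 1.0 / N)  # every head attends uniformly...
attn[1] = np.eye(N)                 # ...except head 1, which attends only to itself

# Assumed initialization: identity coefficients reproduce the pre-trained maps.
same = combine_attention_maps(attn, np.eye(H))

# A signed coefficient lets new head 0 subtract part of head 1's pattern.
coeff = np.eye(H)
coeff[0, 1] = -0.5
mixed = combine_attention_maps(attn, coeff)

print("trainable coefficients per layer:", coeff.size)
print("min entry of mixed head 0:", mixed[0].min())
print("row sums of mixed head 0:", mixed[0].sum(axis=-1))
```

In this toy case the mixed head contains negative weights and its rows no longer sum to one, which no softmax output can do. The only trainable quantity is the \(H \times H\) matrix, which is where the near-zero parameter cost in the summary comes from.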