🕸️ Graph Learning
📷 CVPR2026 · 8 paper notes
📌 Same area in other venues: 🔬 ICLR2026 (118) · 💬 ACL2026 (24) · 🧪 ICML2026 (35) · 🤖 AAAI2026 (37) · 🧠 NeurIPS2025 (54) · 📹 ICCV2025 (1)
🔥 Top topics: Multimodal/VLM ×3
- Adaptive Learned Image Compression with Graph Neural Networks
GLIC replaces the fixed convolutions or window attention used in the nonlinear transforms of learned image compression (LIC) with content-adaptive connections driven by Graph Neural Networks (GNNs). It employs dual-scale graphs to determine "where to connect" and a complexity-aware mechanism to decide "how much to connect", so that both local and long-range redundancy are modeled better. It significantly outperforms traditional codecs and recent LIC baselines across three standard datasets.
- Graph2Eval: Automatic Multimodal Task Generation for Agents via Knowledge Graphs
This paper introduces Graph2Eval, a knowledge-graph-driven framework for automatically generating agent evaluation tasks. It constructs structured knowledge graphs from documents and webpages, then applies subgraph sampling, LLM-conditioned generation, and multi-stage filtering to produce multimodal agent tasks with significantly improved semantic consistency (+20%) and solvability (+17%). The result is Graph2Eval-Bench, which contains 1,319 tasks.
- M3KG-RAG: Multi-hop Multimodal Knowledge Graph-enhanced Retrieval-Augmented Generation
M3KG-RAG constructs a Multi-hop Multimodal Knowledge Graph (M3KG) via a lightweight multi-agent pipeline and introduces the GRASP mechanism for entity grounding and selective pruning. By retaining only query-relevant, answer-assisting knowledge, it significantly enhances the audio-visual reasoning capabilities of MLLMs.
- Mario: Multimodal Graph Reasoning with Large Language Models
Mario targets LLM reasoning on Multi-Modal Graphs (MMGs). It achieves topology-aware cross-modal alignment via a Graph-conditioned Vision-Language Model (GVLM) and selects the optimal modality configuration for each node using a Modality-Adaptive Prompt Router (MAPR), reaching SOTA performance on node classification and link prediction.
- Mixture-of-Experts based Feature Decoupling for Open Vocabulary Scene Graph Generation
Addressing the issues of "relying solely on off-the-shelf VLM features, lacking discriminative attributes, and semantic isolation between objects and relations" in Open Vocabulary Scene Graph Generation (OVSGG), this paper proposes MoE-FD. It adaptively decouples object/relation features into sub-attributes like shape, texture, and space using a Mixture-of-Experts (MoE) module, followed by iterative cross-attention for mutual refinement between nodes and edges. On the Visual Genome all-open vocabulary setting, it significantly improves R@100 for novel categories (e.g., +4.24% R@20 over ACC in the OvD+R novel relation setting).
- R2G: A Multi-View Circuit Graph Benchmark Suite from RTL to GDSII
This paper proposes R2G, the first standardized multi-view circuit graph benchmark suite, providing five stage-aware graph representations (with information equivalence) across 30 IP cores. A systematic study reveals that the choice of graph representation has a greater impact on performance than the choice of GNN model.
- Robo-SGG: Exploiting Layout-Oriented Normalization and Restitution Can Improve Robust Scene Graph Generation
Addressing the issue where "domain shift in visual features leads to a performance collapse" in robust Scene Graph Generation (inference on corrupted images with noise/blur/weather), this paper proposes a plug-and-play framework, Robo-SGG. It utilizes Instance Normalization to eliminate domain-specific statistics caused by corruption and uses layout-aware attention to recover global structural features (NRM). Additionally, it employs gated fusion to adaptively balance visual and coordinate features (LEE). Integrating these into existing SGG models yields relative improvements in mR@50 of 6.3% / 11.1% / 8.0% for PredCls/SGCls/SGDet on VG-C.
- ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning
This paper embeds the Procedural Knowledge Graph (PKG) into a planning model end-to-end via a differentiable Viterbi layer, allowing the neural network to focus on learning emission probabilities rather than memorizing complete procedural structures. This achieves SOTA success rates on CrossTask/COIN/NIV with only 5-7M parameters (1-3 orders of magnitude fewer than Diffusion/LLM methods) and establishes a unified evaluation benchmark.
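
To give a rough sense of what a differentiable Viterbi layer (as in the ViterbiPlanNet note) can look like, here is a minimal sketch. It assumes one common relaxation, replacing the hard `max` in the Viterbi recursion with a temperature-scaled log-sum-exp, which may differ from what the paper actually does. The function names, the `temperature` parameter, and the toy scores are all hypothetical. The sketch uses plain Python for readability; gradients would require writing the same recursion in an autodiff framework.

```python
import math

NEG_INF = float("-inf")

def smooth_max(values, temperature=1.0):
    """Temperature-scaled log-sum-exp: a smooth upper bound on max(values)."""
    m = max(values)
    if m == NEG_INF:  # every path into this state is forbidden
        return NEG_INF
    total = sum(math.exp((v - m) / temperature) for v in values)
    return m + temperature * math.log(total)

def soft_viterbi_score(log_emissions, log_transitions, temperature=1.0):
    """Smoothed best-path score through a step graph.

    log_emissions:   T x S per-step log-scores (the part a network would learn).
    log_transitions: S x S log-scores from the graph; -inf forbids a transition.
    """
    num_states = len(log_transitions)
    scores = list(log_emissions[0])
    for t in range(1, len(log_emissions)):
        scores = [
            log_emissions[t][j]
            + smooth_max(
                [scores[i] + log_transitions[i][j] for i in range(num_states)],
                temperature,
            )
            for j in range(num_states)
        ]
    return smooth_max(scores, temperature)

# Toy example (made-up numbers): two states, three steps, state 1 -> 0 forbidden.
emissions = [[0.0, -1.0], [-2.0, 0.0], [-1.0, -0.5]]
transitions = [[-0.1, -0.2], [NEG_INF, -0.3]]
soft = soft_viterbi_score(emissions, transitions, temperature=1.0)
near_hard = soft_viterbi_score(emissions, transitions, temperature=0.01)
```

As the temperature approaches zero the smoothed score approaches the ordinary Viterbi best-path score, while larger temperatures spread credit across several paths. The transition structure comes from the graph, so only the emission scores need to be learned.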