🧬 Computational Biology

📷 CVPR2025 · 7 paper notes

DiffVsgg: Diffusion-Driven Online Video Scene Graph Generation: DiffVsgg models Video Scene Graph Generation (VSGG) as an iterative denoising problem along the temporal axis. It unifies object classification, box regression, and relation prediction through a shared feature embedding. It performs spatial reasoning with latent diffusion models and temporal reasoning by conditioning on prior-frame predictions, making it the first online VSGG method. It achieves state-of-the-art results under all three evaluation protocols on Action Genome, surpassing DSG-DETR by 3.3 points in R@10.
Multimodal Protein Language Models for Enzyme Kinetic Parameters: From Substrate Recognition to Conformational Adaptation: This paper proposes the ERBA adapter, which models enzyme kinetic parameter prediction as a staged conditioning process of "substrate recognition \(\rightarrow\) conformational adaptation". It injects substrate semantics via MRCA, fuses active-site 3D geometry via G-MoE, and preserves protein language model (PLM) priors via ESDA, consistently outperforming existing methods on three kinetic endpoints: \(k_{cat}\), \(K_m\), and \(K_i\).
Semantic and Expressive Variation in Image Captions Across Languages: This work systematically demonstrates significant distributional differences in semantic content (objects, relations, attributes) and expressive style (concreteness, tone, authenticity) in image captions across different languages. Multilingual caption sets provide richer visual information compared to monolingual ones (+46% objects, +66.1% relations, +66.8% attributes), providing empirical support for training vision models on multilingual data.
SHREC: A Spectral Embedding-Based Approach for Ab-Initio Reconstruction of Helical Molecules: This paper proposes the SHREC algorithm, which uses the spectral embedding of the graph Laplacian to recover the projection angles of helical molecules directly from 2D cryo-EM projection images. It needs no prior knowledge of the helical symmetry parameters (rise/twist), only the axial point group symmetry \(C_n\), and achieves near-atomic resolution ab-initio helical reconstruction on multiple public datasets.
Synthetic Visual Genome: This paper proposes the SVG (Synthetic Visual Genome) data engine, a two-stage pipeline. Stage 1 uses GPT-4 to complete missing relationships on top of existing human annotations; Stage 2 (SG-Edit) combines Robin self-distillation with GPT-4 editing. The result is a dense scene graph dataset with 146K images, 2.6M objects, and 5.6M relationships. Trained on fewer than 3M instances, the Robin-3B model outperforms same-sized models trained on over 300M instances and achieves a state-of-the-art (SOTA) score of 88.9 on referring expression comprehension.
Towards Spatio-Temporal World Scene Graph Generation from Monocular Videos: This paper proposes the World Scene Graph Generation (WSGG) task and the ActionGenome4D dataset, upgrading video scene graphs from frame-centric 2D representations to world-centric 4D representations. It requires models to perform 3D localization and relation prediction in the world coordinate system for all objects, including invisible ones that are occluded or out of view. Three complementary methods (PWG/MWAE/4DST) are proposed to explore different inductive biases for invisible object reasoning.
Unsupervised Foundation Model-Agnostic Slide-Level Representation Learning: This work proposes Cobra, an unsupervised, foundation model-agnostic (FM-agnostic) framework for whole slide image (WSI)-level representation learning. It treats embeddings from multiple pre-trained patch-level foundation models as feature-space augmentations and trains a Mamba-2-based slide-level encoder with contrastive learning. Pre-trained on only 3,048 WSIs, Cobra outperforms existing slide encoders by at least +4.4% in average AUC across 15 downstream tasks.
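
The idea of using embeddings from different foundation models as feature-space augmentations can be illustrated with a generic contrastive objective. The sketch below is a minimal NumPy example, not Cobra's actual implementation: it assumes a symmetric InfoNCE loss, assumes both views have already been projected to a shared dimension, and replaces the Mamba-2 encoder with synthetic vectors. The names `info_nce`, `view_a`, and `view_b` are hypothetical.

```python
import numpy as np


def info_nce(view_a, view_b, temperature=0.1):
    """Symmetric InfoNCE loss between two batches of slide embeddings.

    Row i of view_a and row i of view_b are assumed to describe the same
    slide (e.g. features derived from two different foundation models).
    """
    # L2-normalize so the dot product is cosine similarity.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # shape (N, N); diagonal = positive pairs

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the a->b and b->a directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


# Synthetic stand-in data: 8 "slides" with a shared latent of dimension 16,
# seen through two slightly different noisy views (an assumption for the demo).
rng = np.random.default_rng(0)
latent = rng.normal(size=(8, 16))
view_a = latent + 0.05 * rng.normal(size=latent.shape)
view_b = latent + 0.05 * rng.normal(size=latent.shape)

aligned_loss = info_nce(view_a, view_b)
# Shifting the rows of view_b pairs every slide with the wrong partner.
shuffled_loss = info_nce(view_a, np.roll(view_b, 1, axis=0))

print(f"aligned: {aligned_loss:.4f}  shuffled: {shuffled_loss:.4f}")
```

When the two views of a slide agree, the loss is low; when the pairing is broken, it rises. Minimizing this kind of objective pushes an encoder toward representations that are stable across the choice of patch-level foundation model.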