
🧬 Computational Biology

🔬 ICLR2026 · 156 paper notes

📌 Same area in other venues: 📷 CVPR2026 (21) · 💬 ACL2026 (5) · 🧪 ICML2026 (52) · 🤖 AAAI2026 (20) · 🧠 NeurIPS2025 (76) · 📹 ICCV2025 (4)

🔥 Top topics: Biomolecules ×66 · Diffusion Models ×24 · Alignment/RLHF ×7 · Reasoning ×5 · Multimodal/VLM ×5

3DCS: Datasets and Benchmark for Evaluating Conformational Sensitivity in Molecular Representations: The authors construct 3DCS, the first benchmark specifically designed to evaluate how sensitive molecular representations are to "different conformations of the same molecule." Using >1M molecules and ~10M conformations covering geometry, chirality, and energy dimensions, paired with a Geometry–Chirality–Energy (GCE) evaluation framework, they reveal that modern 3D molecular representations are geometrically sensitive, erratic in capturing chirality, and largely fail to align with energy.
A Cross-Species Neural Foundation Model for End-to-End Speech Decoding: This paper proposes BIT, an end-to-end brain-computer interface (BCI) that translates cortical neural activity directly into full sentences. It utilizes a Transformer neural encoder pre-trained via cross-species (human + monkey) and cross-task self-supervised masked modeling. This encoder is then fine-tuned with contrastive alignment to an Audio LLM, reducing the Word Error Rate (WER) of previous end-to-end methods from 24.69% to 10.22% while setting a new SOTA on the Brain-to-Text '24/'25 benchmarks under a cascaded framework.
A Diffusion Model to Shrink Proteins While Maintaining Their Function: The authors propose SCISOR, a discrete diffusion model that learns only to "delete characters." It uses a pure birth process (random insertion) for forward noising and trains a denoiser to plan reverse deletions. This shrinks long protein sequences into shorter ones that are both "natural" and functional, achieving SOTA on ProteinGym deletion effect prediction.
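
The insertion-only forward process described for SCISOR can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the uniform choice of residue and position, and the example sequence are all assumptions made for illustration. It only shows why a pure birth process pairs naturally with a deletion-only reverse process: every noising step adds one residue, so deleting exactly the inserted residues recovers the original sequence.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def insert_noise(sequence, num_insertions, rng):
    """Forward (noising) direction: insert random residues one at a time.

    Returns a list of (residue, was_inserted) pairs, so the deletions that
    undo the noise are known exactly. In a trained model these labels would
    be the targets a denoiser learns to predict (hypothetical setup).
    """
    tagged = [(residue, False) for residue in sequence]
    for _ in range(num_insertions):
        position = rng.randint(0, len(tagged))  # randint is inclusive on both ends
        tagged.insert(position, (rng.choice(AMINO_ACIDS), True))
    return tagged


def delete_inserted(tagged):
    """Reverse direction with oracle labels: drop every inserted residue."""
    return "".join(residue for residue, was_inserted in tagged if not was_inserted)


rng = random.Random(0)
original = "MKTAYIAKQR"  # made-up example sequence
noised = insert_noise(original, 5, rng)
noised_sequence = "".join(residue for residue, _ in noised)
inserted_count = sum(1 for _, was_inserted in noised if was_inserted)
recovered = delete_inserted(noised)
```

In this toy version the deletion labels are known by construction; the entry above describes a denoiser that has to infer which residues to delete from the sequence alone.
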
A Foundation Model with Multi-Variate Parallel Attention to Generate Neuronal Activity: This paper proposes Multi-Variate Parallel Attention (MVPA), which decouples attention into content, time, and channel parallel components to ignore differences in channel quantity and arrangement. Using this, the authors build MVPFormer, the first open-source, open-weight, and open-data intracranial EEG (iEEG) foundation model, achieving expert-level SOTA in epilepsy detection and brain activity decoding.
A Genetic Algorithm for Navigating Synthesizable Molecular Spaces: SynGA is proposed as a genetic algorithm that operates directly on synthesis routes (synthesis trees). By using customized crossover and mutation operators, it strictly constrains the search to the synthesizable molecular space. Combined with ML-driven building block filtering, it achieves SOTA performance in synthesizable analog search and property optimization.
A Joint Diffusion Model with Pre-Trained Priors for RNA Sequence-Structure Co-Design: This work utilizes the pre-trained biomacromolecular structure prediction model RoseTTAFold2NA directly as a diffusion denoiser within a joint framework of "discrete sequence diffusion + SE(3) equivariant structure diffusion" (RiboDiff). With minimal RNA 3D data, it simultaneously generates RNA sequences and all-atom 3D conformations. In tasks involving single-stranded RNA, RNA-protein complexes, and protein-conditioned binding, self-consistency metrics significantly outperform diffusion/flow-matching baselines trained from scratch.
A New Paradigm for Genome-wide DNA Methylation Prediction Without Methylation Input: MethylProphet is a "gene context + DNA sequence" driven Transformer foundation model that completely eliminates the need for any measured methylation values as input. By utilizing only a single sample's gene expression profile and the local DNA sequence around each CpG site, it can infer genome-wide methylation levels (~28 million CpGs) and generalize to CpG sites and samples never seen during training.
A Resolution-Agnostic Geometric Transformer for Chromosome Modeling Using Inertial Frame: InertialGenome utilizes an inertial frame to normalize initial 3D chromosome coordinates into a stable pose, then refines these coordinates using a Transformer equipped with 3D-RoPE and Nyström structural encoding. It outperforms traditional optimization methods and Graph Neural Network baselines across two single-cell Hi-C datasets, multiple resolutions, and various biological functional validations.
A tale of two tails: Preferred and anti-preferred natural stimuli in visual cortex: This paper discovers that primate visual cortex V4 neurons do not just possess a "preferred stimulus" end; instead, they simultaneously exhibit preferred images that enhance firing and anti-preferred images that suppress baseline firing. Through electrophysiological validation, encoding models, psychophysical experiments, and the ImageBeagle search tool, the authors demonstrate that anti-preferred stimuli are an indispensable half for understanding V4 tuning.
Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction: ALIGNED integrates data-driven neural networks with expert-curated gene regulatory knowledge within an Abductive Learning (ABL) framework. It utilizes a gradient-free adapter to decide whether to trust data or knowledge on a per-gene basis and subsequently refines the regulatory knowledge base using predictions. It achieves the highest "Balanced Consistency" across several large-scale perturbation datasets and re-discovers biologically meaningful regulatory relationships.
Align Your Structures: Generating Trajectories with Structure Pretraining for Molecular Dynamics: This paper proposes EGINTERPOLATOR: it first trains an equivariant diffusion structure model on large-scale static molecular conformation data, and then learns inter-frame correlations on a small amount of MD trajectories using a temporal interpolator, generating trajectories for small molecules, drug molecules, tetrapeptides, and protein monomers that more closely resemble real molecular dynamics.
Animal behavioral analysis and neural encoding with transformer-based self-supervised pretraining: BEAST utilizes a dual-objective of "masked autoencoding + temporal contrastive learning" to pretrain a ViT backbone on unlabeled behavioral videos collected from a single experimental setup. This single model outperforms specialized, heavily annotated models across three neuroethological tasks: neural encoding, pose estimation, and action segmentation.
Antibody: Strengthening Defense Against Harmful Fine-Tuning for Large Language Models via Attenuating Harmful Gradient Influence: This paper proposes the Antibody defense framework. During the alignment stage, flatness regularization forces the model into a flat region of the harmful loss (small gradients \(\rightarrow\) difficult to attack); during the fine-tuning stage, a sample weighting scheme based on the model's safety knowledge (likelihood ratio of target completion vs. refusal) suppresses the learning of harmful samples. The average Harmful Score is reduced from \(15.29\%\) to \(7.04\%\).
AntigenLM: Structure-Aware DNA Language Modeling for Influenza: AntigenLM is a GPT-2 style DNA language model that preserves the integrity of genomic functional units. By pre-training and fine-tuning on whole genomes of influenza viruses, it autoregressively predicts antigen sequences of future dominant strains, significantly outperforming the evolutionary model beth-1 and general-purpose genomic models in amino acid mismatch rates.
Automatic and Structure-Aware Sparsification of Hybrid Neural ODEs with Application to Glucose Prediction: Addressing the pain points of "excessive latent variables and overfitting on small datasets" when embedding mechanistic models into neural ODEs, this paper proposes a three-step hybrid graph sparsification algorithm, HGS (Merging Strongly Connected Components → Adding Shortcuts → L1/L2 Regularization for edge pruning). It automatically selects subgraphs that are both sparse and maintain mechanistic interpretability, achieving better and more robust predictions with fewer parameters on synthetic data and real-world T1D glucose prediction.
Automatic Image-Level Morphological Trait Annotation for Organismal Images: Sparse Autoencoders (SAEs) trained on foundation model features are utilized as "interpretable part detectors" to automatically localize biologically significant morphological structures in insect images. These localized regions are then processed by Multimodal Large Language Models (MLLMs) to generate trait descriptions, eliminating the need for manual expert annotation and resulting in the BIOSCAN-TRAITS dataset containing 80,000 trait annotations.
Beyond Ensembles: Simulating All-Atom Protein Dynamics in a Learned Latent Space: This paper embeds a temporal propagator, GLDP, into the pre-trained LD-FPG all-atom latent space, upgrading it from "only sampling static conformational ensembles" to "simulating conformational evolution over time." By conducting a fair comparison of three types of propagators (autoregressive neural network, Koopman linear operator, and score-guided Langevin) within the same frozen latent space, it concludes that autoregressive NNs are the most stable for long trajectories and most accurate for backbone dynamics; Langevin is the sharpest for side-chain thermodynamics; and Koopman serves as a lightweight but relatively rigid interpretable baseline.
Beyond Grid-Locked Voxels: Neural Response Functions for Continuous Brain Encoding: This paper proposes NRF (Neural Response Function), which transforms fMRI visual encoding from "regressing discrete voxel vectors for each subject" to "learning a continuous implicit function \(\Phi(M,x)\) in standard MNI anatomical space." By taking an image \(M\) and coordinates \(x=(x,y,z)\) as input to directly predict the brain response at that location, the model leverages local smoothing of voxels and cross-subject anatomical alignment. This allows the model to significantly outperform traditional encoding models in low-data scenarios (with only a few hundred images) and supports fine-tuning and transferring a pre-trained model from one subject to new subjects.
BioBO: Biology-informed Bayesian Optimization for Perturbation Design: BioBO integrates multimodal gene representations (Achilles + Gene2Vec + GenePT) into the surrogate model of Bayesian Optimization and utilizes enrichment analysis (EA) results as priors within the πBO framework to augment acquisition functions. This approach improves the labeling efficiency of CRISPR gene knockout screens by 25–40% while providing pathway-level interpretable design rationales.
BioMD: All-atom Generative Model for Biomolecular Dynamics Simulation: BioMD is the first all-atom generative molecular dynamics model for protein-ligand systems. Using a hierarchical flow matching framework of "coarse-grained prediction + fine-grained interpolation," it compresses long-range trajectories (including ligand dissociation paths) that traditionally take hours of MD into tens of seconds, successfully reconstructing dissociation paths for 97.1% of systems in DD-13M.
CAPSUL: A Comprehensive Human Protein Benchmark for Subcellular Localization: CAPSUL constructs the first human protein benchmark (20,181 proteins) that provides both 3D structure information and 20 fine-grained subcellular localization labels. By evaluating 11 sequence/structure baselines under a unified framework, it demonstrates the necessity of 3D structures for localization prediction and discovers a decisive \(\alpha\)-helix localization pattern in the Golgi apparatus through attention visualization, aligning with experimental evidence.
CDBridge: A Cross-omics Post-training Bridge Strategy for Context-aware Biological Modeling: CDBridge proposes a "post-training bridge" strategy to connect pre-trained frozen DNA and protein models without re-training. Through a two-stage alignment involving "splicing-inspired adaptive token merging + tissue-conditioned decoder," the model achieves both qualitative functional alignment (DNA→Protein) and quantitative gene expression prediction across various tissue contexts for the first time.
CellDuality: Unlocking Biological Reasoning in LLMs with Self-Supervised RLVR: CellDuality organizes four types of single-cell biological reasoning tasks into a unified framework and utilizes "complementary task duality"—where the model forward-predicts a biological outcome and then reversely reconstructs the original input conditions from that outcome, using reconstruction fidelity as an intrinsic reward—to perform RLVR alignment without any ground-truth labels. This enables a 3B Llama model to achieve SOTA on tasks such as cell type annotation, drug sensitivity classification, and perturbation response generation, narrowing the gap with the "supervised RLVR oracle" by 35–56% on OOD perturbation prediction.
Clustering by Denoising: Latent Plug-and-Play Diffusion for Single-Cell Embeddings: This paper adapts "Plug-and-Play (PnP) diffusion denoising" to the single-cell context and proposes DICE, which applies a diffusion prior in a low-dimensional latent space for denoising while re-injecting noise into the original high-dimensional observation space to "steer" the sampling trajectory. This avoids the collapse issue where different cell types are crowded together in PCA latent space and allows high-quality reference data to denoise noisier target data, significantly improving clustering and cell-type separability.
ConfHit: Conformal Generative Design with Oracle Free Guarantees: This paper proposes ConfHit, a framework that uses density ratio-weighted conformal permutation p-values for "certification" (judging whether a generated batch contains a hit) and "design" (refining candidate sets while maintaining statistical guarantees). ConfHit provides finite-sample \(1-\alpha\) coverage guarantees for generative molecular design without requiring experimental oracle validation, even in the presence of distribution shifts.
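
As background for the ConfHit entry, here is a minimal sketch of a weighted conformal p-value, the standard building block behind density ratio-weighted conformal tests. It does not reproduce ConfHit's permutation statistic or its batch-level certification. The function name, the convention that a higher score means "more nonconforming", and the example numbers are assumptions for illustration.

```python
def weighted_conformal_pvalue(calib_scores, calib_weights, test_score, test_weight):
    """Weighted conformal p-value.

    p = (sum_i w_i * 1[s_i >= s_test] + w_test) / (sum_i w_i + w_test)

    The weights are assumed to be density ratios between the test and
    calibration distributions; with all weights equal to 1 this reduces to
    the ordinary conformal p-value.
    """
    if len(calib_scores) != len(calib_weights):
        raise ValueError("scores and weights must have the same length")
    numerator = test_weight + sum(
        weight
        for score, weight in zip(calib_scores, calib_weights)
        if score >= test_score
    )
    denominator = test_weight + sum(calib_weights)
    return numerator / denominator


calib_scores = [0.1, 0.4, 0.7, 0.9]  # made-up calibration scores

# Unweighted case: only one calibration score (0.9) is >= 0.8, so p = 2/5.
p_uniform = weighted_conformal_pvalue(calib_scores, [1.0] * 4, 0.8, 1.0)

# Weighted case: the matching calibration point carries weight 0.5,
# so p = (1.0 + 0.5) / (1.0 + 4.5).
p_weighted = weighted_conformal_pvalue(calib_scores, [2.0, 1.0, 1.0, 0.5], 0.8, 1.0)
```

Rejecting when the p-value falls below \(\alpha\) is what yields a \(1-\alpha\)-style guarantee in the standard setting; how ConfHit combines such p-values over a generated batch is not specified in the summary above.
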
Constrained Diffusion for Protein Design with Hard Structural Constraints: This work reinterprets constrained diffusion as "stochastic proximal optimization." By applying feasibility corrections to the predicted clean structure at each step and then re-noising back to the data manifold (predict-prox-renoise), and using ADMM to decouple local stereochemistry from global topological constraints, the method achieves 100% strict satisfaction of bond length and angle constraints in protein motif scaffolding and cavity design, with success rates far exceeding RFDiffusion-based baselines.
Continuous Multinomial Logistic Regression for Neural Decoding: This paper extends the classical multinomial logistic regression (MLR) from "finite discrete categories" to a "continuous output space," proposing CMLR: replacing discrete category weights with a set of smooth weight functions \(w_d(y)\) under Gaussian process priors. This maps neural population activity into a complete conditional probability density over continuous variables (orientation, position, velocity, etc.). Combined with stochastic variational inference in the Fourier domain, the model can be efficiently trained on a scale of tens of thousands of neurons, generally outperforming DNN, XGBoost, and FlexCode on data from mouse/monkey visual cortex, hippocampus, and motor cortex.
Controllable Diffusion-based Generation for Multi-channel Biological Data: This paper proposes MCD, a multi-channel diffusion framework that combines "random channel masking training + multi-resolution spatial condition injection + dual channel attention." This allows a single diffusion model to complete full channel panels under any combination of "observed/missing channels," achieving SOTA in spatial proteomics, single-cell gene-to-protein translation, and missing MRI modality synthesis.
Controllable Sequence Editing for Biological and Clinical Trajectories: This paper proposes Clef, a controllable sequence editing model based on "temporal concepts" that performs immediate and delayed editing of biological/clinical multivariate trajectories under given conditions (e.g., drugs, surgery). On cellular reprogramming and patient lab data, it improves immediate editing MAE by 16.28%, delayed editing by 26.73%, and zero-shot counterfactual generation by up to 62.84%.
Controlling Repetition in Protein Language Models: The authors provide the first systematic study of pathological repetition in Protein Language Models (PLMs). They propose a unified repetition metric \(R(x)\) and a utility metric \(U(x)\), and design the Utility-Controlled Contrastive Steering (UCCS) method. By injecting a steering vector decoupled from repetition into the hidden layers, the method effectively suppresses repetition while maintaining folding reliability without requiring model retraining.
Convex Efficient Coding: This paper reformulates a broad class of "neural representation optimization" problems (efficient coding, semi-non-negative matrix factorization, non-negative sparse coding, etc.) as convex optimization over the representation similarity matrix \(Q\) (the matrix of pairwise dot products of neural responses). This approach maintains the flexibility of deep networks while regaining the analyzability of linear models. It provides the first necessary and sufficient condition for the identifiability of semi-NMF, offers a theoretical justification for single-neuron tuning analysis, and explains the sparsity threshold for retinal ON-OFF coding.
Count Bridges enable Modeling and Deconvolving Transcriptomic Data: This paper proposes Count Bridges, a stochastic bridge model defined on the integer lattice \(\mathbb{Z}^d\) and driven by Poisson birth-death processes, providing an exact, analytically tractable counterpart of diffusion models for count data. By incorporating "aggregation-only" deconvolution into the same framework via EM, it achieves SOTA results in synthetic distribution matching, nucleotide-level deconvolution of bulk RNA-seq, and spot deconvolution of spatial transcriptomics.
Coupled Transformer Autoencoder for Disentangling Multi-Region Neural Latent Dynamics: CTAE employs a pair of (or multiple) coupled causal Transformer autoencoders to simultaneously model neural population activity across multiple brain regions. It explicitly partitions the latent space of each region into orthogonal "cross-region shared" and "region-private" subspaces. By utilizing four loss functions to force inter-regional signals into the shared block and retain region-specific signals in the private block, it cleanly separates shared and private components while preserving non-stationary, non-linear temporal dynamics. Downstream linear decoders achieve higher accuracy in decoding behavioral variables compared to linear methods like DLAG/mDLAG.
CP-Agent: Context-Aware Multimodal Reasoning for Cellular Morphological Profiling under Chemical Perturbations: CP-Agent integrates an experimental context-aware image-text alignment module (CP-CLIP) with a multi-agent MLLM reasoning pipeline. Starting from a pair of Cell Painting microscopy images, it automatically retrieves experimental background, segments and extracts single-cell morphological features, statistically compares perturbed vs. control groups, and generates traceable, interpretable Mechanism of Action (MoA) reports.
CryoLVM: Self-supervised Learning from Cryo-EM Density Maps with Large Vision Models: CryoLVM introduces the Joint-Embedding Predictive Architecture (JEPA) and a SCUNet backbone to the domain of 3D cryo-EM density maps. It performs self-supervised pre-training in representation space using 7,302 real experimental maps from EMDB, combined with a novel histogram distribution alignment loss for fine-tuning. It consistently outperforms specialized methods like DeepEMhancer, EMReady, EM-GAN, and IsoNet across three downstream tasks: sharpening, super-resolution, and missing wedge completion.
CryoNet.Refine: A One-step Diffusion Model for Rapid Refinement of Structural Models with Cryo-EM Density Map Restraints: This work proposes CryoNet.Refine, the first AI-based cryo-EM atomic model refinement framework. It designs a one-step diffusion model (initialized with Boltz-2 weights) incorporated with an innovative differentiable density generator for physical simulation. By introducing density map correlation as a differentiable loss function (cosine similarity) combined with geometric constraints (Ramachandran, Rotamer, bond angles), it employs a test-time optimization strategy for case-specific refinement. It comprehensively outperforms Phenix.real_space_refine across 120 protein and DNA/RNA complexes (CC_mask 0.59 vs 0.54, Ramachandran favored 98.92%).
CryoSplat: Gaussian Splatting for Cryo-EM Homogeneous Reconstruction: CryoSplat transforms 3D Gaussian Splatting (3DGS) into a differentiable renderer compliant with cryo-EM imaging physics. Using an anisotropic Gaussian Mixture Model (GMM), it achieves stable cryo-EM homogeneous reconstruction directly from raw noisy particle images starting from random initialization—without requiring any external consensus maps or atomic models. It outperforms cryoSPARC and cryoDRGN in resolution across four real datasets while maintaining superior memory and speed efficiency.
DCFold: Efficient Protein Structure Generation with Single Forward Pass: DCFold simultaneously distills the two major iterative bottlenecks of AlphaFold3 (multi-step diffusion and Pairformer recycling) using "dual consistency." Combined with a Temporal Geodesic Matching (TGM) scheduler designed for variable-length protein sequences, it achieves AlphaFold3-level structure prediction accuracy in a single forward pass, providing approximately 15× inference acceleration (average 133s → 9s).
Decoding Dynamic Visual Experience from Calcium Imaging via Cell-Pattern-Aware Pretraining: POYO-CAP treats "statistical regularity" (measured by skewness and kurtosis) as an explicit data filtering criterion. It performs masked reconstruction pretraining starting with the most "predictable" neurons (e.g., inhibitory interneurons) and then transfers to noisy neurons for downstream decoding. This transforms neural heterogeneity in calcium imaging from a bottleneck into a scalable learning advantage—achieving a movie frame reconstruction SSIM of 0.593, a 1.98× increase in data efficiency, and stable performance scaling with model size.
DeepSADR: Deep Transfer Learning with Subsequence Interaction and Adaptive Readout for Cancer Drug Response Prediction: DeepSADR models "drug-patient response" as a bipartite interaction graph between drug substructures and gene functional subsequences. It employs Graph Autoencoders and Adaptive Readout via Set Transformers to transfer rich response knowledge from cell lines to label-scarce clinical patient data, achieving an average AUC of 0.856 and AUPR of 0.862 across 5 clinical drugs.
Diffusion Alignment as Variational Expectation-Maximization: This paper formalizes diffusion model alignment as a variational EM algorithm: the E-step uses test-time search (soft Q guidance + importance sampling) to explore high-reward multimodal trajectories, and the M-step distills search results into model parameters through forward-KL, achieving both high reward and high diversity in image generation and DNA sequence design.
Discovering heterogeneous synaptic plasticity rules via large-scale neural evolution: This paper constructs the mouse primary visual cortex (V1) as a plastic spiking neural network. By utilizing a multi-objective evolutionary algorithm to search for individual learning rules for different synapse types within a vast interpretable rule space composed of spikes, eligibility traces, and reward prediction error signals, researchers discovered that various mathematically distinct rules can simultaneously maintain biological plausibility, visual change detection capabilities, few-shot adaptability, and generalization across network scales.
Discrete Compositional Generation via General Soft Operators and Robust Reinforcement Learning: To address the issue where reward-proportional sampling in GFlowNets is overwhelmed by a massive number of sub-optimal objects in exponentially large search spaces, this paper proposes a general mellowmax operator that unifies soft Bellman, mellowmax, and soft mellowmax (interpolating between "accumulation" and "dilution" biases via parameter \(q\)). Based on this, it derives TGM, a simple trajectory-level algorithm that identifies higher-reward yet diverse candidates in real biological sequence design tasks (DNA/Protein) compared to GFN/PPO/SAC.
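
To make the "accumulation" versus "dilution" contrast in the entry above concrete, the sketch below compares the two standard soft operators it mentions. This is one plausible reading, not the paper's general operator, and the interpolation parameter \(q\) is not reproduced. The function names and the toy value vector are assumptions. The log-sum-exp (soft Bellman) backup can exceed the true maximum when many sub-optimal actions pile up, while mellowmax averages inside the logarithm and can fall far below the maximum.

```python
import math


def log_sum_exp_operator(values, omega):
    """Soft Bellman backup: (1/omega) * log(sum_i exp(omega * v_i))."""
    peak = max(values)  # subtract the max for numerical stability
    total = sum(math.exp(omega * (v - peak)) for v in values)
    return peak + math.log(total) / omega


def mellowmax(values, omega):
    """Mellowmax: (1/omega) * log(mean_i exp(omega * v_i))."""
    return log_sum_exp_operator(values, omega) - math.log(len(values)) / omega


# One good action (value 1.0) among 999 sub-optimal ones (value 0.0).
values = [1.0] + [0.0] * 999
soft = log_sum_exp_operator(values, 1.0)
mellow = mellowmax(values, 1.0)
```

With these numbers the log-sum-exp value is well above the best action's value of 1.0, and the mellowmax value is close to 0. The two always differ by exactly \(\log(n)/\omega\), which grows with the number of candidates.
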
Discrete Diffusion Trajectory Alignment via Stepwise Decomposition: This paper proposes SDPO (Stepwise Decomposition Preference Optimization), which decomposes the trajectory alignment problem of discrete diffusion models into stepwise posterior alignment sub-problems. This avoids the difficulty of backpropagating gradients through the entire denoising chain and significantly outperforms existing methods in DNA sequence design, protein inverse folding, and language modeling.
Distilling Causal Signals for One-Shot Directed Evolution of Antibodies: AFFINITYENHANCER tackles antibody affinity maturation in an extreme "one-shot" setting: "given only a single lead antibody sequence, no antigen information, no fine-tuning, and no antigen-antibody complex structures." By constructing "same-antigen, low-affinity → high-affinity" neighbor pairs within cross-antigen datasets, a residual Graph Transformer learns a mapping in a frozen sequence-structure embedding space to "push low-affinity embeddings toward high-affinity ones." The paper proves theoretically that this paired supervision is dominated by causal changes, with spurious shifts kept within a small budget. The model therefore generalizes to completely unseen antibody seeds and concentrates mutations on the paratope interface rim, outperforming structure-conditioned inverse folding (AntiFold) and sequence inpainting (IgCraft) baselines.
DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials: This paper proposes DistMLIP, a distributed inference platform based on a zero-redundancy graph-level parallelization strategy. It addresses the lack of multi-GPU support in existing machine learning interatomic potentials (MLIPs), enabling simulations of nearly one million atoms on 8 GPUs. It is up to 8x faster and can simulate systems 3.4x larger than traditional spatial partitioning methods.
Doloris: Dual Conditional Diffusion Implicit Bridges with Sparsity Masking Strategy for Unpaired Single-Cell Perturbation Estimation: Doloris utilizes two conditional diffusion models sharing a Gaussian latent space to model the distributions of "unperturbed cells" and "perturbed cells" respectively. By leveraging Dual Diffusion Implicit Bridges (DDIB), it bypasses the inherent challenge of unpaired single-cell sequencing data—where the same cell cannot be measured both before and after perturbation. Coupled with a sparsity masking model that specifically predicts gene silencing, it directs the diffusion model's capacity toward expressed genes, achieving SOTA performance on genetic and molecular perturbation datasets while preserving the diversity of single-cell responses.
DriftLite: Lightweight Drift Control for Inference-Time Scaling of Diffusion Models: DriftLite proposes leveraging the degree of freedom between the drift and potential functions in the Fokker-Planck equation. By solving for an optimal control drift via a lightweight linear system to proactively stabilize particle weights, it addresses weight degradation in Sequential Monte Carlo with minimal cost, significantly outperforming Guidance-SMC baselines in Gaussian mixture, molecular system, and protein-ligand co-folding tasks.
Drugging the Undruggable: Benchmarking and Modeling Fragment-Based Screening: To address the failure of traditional molecular screening on "undruggable" proteins (shallow, transient, or cryptic pockets), this paper constructs FragBench, the first fragment-level virtual screening benchmark (comprising 54 challenging targets annotated via multi-agent LLM and human collaboration). It proposes FragCLIP, a tri-modal contrastive learning framework that jointly encodes pockets, full molecules, and fragments. FragCLIP significantly outperforms docking software and existing ML methods (improving [email protected]% on FragBench from 1.86 for Glide to 6.85), and its retrieved fragments can be effectively extended or linked into high-affinity lead compounds.
DrugTrail: Interpretable Drug Discovery via Structured Reasoning and Druggability‑Tailored Preference Optimization: DrugTrail transforms general large language models into drug designers that "think like medicinal chemistry experts." It employs Clinical Chemistry-Informed Reasoning (CCIR) for lightweight SFT, followed by Druggability-Tailored Preference Optimization (DTPO) via GRPO—an online-computable reinforcement learning approach that bypasses time-consuming docking scores. This allows 7B-level models to outperform large-scale models like DeepSeek-R1 in pocket-oriented molecule generation across metrics such as docking energy, QED, and SA, while providing readable reasoning chains for every molecule.
Efficient Prediction of Large Protein Complexes via Subunit-Guided Hierarchical Refinement: HIERAFOLD uses PAE to automatically segment rigid subunits and cross-chain interfaces from coarse-grained pairwise predictions, performs high-precision refinement only on "focal chain + relevant interface subunits," and finally assembles them via confidence-weighted alignment. This reduces the peak VRAM of large protein complexes to a runnable range while maintaining accuracy close to AlphaFold3.
Enhancing Diffusion-Based Sampling with Molecular Collective Variables: This paper integrates the concept of "well-tempered metadynamics" from molecular dynamics—applying online repulsive biases along collective variables (CVs)—into the state-of-the-art diffusion sampler ASBS to create WT-ASBS. During training, it continuously accumulates biases along low-dimensional CVs to force the discovery of rare conformations; during inference, the bias is reweighted to restore the Boltzmann distribution. This marks the first time a diffusion sampler has characterized reaction surfaces involving bond breaking/formation with wall-clock times significantly lower than metadynamics.
Enhancing Molecular Property Predictions by Learning from Bond Modelling and Interactions: The authors propose DeMol, a dual-graph enhanced multi-scale interaction framework. By utilizing parallel atom-centric and bond-centric graph channels along with Double-Helix Blocks, the model explicitly accounts for atom-atom, atom-bond, and bond-bond interactions, achieving SOTA results on benchmarks including PCQM4Mv2, OC20, and QM9.
Exploring Synthesizable Chemical Space with Iterative Pathway Refinements: ReaSyn models "finding synthesizable analogs for a given molecule" as a search/inference problem. It utilizes a single autoregressive Transformer to support both bottom-up and top-down synthesis tree generation, overlaid with a global discrete flow editor (Edit Bridge). Through an iterative cycle of "bottom-up decoding \(\rightarrow\) top-down decoding \(\rightarrow\) global editing," it significantly improves the coverage and reconstruction rates within the synthesizable chemical space.
Extending Sequence Length is Not All You Need: Effective Integration of Multimodal Signals for Gene Expression Prediction: The study challenges the "longer is better" paradigm in long-sequence modeling for gene expression prediction, discovering that current SSM models essentially utilize only proximal information. It further identifies background chromatin signals (DNase-seq/Hi-C) as confounding variables that introduce spurious correlations and proposes the Prism framework to perform deconfounding via backdoor adjustment, outperforming 200k-sequence SOTA models using only 2k short sequences.
FACET: A Fragment-Aware Conformer Ensemble Transformer: FACET uses a differentiable Graph Transformer to learn an approximation of the expensive Fused Gromov-Wasserstein (FGW) distance, transforming "geometry-aware multi-conformer aggregation" from an online optimization problem into a single forward pass. Combined with fragment-level structural priors, it achieves a 5–6x training speedup while maintaining SOTA accuracy, scaling effectively to 75,000 molecules.
Fast and Interpretable Protein Substructure Alignment via Optimal Transport: PLASMA reformulates protein local structure alignment as an entropy-regularized optimal transport problem. Using differentiable Sinkhorn iterations, it directly outputs a residue-level alignment matrix and an interpretable similarity score in \([0,1]\). It achieves high speed (~10ms/pair, 50× faster than TM-align) and high accuracy for aligning active/binding sites.
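The entropy-regularized optimal transport step named above can be sketched with plain Sinkhorn iterations. This is a generic illustration with uniform marginals and a toy cost matrix, not PLASMA's implementation; how PLASMA builds its costs and derives its similarity score is not specified here, and `eps` and `n_iters` are assumed values.

```python
import math

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularized OT between uniform marginals; returns the transport plan."""
    n, m = len(cost), len(cost[0])
    # Gibbs kernel: low cost gives high affinity.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    a, b = [1.0 / n] * n, [1.0 / m] * m
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy cost: residue i of one fragment matches residue i of the other at zero cost.
cost = [[0.0, 1.0, 1.0],
        [1.0, 0.0, 1.0],
        [1.0, 1.0, 0.0]]
plan = sinkhorn(cost)
```

Each entry `plan[i][j]` can be read as a soft alignment weight between item `i` and item `j`. Because every step is a differentiable rescaling, gradients can flow back to whatever produced the cost matrix.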
Fast Proteome-Scale Protein Interaction Retrieval via Residue-Level Factorization: RaftPPI approximates traditional residue-residue protein interaction scoring as decomposable single-protein embedding inner products. By utilizing Gaussian kernels, SORF random Fourier features, and low-rank attention, it preserves residue-level modeling capabilities while reducing the time required for candidate interaction retrieval across the entire human proteome from GPU-months to a few minutes on a single machine.
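To show why random features make a kernel score decomposable into per-item embeddings, the sketch below approximates a Gaussian kernel with an inner product of random Fourier features. It uses plain random Fourier features rather than the structured SORF variant named above, and the dimensions, `sigma`, and seed are illustrative assumptions rather than RaftPPI settings.

```python
import math
import random

def make_rff(dim, n_features, sigma, seed=0):
    """Build a feature map phi with phi(x) . phi(y) ~ exp(-||x - y||^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    # Frequencies are drawn from N(0, 1/sigma^2), phases uniformly from [0, 2*pi).
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def phi(x):
        return [scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b_j)
                for w, b_j in zip(W, b)]

    return phi

def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel, used here only to check the approximation."""
    d2 = sum((a - c) ** 2 for a, c in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))

phi = make_rff(dim=3, n_features=4000, sigma=1.0)
x, y = [0.1, -0.4, 0.3], [0.5, 0.2, -0.1]
approx = sum(p * q for p, q in zip(phi(x), phi(y)))  # inner product of embeddings
exact = gaussian_kernel(x, y, sigma=1.0)
```

Because `phi(x)` depends on `x` alone, embeddings can be computed once per item and compared with inner products, which is what makes large-scale retrieval cheap.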
Fine-Tuning Diffusion Models via Intermediate Distribution Shaping: This paper unifies rejection sampling fine-tuning methods into the GRAFT framework and proves that it implicitly performs KL-regularized reward maximization. Consequently, it proposes P-GRAFT to perform distribution shaping at intermediate denoising steps (achieving a better bias-variance trade-off) and Inverse Noise Correction to improve flow model quality without rewards, achieving an 8.81% VQAScore improvement on T2I tasks.
FlexRibbon: Joint Sequence and Structure Pretraining for Protein Modeling: FlexRibbon bidirectionally couples amino acid sequences and 3D structures during pretraining using "Masked Language Modeling + Diffusion Denoising." Without relying on MSAs, it sets new SOTA results across 12 tasks, including antibody/nanobody CDRs, peptide interfaces, protein-ligand docking, and functional annotation, and significantly outperforms MSA-based methods like AlphaFold in high-mutation and low-homology scenarios.
Flow Autoencoders are Effective Protein Tokenizers: This paper proposes Kanzi—a non-equivariant protein structure tokenizer trained with a flow matching loss. By using a diffusion decoder and an FSQ quantization bottleneck to replace the traditional SE(3)-invariant modules and complex loss functions, it achieves SOTA reconstruction with 1/20th the parameters and 1/400th the training data.
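The FSQ bottleneck mentioned above can be illustrated with a minimal finite scalar quantization sketch: each latent dimension is bounded with `tanh` and rounded to a small set of integer levels, and the per-dimension codes are combined into one token id. This assumes odd level counts only (even counts need an extra offset) and is not Kanzi's actual configuration; `levels` and the helper names are hypothetical.

```python
import math

def fsq_quantize(z, levels):
    """Bound each latent dimension with tanh, then round to one of levels[d] integers."""
    codes = []
    for value, n_levels in zip(z, levels):
        half = (n_levels - 1) / 2.0  # codes lie in {-half, ..., +half}
        codes.append(round(half * math.tanh(value)))
    return codes

def code_index(codes, levels):
    """Map per-dimension codes to a single token id via mixed-radix encoding."""
    index = 0
    for code, n_levels in zip(codes, levels):
        index = index * n_levels + (code + (n_levels - 1) // 2)
    return index

levels = [5, 5, 3]            # implicit codebook size: 5 * 5 * 3 = 75
codes = fsq_quantize([0.3, -2.0, 4.0], levels)
token = code_index(codes, levels)
```

In training, the rounding step is typically paired with a straight-through gradient estimator, which is omitted here because the sketch has no autodiff.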
FragFM: Hierarchical Framework for Efficient Molecule Generation via Fragment-Level Discrete Flow Matching: FragFM elevates molecular generation to the level of "chemical fragments": it employs discrete flow matching for sampling on fragment-level graphs, followed by a coarse-to-fine autoencoder for lossless reconstruction at the atomic level. Combined with a "random fragment bag" strategy to bypass fixed vocabulary constraints, it generates larger, more realistic, and controllable molecules with fewer denoising steps.
Fusing Pixels and Genes: Spatially-Aware Learning in Computational Pathology: This paper proposes the Stamp framework, which utilizes spatial transcriptomics (ST) gene expression data as supervisory signals. Through spatially-aware gene encoder pretraining and hierarchical multi-scale contrastive alignment, it achieves joint representation learning of pathological images and ST data, reaching SOTA performance across 4 downstream tasks on 6 datasets.
GAGA: Gaussianity-Aware Gaussian Approximation for Efficient 3D Molecular Generation: GAGA discovers that 3D molecular data reaches a "sufficiently Gaussian" state much earlier than images during the forward noising process. By using statistical tests to locate this characteristic timestep \(T^*\) and replacing the subsequent redundant trajectory with a closed-form Gaussian approximation, GAGA accelerates both training and sampling while improving generation quality—all without changing the architecture or noise schedule.
GeomMotif: A Benchmark for Arbitrary Geometric Preservation in Protein Generation: GeomMotif decouples the protein motif scaffolding task from "functional sites," constructing 57 guaranteed solvable, modality-agnostic "pure geometric preservation" tasks. Through a unified SUN (Success × Unique × Novel) metric system, it reveals counterintuitive phenomena, such as structural models significantly outperforming sequence models and structural conditioning potentially interfering with generation.
GRAM-DTI: Adaptive Multimodal Representation Learning for Drug-Target Interaction Prediction: GRAM-DTI integrates drug SMILES, molecular text, hierarchical taxonomic annotations (HTA), and protein sequences into a unified pre-training framework. It utilizes Gramian volume alignment, adaptive modality dropout, and IC50 weak supervision to learn robust drug-target representations, overall surpassing strong baselines in DTI / MoA prediction and zero-shot retrieval.
Graph Diffusion Transformers are In-Context Molecular Designers: By using "molecule-score" demonstration pairs as surrogates for text prompts to define task context, this work trains a 0.7B molecular foundation model, DemoDiff, based on a Graph Diffusion Transformer. It matches or exceeds the performance of Large Language Models (LLMs) that are 100–1000× larger across 33 design tasks using only a few dozen in-context examples.
Greater than the Sum of Its Parts: Building Substructure into Protein Encoding Models: This paper introduces the Magneton environment (including a dataset of 530,000 proteins and 1.7 million substructure annotations, a training framework, and 13 benchmark tasks) and substructure-tuning, a model-agnostic supervised fine-tuning method. It explicitly injects the biological prior that "proteins are assembled from evolutionarily conserved recurring substructures (domains, active sites, etc.)" into pre-trained protein encoders, systematically improving performance on function-related tasks without relying on global structure inputs.
h-MINT: Modeling Pocket-Ligand Binding with Hierarchical Molecular Interaction Network: This paper proposes OverlapBPE, an overlapping molecular tokenization algorithm, along with h-MINT, a hierarchical molecular interaction network. By utilizing many-to-many mappings where "fragments can share atoms," it preserves chemical contexts such as aromaticity, chirality, and charge, outperforming existing state-of-the-art methods in affinity prediction, virtual screening, and high-throughput screening.
HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data: HEIST models tissue as a two-layer hierarchical graph consisting of a "spatial cell graph + intracellular gene co-expression networks." Through cross-level directed attention, gene representations are modulated by the spatial microenvironment while cell representations are shaped by internal transcriptional states. This approach overcomes fixed gene vocabularies, enables zero-shot transfer to proteomics, and sets new SOTA performance in clinical prediction, cell annotation, and gene imputation tasks.
HeurekaBench: A Benchmarking Framework for AI Co-scientist: This paper proposes HeurekaBench, a framework for building evaluation benchmarks based on real-world scientific workflows. It extracts verifiable scientific insights from papers through a multi-LLM pipeline and generates open-ended research questions to evaluate the end-to-end capabilities of AI co-scientists in data-driven discovery.
Hierarchical Multi-Scale Molecular Conformer Generation: MSGEN decomposes molecular conformer generation into a multi-stage hierarchical process from "coarse scaffold to fine atoms." It utilizes the positions of key substructures generated in previous stages as geometric guidance and incorporates "molecular upsampling" that respects chemical connectivity to bridge scale gaps. This plug-and-play framework enables various generative models like GeoDiff, ET-Flow, and EBD to produce more stable and chemically reasonable conformations.
I2Mole: Interaction-aware Invariant Molecular Learning for Generalizable Drug-Drug Interaction Prediction: I2Mole merges pairs of drug molecules into a "merged graph," first using attention to model cross-molecular interactions between atoms, then employing an improved Graph Information Bottleneck (GIB) to extract decisive core substructures (rationales). It utilizes vector quantization to cluster training environments into an "environment codebook" as a controllable noise source for invariant learning, achieving robust drug-drug interaction predictions under both inductive settings and cross-domain distribution shifts.
Interpolation-Based Conditioning of Flow Matching Models for Bioisosteric Ligand Design: Based on pre-trained E(3)-equivariant flow matching molecular generation models, this work proposes two inference-only conditioning strategies requiring zero retraining—Interpolate–Integrate (soft global similarity) and Replacement Guidance (hard local anchoring)—to enable 3D bioisostere design conditioned on reference ligands or fragment sets.
Intrinsic Lorentz Neural Network: The paper proposes ILNN, a fully intrinsic hyperbolic neural network where all operations are conducted within the Lorentz model. This eliminates the geometric inconsistency found in existing methods that mix Euclidean operations, achieving SOTA results in image classification, genomics, and graph classification.
Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design: VIDD reformulates "fine-tuning diffusion models with rewards" as offline policy distillation: using a soft-optimal policy as a teacher, it is distilled into a student model by minimizing forward KL (value-weighted MLE). This achieves more stable and efficient reward optimization than PPO-based RL methods for biomolecular design tasks (proteins, DNA, small molecules) involving non-differentiable rewards.
KGOT: Unified Knowledge Graph and Optimal Transport Pseudo-Labeling for Molecule-Protein Interaction Prediction: KGOT models "pseudo-labeling unannotated molecule-protein pairs" as an Optimal Transport (OT) matching problem. The generated transport plan is then written back as a new relation into a large-scale biological Knowledge Graph (KG) for joint training. This closed loop of OT + KG effectively mitigates label scarcity in MPI tasks, comprehensively outperforming docking and DrugCLIP in both virtual screening and link prediction.
La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching: La-Proteina utilizes a "partially latent" representation—explicitly modeling \(\alpha\)-carbon coordinates while compressing other atomistic details and sequences into a fixed-dimension per-residue latent variable. This approach transforms the mixed discrete-continuous and variable-dimension challenge of all-atom protein modeling into a pure continuous, fixed-dimension problem. By applying flow matching to jointly generate sequences and all-atom structures, it achieves SOTA performance in all-atom co-designability, diversity, and structural plausibility, scaling effectively to proteins up to 800 residues.
Learning Brain Representation with Hierarchical Visual Embeddings: This work constructs a "hierarchical visual representation" as an alignment target by combining multiple pre-trained visual encoders with different inductive biases (CLIP semantics + VAE pixels). A Fusion Prior, pre-trained on large-scale images, is employed to stably map fused features to diffusion conditions. This allows EEG/MEG brain signals to align with both high-level semantics and low-level pixels, balancing zero-shot retrieval accuracy and reconstruction fidelity.
Learning Collective Variables from BioEmu with Time-Lagged Generation: The frozen protein generation foundation model BioEmu is re-purposed as a "time-lagged generator" — by providing it the current conformation \(x_t\) and forcing it to generate the conformation \(x_{t+\tau}\) after time \(\tau\), a lightweight encoder is trained to automatically learn a 1D CV that encodes only slow degrees of freedom. These CVs can be directly applied to enhanced sampling methods such as OPES and Steered MD.
Learning Explicit Single-Cell Dynamics Using ODE Representations: This paper proposes Cell-MNN—an encoder-decoder architecture that represents single-cell differentiation dynamics as "state-conditioned locally linear ODEs." This approach discards expensive Optimal Transport (OT) preprocessing and multi-stage training, achieving SOTA average performance on single-cell interpolation benchmarks through an end-to-end single-stage process. It simultaneously produces interpretable gene regulatory interactions validated against the TRRUST database.
Learning Flexible Forward Trajectories for Masked Molecular Diffusion: This paper discovers that directly applying Masked Diffusion Models (MDM) to molecular graph generation leads to severe degradation due to "state-clashing," where different molecules collapse into the same intermediate state during forward noise addition. The authors propose MELD, which uses a learnable noise schedule network to assign unique masking rates to each atom/bond, effectively staggering forward trajectories. This achieves 100% chemical validity and SOTA distribution alignment on QM9 and ZINC250K.
Learning Molecular Chirality via Chiral Determinant Kernels: The paper proposes Chiral Determinant Kernels (ChiDeK) to encode SE(3)-invariant chiral matrices, unifying central and axial chirality within a GNN framework for the first time. By combining this with cross-attention to propagate stereochemical information, it achieves a >7% accuracy improvement on a newly constructed axial chirality benchmark.
Learning Residue Level Protein Dynamics with Multiscale Gaussians: DYNAPROT models protein dynamics as a "multivariate Gaussian distribution over Cα coordinates on a static structure." It utilizes a lightweight SE(3)-invariant network to directly predict per-residue 3×3 marginal covariances and residue-pair N×N scalar couplings from a single static structure. A heuristic then assembles the full 3N×3N joint covariance, achieving fast and interpretable flexibility prediction and conformational ensemble sampling with three orders of magnitude fewer parameters.
Lost in Tokenization: Context as the Key to Unlocking Biomolecular Understanding in Scientific LLMs: This paper systematically validates a counter-intuitive conclusion: instead of forcing Scientific LLMs (Sci-LLMs) to directly "read" raw biomolecular sequences, it is more effective to use mature bioinformatics tools like BLAST, Pfam, or GO to preprocess sequences into high-level, human-readable text contexts. Providing "Context-only" significantly outperforms "Sequence-only" in protein QA tasks, and feeding both raw sequences and context together actually degrades performance, suggesting that the true value of existing Sci-LLMs lies in being "knowledge reasoning engines" rather than "sequence decoders."
Low rank adaptation of chemical foundation models generate effective odorant representations: This paper first uses a large-scale benchmark to show that representations generated by off-the-shelf chemical foundation models are no stronger than manual physicochemical descriptors (due to high information redundancy). It then proposes LORAX, which uses LoRA to fine-tune chemical foundation models for olfactory tasks with cross-attention and XGBoost ensembles, creating odorant representations that are better aligned with neural representations and generalize more effectively.
MarS-FM: Generative Modeling of Molecular Dynamics via Markov State Models: Instead of learning frame-by-frame MD transition densities with a fixed lag time, this work first uses Markov State Models (MSM) to coarse-grain trajectories into discrete metastable states, then employs Flow Matching to learn "state-to-state" jump distributions. The resulting model serves as a surrogate for molecular dynamics sampling, offering a two-order-of-magnitude speedup and an improved ability to explore rare, large conformational changes.
Meta-Learning Theory-Informed Inductive Biases using Deep Kernel Gaussian Processes: This work utilizes Bayesian meta-learning to automatically distill "black-box" normative theories (e.g., efficient coding in the retina) into a Deep Kernel Gaussian Process prior (Theory-Informed Kernel). This serves as an inductive bias to improve fitting on real neural data and allows for the rigorous quantification of "theory-data alignment" using exact marginal likelihood.
MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation: This paper introduces the concept of "Micro-World Simulation" for the first time. It constructs a fine-grained rubric benchmark (MicroWorldBench) and an expert-verified dataset (MicroSim-10K), and fine-tunes a micro-scale video generation model (MicroVerse) based on Wan2.1. It reveals and bridges the gap where current SOTA video models appear "visually plausible but physically/biologically incorrect" in simulating micro-biological mechanisms.
MindPilot: Closed-loop Visual Stimulation Optimization for Brain Modulation with EEG-guided Diffusion: MindPilot treats the human brain as a non-differentiable black-box function. By using non-invasive EEG signals as optimization feedback paired with a "pseudo-model" to provide surrogate gradients, it iteratively generates or retrieves natural images to drive neural states toward specified targets. This work validates the feasibility of "reverse-modulating the brain with images" across both semantic and spectral neural objectives for the first time.
Model-Guided Microstimulation Steers Primate Visual Behavior: A topographic deep visual model is used to rehearse microstimulation experiments "in silico," identifying stimulation sites and images most likely to alter behavior. These predictions were then validated in the inferior temporal (IT) cortex of live macaques. The results show a significant correlation between model-predicted behavioral shifts and the monkeys' actual choices, achieving the first model-in-the-loop guided stimulation of the high-level visual cortex.
MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning: MolEditRL performs molecular editing directly on discrete molecular graphs: it employs graph-text conditional diffusion to learn target molecule reconstruction from source molecules and natural language instructions, followed by property optimization via reinforcement learning with structural constraints. This approach simultaneously improves editing success rates, structural similarity, and chemical distribution quality using fewer parameters.
Multi-Marginal Flow Matching with Adversarially Learnt Interpolants: This paper uses a GAN-style adversarial loss to learn "neural interpolant curves," forcing the marginal distributions of the curves at intermediate time points to approximate the observed snapshot distributions (rather than passing through samples point-wise). These smooth interpolants are then marginalized into a vector field via Flow Matching to infer continuous dynamics from discrete time snapshots lacking ground-truth trajectories in scientific data.
Multi-state Protein Sequence Design with DynamicMPNN: DynamicMPNN is the first "explicit" multi-state inverse folding model that directly learns the joint conditional distribution \(p(Y|X_1,\dots,X_m)\) for a single sequence across multiple conformations. It improves the sequence recovery of ProteinMPNN by 12% and decoy-normalized RMSD self-consistency by 31% on multi-state protein benchmarks.
Multifidelity Simulation-based Inference for Computationally Expensive Simulators: The authors propose MF-(TS)NPE: pre-training a neural density estimator using cheap low-fidelity simulations followed by fine-tuning with a small number of expensive high-fidelity simulations, reducing the required high-fidelity simulation budget for Bayesian inference by up to two orders of magnitude.
Musculoskeletal simulation of limb movement biomechanics in Drosophila melanogaster: This paper constructs the first anatomically and physiologically accurate 3D musculoskeletal model for the Drosophila leg (OpenSim + MuJoCo dual engines). It bridges motor neuron activity with joint movement using Hill-type muscles, infers muscle synergies from real behavioral data, and demonstrates that passive joint properties (stiffness/damping) accelerate the learning of muscle-driven control.
NC-Bench and NCfold: A Benchmark and Closed-Loop Framework for RNA Non-Canonical Base-Pair Prediction: This paper constructs the first standardized benchmark for RNA non-canonical (NC) base-pair prediction, NC-Bench (925 sequences, 6708 NC annotations), and proposes NCfold—a dual-branch closed-loop framework. By utilizing IsoScore to select RNA foundation model (RFM) embeddings and injecting structural priors via Representative Embedding Fusion (REF) into attention, NCfold significantly outperforms traditional machine learning and RFM baselines in NC edge type and orientation prediction.
OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens: OmniMouse adopts a unified architecture with single-neuron tokens and flexible masking, jointly performing neural prediction/forecasting, behavior decoding, and stimulus encoding on over 150 billion neural tokens from 73 mouse visual cortices, achieving new SOTA results. It yields a counter-intuitive scaling conclusion: brain activity modeling is currently data-limited rather than parameter-limited—increasing data yields continuous gains, while model scale saturates quickly.
One Protein Is All You Need: This paper proposes ProteinTTT, which applies Test-Time Training (TTT) to protein language models. Given a target protein sequence, the backbone network undergoes several dozen steps of self-supervised fine-tuning using a masked language modeling objective on that single sequence before inference. This process reduces the model's perplexity and improves representations, thereby enhancing structure, fitness, and functional predictions—without modifying any downstream task heads—and achieving new SOTA results on ProteinGym.
Only Brains Align with Brains: Cross-Region Alignment Patterns Expose Limits of Normative Models: The authors point out that existing "model-brain alignment" benchmarks only perform pointwise (ROI-layer) scoring and suffer from extremely low discriminability (where many architecturally diverse visual models have indistinguishable scores). They propose Alignment Pattern Analysis (APA)—mapping the alignment of each brain region relative to all other brain regions as a "fingerprint" curve. This requires models not only to achieve high scores on individual ROIs but also to replicate this cross-regional relationship curve. Results reveal that even top-ranked models like V-JEPA 2 fail to match these patterns, highlighting that "high alignment scores \(\neq\) truly brain-like."
Optimal Transport Unlocks End-to-End Learning for Single-Molecule Localization: To address the dependency of deep learning-based Single-Molecule Localization Microscopy (SMLM) on non-differentiable NMS in high-density scenarios, this paper reformulates the training objective as a set matching problem between "predicted emitters" and "ground truth." By utilizing entropy-regularized Optimal Transport (Sinkhorn) to construct a differentiable loss, the authors completely replace NMS. Coupled with an iterative refinement network that incorporates microscope imaging physics as feedback, the method achieves new SOTA performance on both synthetic benchmarks and real biological data in medium-to-high density regions.
Pallatom-Ligand: an All-Atom Diffusion Model for Designing Ligand-Binding Proteins: Pallatom-Ligand utilizes an all-atom diffusion transformer to directly learn the joint distribution of all atoms in "protein + small molecule ligand" complexes. It simultaneously generates the protein backbone, side chains, and ligand pockets end-to-end, supporting programmable control over global protein folding (\(\alpha/\beta\) ratio) and ligand solvent accessibility, achieving the highest in silico success rate across a comprehensive benchmark of eight ligands.
PatchDNA: A Flexible and Biologically-Informed Alternative to Tokenization for DNA: PatchDNA adapts the "patching" mechanism of the Byte Latent Transformer from NLP to DNA. It uses evolutionary conservation scores (PhyloP) instead of a fixed vocabulary to determine variable-length patch boundaries and supports "re-patching" post-training. This allows models with an order of magnitude fewer parameters to outperform existing SOTA on multiple genomic benchmarks and adjust slicing strategies by downstream task or cell type without retraining.
PepBenchmark: A Standardized Benchmark for Peptide Machine Learning: PepBenchmark integrates 35 canonical/non-canonical peptide datasets, a unified cleaning-sampling-splitting pipeline, and a leaderboard for four categories of models into a single reproducible experimental framework, revealing the true performance boundaries of PLM, fingerprint, GNN, and SMILES models across different peptide tasks.
PepTri: Physical, Evolutionary, and Mutual Information Tri-guided All-atom Diffusion Peptide Design: PepTri performs joint diffusion generation of peptide sequences and 3D structures within an SE(3)-equivariant latent space. By injecting physical, evolutionary, and mutual information tri-guidance during the denoising process, it ensures the generated peptides are physically stable, evolutionarily plausible, and sequence-structure consistent, achieving SOTA performance across multiple peptide-protein design benchmarks.
PETRI: Learning Unified Cell Embeddings from Unpaired Modalities via Early-Fusion Joint Reconstruction: PETRI treats a batch of cells with the same perturbation as a "multimodal document," using an early-fusion Transformer to perform joint reconstruction of masked images and transcriptomes. It learns unified cell embeddings without requiring cell-level pairing and significantly outperforms unimodal and late-fusion baselines in recovering known gene relationships.
Physically Valid Biomolecular Interaction Modeling with Gauss-Seidel Projection: TBD
PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification: PoinnCARE projects enzyme sequences, structures, and active site modalities into hyperbolic (Poincaré ball) space for joint encoding and alignment. It utilizes graph diffusion to complete sparse active site annotations and leverages hyperbolic geometry to faithfully preserve the tree-like hierarchy of the EC numbering system. It outperforms 12 SOTA methods across four test sets of the CARE benchmark, leading CLEAN by up to 10.4% in level-4 EC number prediction accuracy.
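The intuition that the Poincaré ball suits tree-like hierarchies can be checked with the standard geodesic distance formula. The sketch below is generic hyperbolic geometry, not PoinnCARE's encoder, and the example points are illustrative assumptions.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    def sq_norm(x):
        return sum(c * c for c in x)

    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)

root = [0.0, 0.0]             # plays the role of a top-level class
leaf_a = [0.9, 0.0]           # two fine-grained classes near the boundary
leaf_b = [0.0, 0.9]

d_root = poincare_distance(root, leaf_a)
d_leaves = poincare_distance(leaf_a, leaf_b)
```

Points near the boundary are far from each other yet comparatively close to the origin, so the shortest path between two leaves passes near the root, much like a path through a common ancestor in a tree.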
PoseX: AI Defeats Physics-based Methods on Protein Ligand Cross-Docking: PoseX constructs an open-source docking benchmark covering both self-docking and the more realistic cross-docking scenarios. Utilizing 718 + 1312 new crystal structures free of training leakage, evaluation of 23 docking methods across three major categories, a meticulously designed energy relaxation post-processing pipeline, and a real-time leaderboard, it systematically demonstrates that AI methods have comprehensively outperformed traditional physical docking software in the more challenging real-world task of cross-docking.
Pretraining with Re-parametrized Self-Attention: Unlocking Generalization in SNN-Based Neural Decoding Across Time, Brains, and Tasks: This paper proposes RAT SNN—a lightweight spiking neural network that integrates "re-parameterized spiking self-attention + multi-timescale spiking neurons + multi-stage cross-condition pre-training." Designed to decode motor intent from cortical spike trains, it achieves accuracy comparable to mainstream ANN decoders with only 600,000 parameters and pure addition (AC) operations during inference. It enables rapid generalization across time, subjects, and tasks, targeting the strict power constraints of fully implantable brain-computer interfaces (fully iBMI).
PRISM: Enhancing Protein Inverse Folding through Fine-Grained Retrieval on Structure-Sequence Multimodal Representations: PRISM introduces "Retrieval-Augmented Generation (RAG)" into protein inverse folding: it retrieves fine-grained structure-sequence motif representations for each residue from a database of known proteins, then utilizes a hybrid self/cross-attention decoder to integrate these local fragments into the backbone context. This improves on SOTA perplexity and amino acid recovery rates with minimal additional inference overhead (+14%).
Property-Driven Protein Inverse Folding with Multi-Objective Preference Alignment: This paper proposes ProtAlign, a semi-online DPO framework with "elastic preference margins" to fine-tune pre-trained inverse folding models. It optimizes multiple conflicting "developability" attributes (solubility, thermal stability) while maintaining "designability" (sequence-to-structure fidelity). MoMPNN, implemented on ProteinMPNN, outperforms baselines specialized for single attributes across crystal structures, de novo backbones, and real binder design tasks.
ProTDyn: A Foundation Protein Language Model for Thermodynamics and Dynamics Generation: ProTDyn discretizes protein conformations into structural tokens and utilizes a 1.4-billion-parameter autoregressive Transformer to simultaneously learn "thermodynamics" (sampling equilibrium conformational ensembles) and "dynamics" (generating multi-time-scale trajectories) within a single framework. By employing inpainting to refine coarse-grained trajectories into fine-grained ones, it serves as a surrogate for expensive molecular dynamics (MD) simulations and demonstrates generalization to proteins outside the training set.
Protein Structure Tokenization via Geometric Byte Pair Encoding: GeoBPE is proposed as the first framework to extend Byte Pair Encoding (BPE) from discrete text to continuous protein backbone geometry. By alternating between "local merging (k-medoids clustering + quantization)" and "global correction (differentiable inverse kinematics)," it constructs a hierarchical structural motif vocabulary. It surpasses VQ-VAE-based PSTs with >10× compression ratios and >10× data efficiency, ranking first across 24 test sets in 12 downstream tasks.
ProteinAE: Protein Diffusion Autoencoders for Structure Encoding: ProteinAE utilizes a non-equivariant Diffusion Transformer to compress protein backbone coordinates directly in \(E(3)\) space into a continuous and compact latent representation. Trained end-to-end with only a single flow matching loss, its reconstruction accuracy (\(C\alpha\) RMSD) significantly outperforms existing discrete tokenizers. Furthermore, a protein generative model built on this latent space rivals structural domain diffusion models while being nearly 10 times faster.
PSDNorm: Temporal Normalization for Deep Learning in Sleep Staging: This paper proposes PSDNorm—a drop-in normalization layer replacing BatchNorm/InstanceNorm. It aligns the Power Spectral Density (PSD) of each feature map to a moving Riemannian barycenter PSD using Monge mapping within the network. It achieves SOTA on sleep staging across 10 datasets and tens of thousands of subjects, reaching the accuracy of the strongest baseline with only 1/4 of the labeled data.
Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding: Existing interpretability methods only handle self-attention, which leaves a blind spot for the "encoder-decoder" architectures commonly used in TCR-pMHC binding prediction models. To address this, the paper proposes QCAI, which decomposes asymmetric cross-attention matrices in the decoder into importance scores for both query and key residues. It also constructs a TCR-XAI benchmark with structural ground truth, achieving SOTA in both interpretability and prediction consistency.
RankFlow: Property-aware Transport for Protein Optimization: Instead of directly attaching a regression head to Protein Language Model (PLM) embeddings to fit fitness values, RankFlow learns an energy-guided conditional flow to transport "property-agnostic" PLM representations into a distribution "aligned with target properties." Combined with a differentiable ranking loss (RC2) and a property-guided spatial gate (PSG), it achieves SOTA ranking accuracy and stronger cross-experimental generalization across the ProteinGym, PEER, and FLIP benchmarks.
Readout Representation: Redefining Neural Codes by Input Recovery: This paper proposes defining neural representations based on "what can be read out from neural features" rather than "what input causally produced the feature." Through perturbed feature inversion experiments in vision and language models, it demonstrates that a single input often corresponds to a broad recoverable region in the feature space, and the representation size serves as a metric for redundancy, robustness, and single-sample representability.
Refine Drugs, Don't Complete Them: Uniform-Source Discrete Flows for Fragment-Based Drug Discovery: InVirtuoGen utilizes "uniform-source continuous-time discrete flow" on fragmented SMILES to transform the generation paradigm from "step-by-step completion" to "simultaneous refinement of all positions." This approach not only establishes a superior quality-diversity Pareto frontier in de novo generation but also achieves a new SOTA on the PMO benchmark and lead optimization through a hybrid optimization of Genetic Algorithms and PPO.
Representing Local Protein Environments with Machine Learning Force Fields: This paper repurposes intermediate layer embeddings from Machine Learning Force Fields (MLFFs), originally intended for predicting energy and forces, as general-purpose representations of local protein environments. By extracting features of atoms within a 5Å neighborhood centered on a residue from a frozen pre-trained MLFF, the authors demonstrate that biochemical information such as secondary structure, amino acid identity, and protonation states is organized zero-shot. This approach achieves SOTA results on downstream tasks like pKa and NMR chemical shift prediction and enables uncertainty estimation via likelihood calculations.
Reverse Distillation: Consistently Scaling Protein Language Model Representations: Addressing the counter-intuitive scaling phenomenon in Protein Language Models (PLMs) where "larger models do not necessarily perform better," a reverse distillation framework is proposed. By using small model representations as a basis and extracting orthogonal residual information from large models through SVD, the method constructs Matryoshka nested embeddings. This ensures larger reverse-distilled models consistently outperform smaller ones, making ESM-2 15B the strongest in the entire family for the first time after reverse distillation.
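The nesting idea in the Reverse Distillation entry can be sketched in a few lines, assuming that "orthogonal residual information" means the part of the large-model embeddings that a linear fit from the small-model embeddings cannot explain. The function `reverse_distill` and the random arrays are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

def reverse_distill(small, large, k):
    """Keep `small` as the leading block and append k residual directions from `large`."""
    # Least-squares prediction of the large embeddings from the small ones.
    coef, *_ = np.linalg.lstsq(small, large, rcond=None)
    # Residual: what the small embeddings cannot linearly explain.
    residual = large - small @ coef
    # The top-k right singular vectors span the dominant residual directions.
    _, _, vt = np.linalg.svd(residual, full_matrices=False)
    extra = residual @ vt[:k].T
    return np.hstack([small, extra])

rng = np.random.default_rng(0)
small = rng.standard_normal((200, 8))    # hypothetical small-model embeddings
large = rng.standard_normal((200, 32))   # hypothetical large-model embeddings
nested = reverse_distill(small, large, k=4)
```

The first 8 columns of `nested` are exactly the small-model embedding, so truncating the vector recovers the smaller representation, which is the Matryoshka-style property the entry describes.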
Riemannian High-Order Pooling for Brain Foundation Models: To address the issue of EEG foundation models typically relying on a single CLS token and discarding spatio-temporal second-order statistics, this paper proposes RHOP, a plug-and-play Riemannian High-Order Pooling head. Each token is encoded as a scale-invariant quotient Gaussian and embedded into the SPD manifold, then aggregated across tokens using Riemannian Gaussians (Fréchet mean + tangent space covariance). Finally, the sparse inverse covariance is concatenated with the CLS token for classification. RHOP consistently improves performance across 4 EEG benchmarks and 3 training paradigms with only a few thousand parameters.
Riemannian Variational Flow Matching for Material and Protein Design: This paper proposes Riemannian Gaussian Variational Flow Matching (RG-VFM), which extends "endpoint prediction" Variational Flow Matching (VFM) to curved manifolds using the Riemannian Gaussian distribution. Using Jacobi fields, it is proved that RG-VFM naturally incorporates a curvature-related penalty compared to velocity-predicting Riemannian Flow Matching (RFM), providing a stronger supervision signal. RG-VFM consistently outperforms Euclidean and velocity-based baselines across synthetic spherical/hyperbolic data, MOF materials, and protein backbone generation tasks.
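To make the contrast between endpoint prediction and velocity prediction in the RG-VFM entry concrete, the sketch below computes both regression targets along a geodesic on the unit sphere. It assumes the standard exponential and logarithm maps of the sphere and does not implement the RG-VFM objective itself; all function names are hypothetical.

```python
import numpy as np

def log_map(x, y):
    """Tangent vector at x that points toward y along the geodesic of the unit sphere."""
    cos_theta = np.clip(np.dot(x, y), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros_like(x)
    direction = y - cos_theta * x
    return theta * direction / np.linalg.norm(direction)

def exp_map(x, v):
    """Follow the geodesic that starts at x with initial tangent vector v."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return x.copy()
    return np.cos(norm) * x + np.sin(norm) * v / norm

def geodesic_point(x0, x1, t):
    """Point at time t in [0, 1] on the geodesic from x0 to x1."""
    return exp_map(x0, t * log_map(x0, x1))

def velocity_target(x0, x1, t):
    """Velocity-prediction target at time t < 1: tangent of the geodesic at x_t."""
    xt = geodesic_point(x0, x1, t)
    return log_map(xt, x1) / (1.0 - t)

x0 = np.array([1.0, 0.0, 0.0])
x1 = np.array([0.0, 1.0, 0.0])
xt = geodesic_point(x0, x1, 0.5)
endpoint_target = x1                       # what an endpoint-predicting model regresses
vel = velocity_target(x0, x1, 0.5)         # what a velocity-predicting model regresses
```

In this toy case the endpoint target lives on the manifold, while the velocity target lives in the tangent space at `xt`.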
Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles: RigidSSL represents the protein backbone as a residue-level rigid body sequence. It first learns stable geometric priors under \(SE(3)\) perturbations on static structures from AFDB, and then learns realistic conformational transitions using MD trajectories. This enhances the designability, diversity, and biophysical plausibility of protein backbone generation, motif scaffolding, and GPCR conformational ensemble generation.
SAIR: Enabling Deep Learning for Protein-Ligand Interactions with a Synthetic Structural Dataset: SAIR utilizes the Boltz-1x cofolding model to fold 1.049 million protein-ligand complexes curated from ChEMBL/BindingDB, constructing the largest 3D protein-ligand structural dataset to date with experimental activity labels (5.24 million structures). Based on this, a systematic evaluation of various binding affinity prediction methods reveals that existing models lack generalization capabilities on synthetic structures, highlighting an urgent need for targeted fine-tuning.
SAVE: A Generalizable Framework for Multi-Condition Single-Cell Generation with Gene Block Attention: SAVE aggregates thousands of genes into several "gene blocks" based on LLM semantic similarity. It performs Transformer attention at the block level, combined with a Variational Autoencoder (VAE) for compression and Latent Flow Matching for generation. By using AdaLN to inject conditions and condition masking to unify generation and transfer tasks, SAVE significantly outperforms existing methods across conditional generation, batch correction, and perturbation prediction, particularly in low-resource and unseen condition configurations.
SC-Arena: A Natural Language Benchmark and Knowledge-Enhanced Evaluation for Single-Cell Reasoning: SC-Arena reformulates the evaluation of "whether an LLM can serve as a virtual cell" into a natural language arena: it uses an object-oriented "Knowledge Cell Class" abstraction to unify evaluation targets (attributes + methods), designs 5 open-ended natural language tasks, and replaces brittle string-matching metrics with a knowledge-enhanced LLM judge linked to ontologies, marker gene databases, and literature. The study finds that current models are fluent in descriptive tasks but fail systematically in mechanistic and causal tasks such as perturbation prediction and cell type annotation.
Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics: This paper proposes STAR-MD, an SE(3)-equivariant causal diffusion Transformer that achieves microsecond-scale protein dynamics trajectory generation through joint spatio-temporal attention and contextual noise perturbation. It achieves SOTA across all metrics on the ATLAS benchmark and stably extrapolates to microsecond time scales unseen during training.
scDFM: Distributional Flow Matching for Robust Single-Cell Perturbation Prediction: This paper proposes scDFM, a generative framework based on Conditional Flow Matching (CFM). It ensures distribution-level fidelity through MMD regularization and utilizes a PAD-Transformer backbone to process noisy and sparse single-cell data. On combinatorial perturbation prediction tasks, it achieves a 19.6% reduction in MSE compared to the strongest baseline, CellFlow.
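The MMD regularizer mentioned in the scDFM entry compares two sets of samples at the distribution level. The sketch below is a minimal biased estimator of squared MMD, assuming an RBF kernel with a fixed bandwidth; the kernel choice, the bandwidth, and the function names are assumptions, since the entry does not specify them.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between sample sets x and y."""
    return (
        rbf_kernel(x, x, bandwidth).mean()
        + rbf_kernel(y, y, bandwidth).mean()
        - 2.0 * rbf_kernel(x, y, bandwidth).mean()
    )

rng = np.random.default_rng(0)
observed = rng.standard_normal((100, 5))   # toy stand-in for real perturbed cells
mmd_same = mmd2(observed, observed)
mmd_near = mmd2(observed, observed + 0.1)  # slightly shifted predictions
mmd_far = mmd2(observed, observed + 3.0)   # badly shifted predictions
```

The estimate grows as the predicted population drifts away from the observed one, which is why it can serve as a penalty alongside a per-sample loss.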
SigmaDock: Untwisting Molecular Docking with Fragment-Based SE(3) Diffusion: By decomposing ligands into "rigid fragments," the generation task is transformed from predicting torsion angles to predicting SE(3) rigid-body transformations for each fragment. Using SE(3) Riemannian diffusion to reassemble these fragments into the binding pocket, SigmaDock achieves a 79.9% Top-1 success rate (RMSD < 2 Å and PB-valid) on PoseBusters, making it the first deep learning docking model to outperform classical physical methods under a fair train-test split.
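The geometric half of the success criterion quoted in the SigmaDock entry (RMSD < 2 Å) is easy to state in code. The sketch below assumes predicted and reference ligand coordinates are already in the same receptor frame with matching atom order; it ignores symmetry correction and does not perform the PB-valid check. The function names are hypothetical.

```python
import numpy as np

def rmsd(pred, ref):
    """Root-mean-square deviation between two (n_atoms, 3) coordinate arrays."""
    return float(np.sqrt(np.mean(np.sum((pred - ref) ** 2, axis=1))))

def is_geometric_success(pred, ref, threshold=2.0):
    """True when the pose is within `threshold` angstroms RMSD of the reference."""
    return rmsd(pred, ref) < threshold

ref_pose = np.zeros((4, 3))
close_pose = ref_pose + np.array([1.0, 0.0, 0.0])   # every atom off by 1 angstrom
far_pose = ref_pose + np.array([3.0, 0.0, 0.0])     # every atom off by 3 angstroms
```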
SimpleFold: Folding Proteins is Simpler Than You Think: SimpleFold treats protein folding as a conditional generation task from "amino acid sequence to all-atom 3D structure." By utilizing only standard Transformer blocks with a flow-matching objective, it completely discards AlphaFold2’s MSA, pair representations, triangle updates, and equivariant modules. Scaled to 3B parameters on 9M distilled structures, it approaches SOTA on standard folding benchmarks and excels particularly in conformational ensemble generation.
SpectraLLM: Uncovering the Ability of LLMs for Molecular Structure Elucidation from Multi-Spectral Data: SpectraLLM unifies heterogeneous spectral data (IR, Raman, UV-Vis, NMR, MS) into natural language prompts for a LoRA-finetuned Qwen3, enabling end-to-end autoregressive molecular SMILES generation. It significantly outperforms modality-specific baselines across four public benchmarks, demonstrating that predictive accuracy increases with the number of joint spectral inputs.
Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis: SlotSPE compresses pathology WSI and transcriptomic pathway features into a few patient-adaptive prognostic event slots. It utilizes selective activation, cross-modal reconstruction, and iterative slot interaction for survival risk prediction, achieving an average C-index of 0.721 across 10 TCGA cancer types and maintaining an overall performance of 0.704 even when genomic data is missing.
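For readers unfamiliar with the C-index reported in the SlotSPE entry, the sketch below computes Harrell's concordance index on a toy cohort. It is a plain quadratic-time version written for clarity, with risk ties counted as half-concordant; it is not taken from the paper's evaluation code.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs whose predicted risks are ordered correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

times = [1, 2, 3, 4]      # follow-up times
events = [1, 1, 0, 1]     # 1 = event observed, 0 = censored
perfect = concordance_index(times, events, [4, 3, 2, 1])
reversed_order = concordance_index(times, events, [1, 2, 3, 4])
uninformative = concordance_index(times, events, [1, 1, 1, 1])
```

A value of 0.5 corresponds to an uninformative ranking, which gives context for the 0.721 and 0.704 figures in the entry.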
SubDyve: Subgraph-Driven Dynamic Propagation for Virtual Screening Enhancement: SubDyve replaces general molecular fingerprints with "class-discriminative subgraphs" to construct similarity networks. It then utilizes an iterative seed refinement process guided by the local False Discovery Rate (LFDR) to safely expand a small set of known active molecules into a larger seed set. In low-label virtual screening scenarios with only dozens of active labels, SubDyve significantly boosts early enrichment metrics (BEDROC / EF1%) on DUD-E and the 10-million-scale ZINC library.
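The EF1% metric named in the SubDyve entry measures how much more often actives appear in the top-ranked 1% than they would under random ordering. A minimal sketch follows, assuming higher scores mean higher predicted activity; the function name and the toy library are hypothetical.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Hit rate among the top `fraction` of the ranking divided by the overall hit rate."""
    n = len(scores)
    n_top = max(1, int(round(n * fraction)))
    # Rank molecules from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_top = sum(labels[i] for i in order[:n_top])
    total_actives = sum(labels)
    return (hits_top / n_top) / (total_actives / n)

# Toy library: 1000 molecules, the 10 actives happen to receive the 10 highest scores.
scores = list(range(1000))
labels = [1 if i >= 990 else 0 for i in range(1000)]
ef_best = enrichment_factor(scores, labels, fraction=0.01)

# Same library with the ranking inverted, so no active lands in the top 1%.
ef_worst = enrichment_factor([-s for s in scores], labels, fraction=0.01)
```

With 1% of the library active, the best possible EF1% is 100, reached when the top 1% contains only actives.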
SYNC: Measuring and Advancing Synthesizability in Structure-Based Drug Design: This paper benchmarks 8 classic synthesizability metrics across 11 SBDD models, revealing inconsistencies between these metrics. It proposes SYNC, a lightweight SE(3)-invariant synthesizability classifier, and integrates it as a plug-and-play module into the diffusion process (via Guided Diffusion and DPO), significantly improving the synthesizability of generated molecules with minimal loss in binding affinity.
SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling: SynCoGen proposes a multimodal generation framework combining masked graph diffusion and flow matching. It simultaneously samples molecule building block reaction graphs and 3D atomic coordinates, ensuring synthetic accessibility while achieving high-quality 3D molecule generation.
Take Note: Your Molecular Dataset Is Probably Aligned: This paper systematically reveals and quantifies a pitfall often overlooked by machine learning newcomers in mainstream molecular datasets like QM9, QMugs, and OMol25: molecules are not randomly oriented. A simple classifier can distinguish original samples from randomly rotated ones with high accuracy, and neural networks can even predict molecular properties by "looking only at the orientation," reminding the community that the performance of non-equivariant models without rotation augmentation is artificially inflated by these spurious signals.
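The rotation augmentation that the entry above says non-equivariant models need can be sketched as follows. The sketch draws a random rotation from the QR decomposition of a Gaussian matrix and rotates a molecule about its centroid; the function names and the toy coordinates are hypothetical, and this is one common recipe rather than the procedure used in the paper.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation matrix built from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))      # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:         # flip one axis to turn a reflection into a rotation
        q[:, 0] *= -1
    return q

def augment(coords, rng):
    """Rotate an (n_atoms, 3) coordinate array about its centroid."""
    center = coords.mean(axis=0)
    return (coords - center) @ random_rotation(rng).T + center

rng = np.random.default_rng(0)
rotation = random_rotation(rng)
molecule = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [1.6, 1.0, 0.0], [0.4, 1.5, 0.8]])
rotated = augment(molecule, rng)
```

Applying `augment` at every training step removes the orientation signal, since interatomic distances are unchanged while the absolute pose is randomized.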
Temporally Detailed Hypergraph Neural ODEs for Disease Progression Modeling: The paper models clinically recognized disease progression pathways as "Temporally Detailed Hypergraphs" (TD-Hypergraphs) with per-marker timestamps. It utilizes a Neural ODE driven by a learnable Hypergraph Laplacian to characterize continuous-time progression dynamics under irregular visit data. On two real-world EHR datasets, it predicts complication markers for the next visit, with the F1 score significantly outperforming baselines such as LSTM, Transformer, Temporal Graph Networks, and Neural ODEs.
Test-Time Adaptation without Source Data for Out-of-Domain Bioactivity Prediction: Targeting the realistic drug discovery scenario where source training data is inaccessible and only a pre-trained source model is available, this paper proposes TAB, a test-time adaptation framework. It employs uncertainty-weighted consistency learning to steer model attention toward genuine binding regions and suppress reliance on shortcut substructures, combined with contrastive learning to prevent representation collapse. It consistently outperforms SOTA methods that require source data under three types of distribution shift: scaffold, protein, and assay.
TetraGT: Tetrahedral Geometry-Driven Explicit Token Interactions with Graph Transformer for Molecular Representation Learning: TetraGT is the first to feed molecular bond and dihedral angles as explicit tokens into a Graph Transformer. It employs "Spatial Tetrahedral Attention" constrained by tetrahedral geometry to allow direct communication between angle tokens. Combined with a Directed Cycle Angle Loss for chirality discrimination and hierarchical virtual nodes, it achieves SOTA on quantum chemistry benchmarks like PCQM4Mv2 and OC20 IS2RE, while leading in downstream transfer tasks such as QM9, PDBBind, Peptides, and LIT-PCBA.
The Human Brain as a Dynamic Mixture of Expert Models in Video Understanding: The authors perform the first "model-brain representational alignment" benchmark of 110 video/image deep models on large-scale dynamic EEG recordings. They propose Cross-Temporal Representational Similarity Analysis (CT-RSA) to match frame-by-frame model features with millisecond-by-millisecond brain responses. The study reveals that neural preferences switch over time during 3-second natural video clips (from static low-level \(\rightarrow\) static high-level objects \(\rightarrow\) mid-level temporal actions). Different brain regions (posterior vs. frontal) and different time points favor different model types; thus, the optimal "alignment model" does not exist in a single network but resembles a "mixture of experts" that switches dynamically.
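The frame-by-time matching described in the CT-RSA entry can be sketched as a grid of representational similarity scores. The sketch below assumes correlation-distance dissimilarity matrices compared with Pearson correlation; the paper's exact distance and comparison statistics are not given in the entry, and the array shapes and function names are hypothetical.

```python
import numpy as np

def rdm(features):
    """Dissimilarity matrix over stimuli: 1 - Pearson correlation between rows."""
    return 1.0 - np.corrcoef(features)

def rsa_score(a, b):
    """Pearson correlation between the upper triangles of two dissimilarity matrices."""
    iu = np.triu_indices(a.shape[0], k=1)
    return float(np.corrcoef(rdm(a)[iu], rdm(b)[iu])[0, 1])

def cross_temporal_rsa(model_feats, brain_feats):
    """model_feats: (n_frames, n_stimuli, d); brain_feats: (n_times, n_stimuli, channels)."""
    scores = np.zeros((model_feats.shape[0], brain_feats.shape[0]))
    for f, frame_feats in enumerate(model_feats):
        for t, time_feats in enumerate(brain_feats):
            scores[f, t] = rsa_score(frame_feats, time_feats)
    return scores

rng = np.random.default_rng(0)
model_feats = rng.standard_normal((3, 10, 16))   # 3 video frames, 10 stimuli
brain_feats = rng.standard_normal((5, 10, 8))    # 5 EEG time points, 10 stimuli
scores = cross_temporal_rsa(model_feats, brain_feats)
```

Each entry `scores[f, t]` says how well the model's representation of frame `f` matches the brain response at time `t`, so a different best-matching model per time point shows up as different rows or models dominating different columns.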
Thompson Sampling via Fine-Tuning of LLMs: ToSFiT is proposed to directly parameterize the Probability of Maximality (PoM) by fine-tuning Large Language Models (LLMs), extending Thompson Sampling to large-scale unstructured discrete spaces and bypassing the challenges of acquisition function maximization.
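The Probability of Maximality in the ToSFiT entry is the probability that a given candidate is the best one under the current posterior, and Thompson Sampling amounts to drawing a candidate from that distribution. The sketch below estimates it by Monte Carlo for a classical Bernoulli bandit with Beta posteriors. This is a textbook stand-in chosen for illustration, not the LLM fine-tuning method of the paper, and the function name is hypothetical.

```python
import numpy as np

def probability_of_maximality(successes, failures, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(candidate i has the highest success rate)."""
    rng = np.random.default_rng(seed)
    successes = np.asarray(successes)
    failures = np.asarray(failures)
    # Beta(1 + s, 1 + f) posterior per candidate under a uniform prior.
    draws = rng.beta(1 + successes, 1 + failures, size=(n_samples, len(successes)))
    winners = draws.argmax(axis=1)
    return np.bincount(winners, minlength=len(successes)) / n_samples

pom = probability_of_maximality(successes=[1, 5, 9], failures=[9, 5, 1])

# One Thompson Sampling step: pick the next candidate by sampling from the PoM estimate.
next_candidate = np.random.default_rng(1).choice(len(pom), p=pom)
```

Enumerating candidates like this is only feasible for small discrete sets, which is the limitation that motivates parameterizing the distribution directly in large unstructured spaces.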
Tokenization to Transfer: Do Genomic Foundation Models Learn Good Representations?: The authors systematically benchmarked 7 Genomic Foundation Models (GFMs) against their "randomly initialized weight" counterparts across 52 genomic tasks. They found that random baselines are surprisingly strong, pre-training gains are strictly gated by the tokenizer (gains are negligible for character-level but significant for subword-level), and these models fail to perceive clinically relevant single-nucleotide variants regardless of pre-training. The conclusion is that the current NLP-mimicking pre-training paradigm in genomics brings only "tokenizer-gated marginal improvements."
Towards All-atom Foundation Models for Biomolecular Binding Affinity Prediction: This paper transforms the AlphaFold 3 architecture from "generative structure prediction" into a "representation learner," proposing the All-atom Diffusion Transformer (ADiT). By utilizing unified tokenization to encode both proteins and small molecules, removing the heavy conditional trunk and MSA/template dependencies, and performing denoising pre-training on PDB, a single model achieves or approaches SOTA across four types of affinity tasks: protein-ligand, drug-target, protein-protein, and antibody-antigen, with stable performance gains as the model size increases.
Towards Knowledge-and-Data-Driven Organic Reaction Prediction: RAG-Enhanced and Reasoning-Powered Hybrid System with LLMs: This paper proposes Reaction-Thinker, a hybrid organic reaction prediction system driven by both knowledge and data. It uses a classifier and a similarity-based retrieval library to route samples: those with similar cases follow a RAG path (injecting reaction types and analogous cases into prompts), while those without follow a "CoT Reasoning + GRPO Reinforcement Learning" path. The system achieves an Exact Match of 89.86%, surpassing all compared LLMs and even traditional specialized models (Chemformer 88.13%).
Towards Understanding the Shape of Representations in Protein Language Models: Rather than explaining how Protein Language Models (PLMs) process individual sequences, this work utilizes Square-Root Velocity (SRV) representation from shape analysis and graph filtration tools to characterize "how the entire protein space is deformed by PLMs" as measurable geometric objects. It discovers that representations in ESM2 layers undergo expansion followed by contraction, and that the model most faithfully encodes 3D structures and captures local contexts of approximately 2 and 8 residues near the penultimate layer.
Triangle Multiplication is All You Need for Biomolecular Structure Representations: This paper proposes Pairmixer: an architectural simplification for AlphaFold3/Boltz-1-style co-folding models that removes expensive triangle attention and sequence updates. By retaining only triangle multiplication and FFNs on the pair representation, the model achieves structure prediction accuracy comparable to Pairformer while significantly reducing computational overhead in training, inference, and protein design.
TRIBE: Trimodal Brain Encoder for Whole-Brain fMRI Response Prediction: TRIBE feeds intermediate layer representations from three pre-trained foundation models (text, audio, and video) into a temporal Transformer to predict fMRI responses of 1000 brain parcels end-to-end. By integrating "nonlinear + cross-subject + multimodal" designs, it won the Algonauts 2025 Brain Encoding Competition with a significant lead among 267 teams.
Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct: This paper proposes DiDi-Instruct, a distillation framework based on Integral KL (IKL) divergence minimization that distills pre-trained discrete diffusion large language models (dLLMs) into few-step student models. Through four key designs (Adversarial Density Ratio Estimation, Grouped Reward Normalization, Score Decomposition, and a Reward-Guided Ancestral Sampler, RGAS), it surpasses the PPL of a 1024-step teacher model in just 16 steps on OpenWebText, achieving up to 64× inference speedup with a training cost of only 1 GPU hour.
Uncovering Semantic Selectivity of Latent Groups in Higher Visual Cortex with Mutual Information-Guided Diffusion: This paper proposes MIG-Vis: first, a "group-disentangled VAE" encodes macaque IT cortex neural spikes into multiple low-dimensional latent groups; then, "mutual information-guided deterministic DDIM editing" visualizes perturbations of each latent group as image changes, allowing researchers to directly see which neural clusters in the higher visual cortex are responsible for pose, category, or intra-class details.
Unified Biomolecular Trajectory Generation via Pretrained Variational Bridge: PVB (Pretrained Variational Bridge) unifies the training objectives of single-structure pretraining and paired-trajectory finetuning via an encoder-decoder architecture combined with Enhanced Bridge Matching. It achieves cross-domain biomolecular trajectory generation and accelerates protein-ligand holo-state exploration through RL finetuning.
VCWorld: A Biological World Model for Virtual Cell Simulation: VCWorld is proposed as a cell-level white-box simulator that integrates structured biological knowledge graphs with the iterative reasoning capabilities of Large Language Models (LLMs). It simulates signaling cascades triggered by drug perturbations in a data-efficient manner, generating interpretable step-by-step predictions and explicit mechanistic hypotheses, achieving SOTA on drug perturbation benchmarks.
VenusX: Unlocking Fine-Grained Functional Understanding of Proteins: VenusX is the first large-scale benchmark for fine-grained functional understanding within proteins. It organizes residue-level annotations (active sites, binding sites, conserved sites, motifs, domains, and epitopes) into three tasks: residue-level binary classification, segment-level multi-classification, and pairwise functional similarity scoring (totaling 56 datasets and 878k samples). By evaluating mainstream protein models using mixed-family and cross-family splitting protocols, it reveals that "strong global protein-level performance does not guarantee strong fine-grained functional understanding."
Verifier-Constrained Flow Expansion for Discovery Beyond the Data: This paper proposes Flow Expander (FE), which expands the coverage of pre-trained flow models in probability space via verifier-constrained entropy maximization. It generates design samples that go beyond the training data distribution while remaining valid; in molecular conformation design, it increases diversity while preserving chemical validity.
WFR-FM: Simulation-Free Dynamic Unbalanced Optimal Transport: WFR-FM extends flow matching to "non-mass-conserving" dynamic unbalanced optimal transport. Under the Wasserstein–Fisher–Rao (WFR) geometry, it simultaneously regresses a displacement velocity field and a scalar growth rate function. By constructing conditional paths using analytical Dirac-to-Dirac geodesics, it recovers single-cell dynamics with proliferation/apoptosis without ODE simulation, significantly outperforming existing ODE/FM baselines in accuracy, stability, and efficiency for trajectory inference.