🧬 Computational Biology
🧪 ICML2026 · 8 paper notes
📌 Same area in other venues: 💬 ACL2026 (2) · 📷 CVPR2026 (5) · 🔬 ICLR2026 (24) · 🤖 AAAI2026 (15) · 🧠 NeurIPS2025 (44) · 📹 ICCV2025 (3)
🔥 Top topics: Biomolecules ×5
- Flow Sampling: Learning to Sample from Unnormalized Densities via Denoising Conditional Processes
This paper proposes Flow Sampling, which turns flow matching/diffusion models from "data-driven" into "noise-driven" by constructing a denoising diffusion process whose drift is conditioned on source noise samples. Along the interpolant, the model (with gradients detached) produces a sample of \(X_1\), and the energy gradient at that sample serves as the regression target. This allows efficient diffusion samplers to be learned without any data, and the approach generalizes naturally to constant-curvature Riemannian manifolds.
- From Holo Pockets to Electron Density: GPT-style Drug Design with Density
This work changes the conditioning signal in structure-based drug design from a "rigid empty pocket" to a low-resolution electron density of the pocket's filler (ligand and solvent), and proposes EDMolGPT, which it presents as the first decoder-only autoregressive model for this setting. On the 101 targets of DUD-E, it achieves a bioactive recovery of 41%, far surpassing previous electron-density (ED)-based methods.
- Learning the Interaction Prior for Protein-Protein Interaction Prediction: A Model-Agnostic Approach
L3-PPI turns the biological "L3 rule" (protein pairs connected by more length-3 paths are more likely to interact) into a learnable graph prompt. A pretrained GNN recognizes L3 patterns, and a gating network generates virtual L3 paths whose count is regularized according to the PPI labels. The result is a plug-and-play classification head that boosts any PPI representation model by 2–4 points on average.
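The L3 rule mentioned in this summary can be illustrated with a small counting sketch. This is not the L3-PPI method itself (which learns a graph prompt); it only counts length-3 walks between protein pairs by cubing the adjacency matrix. The function name `count_length3_walks` and the toy network are hypothetical, and the graph is assumed to be undirected with no self-loops.

```python
def count_length3_walks(n, edges):
    """Return W where W[u][v] is the number of length-3 walks from u to v.

    Assumes an undirected simple graph on nodes 0..n-1 (no self-loops).
    """
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = 1
        adj[v][u] = 1

    def matmul(a, b):
        # Plain O(n^3) matrix product; fine for a toy example.
        return [
            [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]

    # Entry (u, v) of A^3 counts walks u -> a -> b -> v.
    return matmul(matmul(adj, adj), adj)


# Toy network: proteins 0 and 3 are not linked directly,
# but are joined by two length-3 paths: 0-1-2-3 and 0-4-5-3.
edges = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 3)]
walks = count_length3_walks(6, edges)
print(walks[0][3])  # 2
```

For pairs that are not already adjacent, every length-3 walk in such a graph is also a simple path, so the count matches the "length-3 paths" in the rule. For adjacent pairs the count also includes back-and-forth walks, so it should not be read as a path count there.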
- Protein Circuit Tracing via Cross-layer Transcoders
The authors adapt the cross-layer transcoder from NLP to the protein language model ESM2 and propose the ProtoMech framework. It recovers 79% of downstream performance with circuits that use less than 1% of the sparse latents, and it enables circuit-based steering to design high-fitness protein variants, outperforming baselines in over 70% of cases.
- SIGMA: Structure-Invariant Generative Molecular Alignment for Chemical Language Models via Autoregressive Contrastive Learning
SIGMA uses a token-level contrastive loss to align the hidden states of different SMILES permutations of the same molecule onto a unified trajectory, and introduces IsoBeam to prune isomorphic, redundant paths during decoding. Together, these enable sequence models to "think in chemical space by structure, not by string."
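To make the idea of a contrastive loss over hidden states concrete, here is a minimal sketch of a generic InfoNCE-style loss for a single anchor vector. It is an assumption that SIGMA's loss resembles this form; the summary does not specify the loss, the temperature, or how tokens from different SMILES strings are matched. The function name `info_nce` and the toy vectors are hypothetical.

```python
import math


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor (illustrative, not SIGMA's exact loss).

    anchor, positive: hidden-state vectors that should be pulled together,
    e.g. states from two SMILES strings of the same molecule (assumed pairing).
    negatives: list of vectors that should be pushed away from the anchor.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    # Logit 0 is the positive pair; the rest are negatives.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]

    # Numerically stable log-sum-exp over all logits.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_denominator - logits[0]


# The loss is small when the anchor matches its positive and large otherwise.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(aligned < misaligned)  # True
```

In a token-level setting, a loss of this kind would presumably be averaged over matched positions in the two sequences; that matching step is left out here because the summary does not describe it.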
- TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation
TD3B frames the design of agonists/antagonists as the generation of a "directional transition operator." Within a masked discrete diffusion framework, it combines a directional oracle, affinity gating, and tree-search-amortized fine-tuning. This enables a pretrained peptide generator to produce peptide sequences that specifically bias protein conformational transitions toward activation or inactivation.
- Towards A Generative Protein Evolution Machine with DPLM-Evo
This work proposes DPLM-Evo, which extends discrete diffusion in protein language models from "mask substitution only" to explicit modeling of substitution, insertion, and deletion as evolutionary edits. By decoupling variable-length observed sequences into an upsampled-length latent alignment space plus a context-aware evolutionary noise kernel, it enables variable-length evolutionary generation and trajectory-based protein post-editing/optimization, and achieves state-of-the-art results on ProteinGym single-sequence variant effect prediction.
- Towards Universal Gene Regulatory Network Inference: Unlocking Generalizable Regulatory Knowledge in Single-cell Foundation Models
This work finds that single-cell foundation models (scFMs) contain rich gene regulatory knowledge that is obscured by "reconstruction-based pretraining." It introduces two probes, Virtual Value Perturbation and Gradient Trajectory, to distill pairwise gene features from a frozen scFM that generalize across genes and datasets. On the BEELINE benchmark, AUPRC improves from ~0.5 to 0.8–0.97, which the authors frame as a new paradigm of "Universal GRN Inference (UGRN)."
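For readers unfamiliar with the AUPRC metric quoted above, the sketch below computes average precision, a common step-wise estimate of the area under the precision-recall curve, for a ranked list of candidate regulatory edges. The function name `average_precision` and the toy scores are hypothetical, ties between scores are not handled specially, and this is not claimed to be the exact evaluation code used by BEELINE or the paper.

```python
def average_precision(scores, labels):
    """Average precision for binary labels ranked by descending score.

    scores: predicted confidence for each candidate edge (e.g. regulator -> target).
    labels: 1 if the edge is in the reference network, else 0.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision at the rank where this true edge is recovered.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Four candidate edges, two of which are true; one false edge is ranked second.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(round(average_precision(scores, labels), 3))  # 0.833
```

A score of 1.0 means every true edge is ranked above every false one. The expected score of a random ranking is roughly the fraction of true edges among the candidates, so values should be compared against that baseline.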