🧬 Computational Biology

💬 ACL2025 · 6 paper notes


🔥 Top topics: Biomolecules ×4

Align-Pro: Align Protein Representations Through Multi-Modal Learning: Align-Pro uses a multi-modal contrastive learning framework to align three protein modalities (sequence, structure, and functional description) in a unified embedding space, enabling cross-modal protein retrieval, classification, and function prediction.
Concept Bottleneck Language Models For Protein Design: This paper brings the interpretable-by-design principle of Concept Bottleneck Models (CBMs) to protein language models. An intermediate layer of biological concepts acts as the bottleneck, so the resulting system can design functional protein sequences while also providing human-understandable rationales for each design.
A Survey on Foundation Language Models for Single-cell Biology: This is the first systematic survey of foundation language models for single-cell biology from a language modeling perspective. It groups existing work into two categories: PLMs (pre-trained from scratch) and LLMs (built on existing large language models). The paper analyzes tokenization strategies, pre-training and fine-tuning paradigms, and downstream tasks, and highlights key challenges in data quality, unified evaluation, and scaling laws.
Enhancing Safe and Controllable Protein Generation via Knowledge Preference Optimization: This paper proposes the KPO framework. It constructs a Protein Safety Knowledge Graph (PSKG), applies a weighted graph pruning strategy to identify "similar but safe" protein pairs, and fine-tunes protein language models with Direct Preference Optimization (DPO) to steer them away from hazardous regions of sequence space while maintaining functionality.
LADDER: Language Driven Slice Discovery and Error Rectification in Vision Classifiers: LADDER "translates" the internal activations of pre-trained vision classifiers into natural language, retrieves error-related sentences, and uses LLMs to derive testable hypotheses about "which missing attributes cause the model to fail." This allows it to discover and mitigate multiple biases in any off-the-shelf classifier without attribute annotations. It consistently outperforms baselines such as Domino, Facts, and DFR across 6 natural and medical datasets and over 200 classifiers.
Retrieve to Explain: Evidence-driven Predictions for Explainable Drug Target Identification: This paper proposes R2E (Retrieve to Explain), a retrieval-based framework that scores and ranks candidate answers by retrieving evidence from a literature corpus and uses Shapley values to attribute each prediction faithfully to its supporting evidence. It outperforms genetics and GPT-4 baselines on drug target identification tasks.
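
To make the Shapley-value attribution mentioned in the R2E note concrete, here is a minimal sketch of exact Shapley values over a small set of evidence passages. It is not R2E's implementation: the function names, the passage labels `a`, `b`, `c`, and the toy scoring rule are all hypothetical, and exact enumeration like this is only practical for a handful of passages.

```python
from itertools import combinations
from math import factorial


def shapley_values(items, score):
    """Exact Shapley value of each item under a set-valued score function."""
    n = len(items)
    values = {}
    for item in items:
        others = [x for x in items if x != item]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes `item`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain from adding `item` to this coalition.
                total += weight * (score(s | {item}) - score(s))
        values[item] = total
    return values


# Hypothetical toy scorer (an assumption, not from the paper): each passage
# adds a fixed amount, and "a" and "b" add a bonus when retrieved together.
BASE = {"a": 0.2, "b": 0.1, "c": 0.0}


def toy_score(evidence):
    bonus = 0.3 if {"a", "b"} <= evidence else 0.0
    return sum(BASE[e] for e in evidence) + bonus


attributions = shapley_values(["a", "b", "c"], toy_score)
```

In this toy setup the shared bonus is split evenly between `a` and `b`, and the attributions sum to the score of the full evidence set, which is the property that makes Shapley values a natural fit for attributing a prediction to its retrieved evidence.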