🏥 Medical Imaging
🧪 ICML2026 · 17 paper notes
📌 Same area in other venues: 💬 ACL2026 (37) · 📷 CVPR2026 (118) · 🔬 ICLR2026 (79) · 🤖 AAAI2026 (106) · 🧠 NeurIPS2025 (144) · 📹 ICCV2025 (38)
🔥 Top topics: Biomolecules ×5 · Multimodal/VLM ×3 · Alignment/RLHF ×3 · Medical Imaging ×3 · Reasoning ×2
- Auditing Sybil: Explaining Deep Lung Cancer Risk Prediction Through Generative Interventional Attributions
This paper introduces S(H)NAP, a generative interventional framework that uses 3D diffusion bridges to perform "removal + insertion" interventions. It reverse-engineers the decision process of Sybil, a state-of-the-art lung cancer risk prediction model, into an LMPI (a linear model with second-order interactions) composed of nodule main effects, pairwise interactions, and a background term. For the first time, it audits Sybil causally (rather than correlationally), exposing its reliance on in-hospital artifacts such as ECG electrodes and metal clothing fasteners, and it reveals a severe "radial insensitivity" failure mode for peripheral lung nodules.
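Here is a minimal sketch of how an LMPI-style decomposition can be fitted, assuming we can score every combination of nodules being present or absent. In the paper those counterfactual scans come from the diffusion-bridge editor. Here `score_fn` is a hypothetical stand-in, and least squares recovers the background, main-effect, and pairwise-interaction coefficients.

```python
import itertools
import numpy as np

def fit_lmpi(score_fn, n_nodules):
    """Fit score ~ b + sum_i phi_i z_i + sum_{i<j} phi_ij z_i z_j by least squares.

    score_fn (hypothetical) maps a 0/1 presence mask, one entry per nodule, to the
    risk score of a scan edited so that only nodules with mask == 1 remain.
    """
    pairs = list(itertools.combinations(range(n_nodules), 2))
    rows, y = [], []
    for mask in itertools.product([0, 1], repeat=n_nodules):
        z = np.array(mask, dtype=float)
        # Design row: intercept (background), main effects, pairwise products.
        rows.append(np.concatenate(([1.0], z, [z[i] * z[j] for i, j in pairs])))
        y.append(score_fn(z))
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)
    background = coef[0]
    main_effects = coef[1:1 + n_nodules]
    interactions = dict(zip(pairs, coef[1 + n_nodules:]))
    return background, main_effects, interactions
```

Exhaustive enumeration needs 2^n queries, so this only illustrates the model form. It does not reflect how the paper selects its interventions.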
- EEG-Based Multimodal Learning via Hyperbolic Mixture-of-Curvature Experts
EEG-MoCE assigns each modality in EEG-based multimodal learning (emotion, sleep, cognition) its own Lorentz-manifold expert with a learnable curvature. It then fuses modalities with curvature-aware attention under the principle "higher curvature → richer hierarchy → higher fusion weight." Cross-subject accuracy improves by 14.14%, 3.34%, and 7.98% on the EAV, ISRUC, and Cognitive datasets, respectively.
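The Lorentz-model geometry behind a curvature-specific expert can be sketched as follows. `expmap0` is the standard exponential map at the origin of a hyperboloid with curvature \(-c\). `curvature_fusion_weights` is a hypothetical softmax rule that only illustrates "higher curvature → higher fusion weight." It is not the paper's attention mechanism.

```python
import numpy as np

def expmap0(v, c):
    """Map a tangent vector v at the origin onto the Lorentz hyperboloid
    <x, x>_L = -1/c (curvature -c, c > 0). Returns (time, space...) coordinates."""
    sqrt_c = np.sqrt(c)
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        # The origin of the hyperboloid.
        return np.concatenate(([1.0 / sqrt_c], np.zeros_like(v)))
    time = np.cosh(sqrt_c * norm) / sqrt_c
    space = np.sinh(sqrt_c * norm) * v / (sqrt_c * norm)
    return np.concatenate(([time], space))

def lorentz_inner(x, y):
    """Minkowski inner product -x0*y0 + <x_space, y_space>."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def curvature_fusion_weights(curvatures, temperature=1.0):
    """Hypothetical rule: softmax over per-modality curvatures."""
    logits = np.asarray(curvatures, dtype=float) / temperature
    logits -= logits.max()  # numerical stability
    w = np.exp(logits)
    return w / w.sum()
```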
- Evidential Reasoning Advances Interpretable Real-World Disease Screening
EviScreen employs a "normal + pathological" dual knowledge bank for region-level evidence retrieval, then performs evidential reasoning between the current case and retrieved evidence via cross-attention and self-attention. This provides both retrospective interpretability (which historical cases support the current decision) and localization interpretability (abnormality maps from contrastive retrieval). On four real-world external test sets, it achieves SOTA specificity at high recall.
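Here is a rough sketch of dual-bank retrieval, assuming region features have already been extracted. Each region is compared by cosine similarity against a normal bank and a pathological bank. The contrastive score (pathological similarity minus normal similarity) and the top-k cutoff are illustrative assumptions. Arranging per-region scores spatially would give an abnormality map, and the returned indices point to the supporting historical cases.

```python
import numpy as np

def _normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def retrieve_and_score(regions, normal_bank, patho_bank, k=3):
    """regions: (R, d); banks: (N, d). Returns (per-region score, top-k patho indices).

    Score = mean top-k cosine similarity to the pathological bank minus the
    mean top-k similarity to the normal bank (hypothetical contrastive rule).
    """
    r = _normalize(regions)
    sim_n = r @ _normalize(normal_bank).T
    sim_p = r @ _normalize(patho_bank).T
    top_n = np.sort(sim_n, axis=1)[:, -k:]
    # Indices of the k most similar pathological cases, best first.
    top_p_idx = np.argsort(sim_p, axis=1)[:, -k:][:, ::-1]
    top_p = np.take_along_axis(sim_p, top_p_idx, axis=1)
    score = top_p.mean(axis=1) - top_n.mean(axis=1)
    return score, top_p_idx
```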
- Federated Distillation for Whole Slide Image via Gaussian-Mixture Feature Alignment and Curriculum Integration
This paper proposes FedHD for heterogeneous federated pathology settings. It performs "one-to-one" WSI feature-level distillation via Gaussian-mixture feature alignment, then uses curriculum learning to gradually inject synthetic features from other institutions into local training. Institutions can therefore collaborate without sharing raw data or exchanging model parameters, and the method is compatible with heterogeneous MIL architectures and feature extractors. FedHD outperforms existing federated and distillation baselines on all of TCGA-IDH, CAMELYON16, and CAMELYON17.
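The following sketch assumes each institution shares diagonal Gaussian-mixture parameters (weights, means, standard deviations) fitted to its WSI patch features, rather than raw data or model weights. The linear ramp in `synthetic_ratio` and its 0.5 cap are hypothetical curriculum choices, not FedHD's published schedule.

```python
import numpy as np

def sample_gmm(weights, means, stds, n, rng):
    """Draw n synthetic feature vectors from a diagonal Gaussian mixture."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return means[comp] + stds[comp] * rng.standard_normal((n, means.shape[1]))

def synthetic_ratio(epoch, warmup_epochs, max_ratio=0.5):
    """Hypothetical linear curriculum: the synthetic share grows with the epoch."""
    return max_ratio * min(1.0, epoch / warmup_epochs)

def mixed_bag(local_feats, gmm, epoch, warmup_epochs, rng):
    """Build one MIL training bag mixing local patch features with synthetic ones.
    gmm is a (weights, means, stds) tuple received from another institution."""
    ratio = synthetic_ratio(epoch, warmup_epochs)
    n_syn = int(round(ratio * len(local_feats)))
    if n_syn == 0:
        return local_feats
    syn = sample_gmm(*gmm, n=n_syn, rng=rng)
    return np.concatenate([local_feats, syn], axis=0)
```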
- From Holo Pockets to Electron Density: GPT-style Drug Design with Density
This work changes the conditioning signal for structure-based drug design from a "rigid, empty pocket" to a "low-resolution electron cloud of the filled pocket, including ligand and solvent." It also proposes EDMolGPT, the first decoder-only autoregressive model for this setting. On the 101 targets of DUD-E, it achieves a bioactive recovery rate of 41%, far surpassing previous ED-based methods.
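To make "low-resolution electron density" concrete, here is a simple approximation that places an isotropic Gaussian at every atom on a coarse grid. This is an assumption for illustration only. The grid size, spacing, and `sigma` are arbitrary, and the paper's density maps may be computed differently.

```python
import numpy as np

def gaussian_density_grid(coords, center, size=16, spacing=1.0, sigma=1.5):
    """Render a low-resolution density on a cubic grid centred at `center`
    by summing one isotropic Gaussian per atom (ligand, solvent, or pocket)."""
    axis = (np.arange(size) - (size - 1) / 2) * spacing
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1) + center  # (size, size, size, 3)
    density = np.zeros((size, size, size))
    for atom in coords:
        d2 = np.sum((grid - atom) ** 2, axis=-1)  # squared distance to this atom
        density += np.exp(-d2 / (2 * sigma ** 2))
    return density
```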
- OT-Bridge Editor: Geometrically Constrained Stenosis Editing in Coronary Angiography via Entropic Optimal Transport
OT-Bridge Editor reformulates stenosis editing in coronary angiography as a constrained entropic optimal transport (OT) problem over a composite vessel-structure domain. It uses a Schrödinger Bridge with path-level geometric projection supervision to synthesize angiograms that are controllable at the pixel level. On the public ARCADE dataset, it yields a 27.8% relative improvement in downstream stenosis detection mAP@0.5.
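Entropic OT is usually solved with Sinkhorn iterations, sketched below for discrete marginals. This is the generic algorithm, not OT-Bridge Editor's constrained formulation. It is relevant because the static Schrödinger Bridge problem with a Brownian reference reduces to entropic OT with a quadratic cost.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.1, n_iters=2000):
    """Entropic OT: minimise <P, C> - eps * H(P) subject to P 1 = a, P^T 1 = b.

    a: (n,) source weights, b: (m,) target weights, cost: (n, m).
    Returns the transport plan P = diag(u) K diag(v) with K = exp(-C / eps).
    """
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)    # rescale rows to match a
        v = b / (K.T @ u)  # rescale columns to match b
    return u[:, None] * K * v[None, :]
```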
- Layer-Specific Fine-Tuning for Improved Negation Handling in Medical Vision-Language Models
NAST uses causal tracing to estimate each CLIP text-encoder layer's causal contribution (CTE) to negation understanding, then uses these CTEs to scale layer-wise gradients during LoRA fine-tuning. This makes medical VLMs markedly more sensitive to the semantic difference between the presence and absence of symptoms, narrowing the accuracy gap between affirmative and negated prompts from 21.6% to 4.2%.
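Here is a minimal sketch of CTE-weighted layer-wise gradient scaling, assuming per-layer CTE values are already available from causal tracing. The non-negativity clip, the `floor`, and the normalization to mean 1 are hypothetical design choices. In practice the scaling would be applied to the LoRA adapter gradients of each text-encoder layer.

```python
import numpy as np

def cte_scales(cte, floor=0.1):
    """Turn per-layer causal effects into gradient multipliers with mean 1.
    The floor keeps low-effect layers trainable (hypothetical choice)."""
    cte = np.maximum(np.asarray(cte, dtype=float), 0.0) + floor
    return cte * len(cte) / cte.sum()

def scaled_sgd_step(params, grads, scales, lr=1e-3):
    """One plain SGD step on per-layer LoRA parameters, scaling each layer's gradient."""
    return [p - lr * s * g for p, g, s in zip(params, grads, scales)]
```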
- Learning the Interaction Prior for Protein-Protein Interaction Prediction: A Model-Agnostic Approach
L3-PPI turns the biological "L3 rule" (protein pairs connected by more length-3 paths are more likely to interact) into a learnable graph prompt. A pretrained GNN recognizes L3 patterns, and a gating network generates virtual L3 paths whose count is regularized according to the PPI labels. Together they form a plug-and-play classification head that can be attached to any PPI representation model, improving it by 2–4 points on average.
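The hand-crafted L3 signal that L3-PPI learns to emulate can be computed directly from an adjacency matrix. Below, `l3_counts` uses the cube of the adjacency matrix. `l3_degree_normalized` shows a degree-normalized variant commonly used for the L3 heuristic. Neither is the paper's learned prompt.

```python
import numpy as np

def l3_counts(adj):
    """(A^3)_ij = number of length-3 walks from i to j. In a simple graph
    (no self-loops), for non-adjacent pairs these are exactly length-3 paths."""
    return np.linalg.matrix_power(adj, 3)

def l3_degree_normalized(adj):
    """Degree-normalized L3 score: sum_{u,v} a_iu a_uv a_vj / sqrt(k_u k_v)."""
    deg = adj.sum(axis=1).astype(float)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    d = np.diag(inv_sqrt)
    return adj @ d @ adj @ d @ adj
```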
- Marrying Generative Model of Healthcare Events with Digital Twin of Social Determinants of Health for Disease Reasoning
This paper proposes DiffDT, a conditional latent diffusion framework that connects electronic health records (ICD-coded event sequences) with multi-organ biomarker digital twins (tabular features derived from brain, heart, liver, and kidney imaging, together with brain functional-connectivity SPD matrices). The key innovation is a Cholesky-decomposition-based SPD-VQVAE, which replaces costly \(\mathcal{O}(N^3)\) diffusion on the SPD manifold with diffusion in an efficient, manifold-preserving latent space. An autoregressive (AR) model then performs multi-pathway disease reasoning along the mediation path "generate digital twin → predict next ICD." On UK Biobank (UKB), next-event prediction across 1,944 diseases reaches an AUC of 0.91, a new SOTA.
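A log-Cholesky style parametrization shows why a Cholesky-based encoder helps: any real vector decodes to a valid SPD matrix, so a latent model never has to enforce manifold constraints explicitly. This is a common construction, assumed here for illustration. The exact mapping inside DiffDT's SPD-VQVAE may differ.

```python
import numpy as np

def spd_to_vec(S):
    """SPD matrix -> unconstrained vector via its Cholesky factor L:
    off-diagonal entries kept as-is, positive diagonal mapped through log."""
    L = np.linalg.cholesky(S)
    rows, cols = np.tril_indices(S.shape[0])
    vals = L[rows, cols].copy()
    diag = rows == cols
    vals[diag] = np.log(vals[diag])
    return vals

def vec_to_spd(vals, n):
    """Inverse map: any real vector of length n(n+1)/2 gives an SPD matrix L L^T."""
    rows, cols = np.tril_indices(n)
    vals = np.asarray(vals, dtype=float).copy()
    diag = rows == cols
    vals[diag] = np.exp(vals[diag])  # strictly positive diagonal
    L = np.zeros((n, n))
    L[rows, cols] = vals
    return L @ L.T
```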
- MEG-XL: Data-Efficient Brain-to-Text via Long-Context Pre-Training
MEG-XL is pre-trained with masked token prediction on 2.5-minute MEG contexts (191k tokens, 5–300× longer than in prior work), then fine-tuned on a 50-word brain-to-text task. With only 1 hour of data, it matches the decoding accuracy of SOTA supervised methods trained on 50 hours, and it significantly outperforms all brain foundation models evaluated.
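For scale, 191k tokens over 2.5 minutes (150 s) is roughly 1,270 tokens per second of recording. The masking step of masked token pre-training can be sketched as below. `mask_prob` is a free parameter, since MEG-XL's actual masking ratio is not stated here, and the `-100` ignore label is a common convention assumed for illustration.

```python
import numpy as np

IGNORE = -100  # label value excluded from the loss (assumed convention)

def mask_tokens(tokens, mask_id, mask_prob, rng):
    """Replace a random subset of tokens with mask_id. Returns (inputs, labels),
    where labels keep the original token only at masked positions."""
    tokens = np.asarray(tokens)
    masked = rng.random(tokens.shape) < mask_prob
    inputs = np.where(masked, mask_id, tokens)
    labels = np.where(masked, tokens, IGNORE)
    return inputs, labels
```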
- Protein Circuit Tracing via Cross-layer Transcoders
The authors adapt cross-layer transcoders from NLP interpretability to the protein language model ESM2 in a framework called ProtoMech. Sparse latent circuits that use less than 1% of the latents recover 79% of downstream performance, and circuit-based steering designs high-fitness protein variants, outperforming baselines in over 70% of cases.
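A toy forward pass conveys the core idea of a cross-layer transcoder: sparse latents read from the residual stream at one layer and write to the MLP outputs of that layer and all later ones. The dimensions, ReLU sparsity, and nested-list weight layout are illustrative assumptions. The training loss (reconstruction plus a sparsity penalty) is omitted.

```python
import numpy as np

def clt_forward(resid, W_enc, b_enc, W_dec):
    """Toy cross-layer transcoder forward pass with ReLU latents.

    resid[l]: (d_model,) residual stream entering layer l's MLP.
    W_enc[l]: (n_latents, d_model) encoder for layer l; b_enc[l]: (n_latents,).
    W_dec[l][m]: (d_model, n_latents) decoder from layer-l latents to layer m >= l.
    Returns per-layer latents and reconstructed per-layer MLP outputs.
    """
    n_layers = len(resid)
    latents = [np.maximum(W_enc[l] @ resid[l] + b_enc[l], 0.0) for l in range(n_layers)]
    outputs = []
    for m in range(n_layers):
        # Layer m's MLP output is reconstructed from the latents of layers 0..m.
        outputs.append(sum(W_dec[l][m] @ latents[l] for l in range(m + 1)))
    return latents, outputs
```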
- Scaling Vision Transformers for Functional MRI with Flat Maps
By projecting 3D fMRI volumes onto 2D cortical flat maps and feeding them as videos to a standard spacetime MAE-ViT, the authors train CortexMAE on 2.1K hours of HCP data. It substantially outperforms prior SOTA in cognitive state decoding, supporting flat maps as the "Goldilocks zone" between voxel-level (volume) and region-averaged (parcellation) representations. The authors also release Brainmarks, the first open-source benchmark for fMRI foundation models, present the first systematic scaling laws for fMRI models, and candidly report a null result: individual trait prediction still cannot beat a simple functional connectivity baseline.
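Treating flat maps as video means the MAE tokenizes space-time tubelets and hides most of them from the encoder. The sketch below assumes 224×224 flat maps, 2×16×16 tubelets, and a 90% mask ratio (a typical video-MAE setting). None of these values are taken from the paper.

```python
import numpy as np

def tubelet_patchify(video, t=2, p=16):
    """Split a (T, H, W) flat-map clip into non-overlapping t x p x p tubelets,
    returning one flattened row per tubelet."""
    T, H, W = video.shape
    x = video.reshape(T // t, t, H // p, p, W // p, p)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # (nT, nH, nW, t, p, p)
    return x.reshape(-1, t * p * p)

def random_mask(n_tokens, mask_ratio, rng):
    """Sorted indices of visible tokens (encoder input) and masked tokens (targets)."""
    perm = rng.permutation(n_tokens)
    n_keep = int(round(n_tokens * (1 - mask_ratio)))
    return np.sort(perm[:n_keep]), np.sort(perm[n_keep:])
```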
- SIGMA: Structure-Invariant Generative Molecular Alignment for Chemical Language Models via Autoregressive Contrastive Learning
SIGMA uses a token-level contrastive loss to align the hidden states of different SMILES permutations of the same molecule onto a unified trajectory. It also introduces IsoBeam, which prunes isomorphically redundant paths during decoding. Together these let sequence models "think in chemical space by structure, not by string."
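Here is a sketch of a token-level InfoNCE loss between two SMILES renderings of the same molecule. It assumes rows have already been matched (for example by atom index), which is a hypothetical simplification. The temperature `tau` is also an arbitrary choice. Matched rows are pulled together, and all other rows serve as negatives.

```python
import numpy as np

def token_infonce(h_a, h_b, tau=0.1):
    """Token-level InfoNCE between hidden states h_a, h_b of shape (n_tokens, d).

    Row i of h_a and row i of h_b are assumed to describe the same atom;
    every other row of h_b is a negative for row i of h_a.
    """
    za = h_a / np.linalg.norm(h_a, axis=1, keepdims=True)
    zb = h_b / np.linalg.norm(h_b, axis=1, keepdims=True)
    logits = za @ zb.T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))  # positives sit on the diagonal
```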
- SynerMedGen: Synergizing Medical Multimodal Understanding with Generation via Task Alignment
SynerMedGen proposes a "generation-aligned understanding" principle: understanding tasks (CTS / MI / TIA) are derived directly from the same paired synthesis data. A two-stage training scheme first teaches the understanding branch representations that benefit synthesis, then transfers them to a latent flow-matching generation branch. On 22 medical synthesis tasks, it outperforms both dedicated synthesis models and existing unified MLLMs.
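The latent flow-matching objective can be sketched under the assumption of the common linear (rectified-flow) interpolation path. `velocity_fn` is a hypothetical stand-in for the generator, and conditioning on the understanding branch's representations is omitted.

```python
import numpy as np

def flow_matching_loss(velocity_fn, x0, x1, rng):
    """Conditional flow matching along x_t = (1 - t) * x0 + t * x1.

    x0: noise latents, x1: target image latents, both (batch, dim).
    velocity_fn(x_t, t) returns the predicted velocity (hypothetical model).
    """
    t = rng.random((x0.shape[0], 1))  # one time per sample, in [0, 1)
    x_t = (1 - t) * x0 + t * x1
    target = x1 - x0  # constant velocity along the straight path
    pred = velocity_fn(x_t, t)
    return np.mean((pred - target) ** 2)
```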
- TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation
TD3B frames agonist/antagonist design as generating a "directional transition operator." It combines a directional oracle, affinity gating, and tree-search-based amortized fine-tuning within a masked discrete diffusion framework. This lets a pretrained peptide generator produce sequences that specifically bias protein conformational transitions toward activation or inactivation.
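Two ingredients named above can be sketched in isolation: the forward corruption of masked discrete diffusion and an affinity-gated directional reward. The hard threshold and the signed `target` convention are hypothetical. TD3B's oracle and tree-search fine-tuning are not reproduced here.

```python
import numpy as np

def forward_mask(tokens, t, mask_id, rng):
    """Masked discrete diffusion forward process: each token is independently
    replaced by mask_id with probability t (t in [0, 1])."""
    tokens = np.asarray(tokens)
    return np.where(rng.random(tokens.shape) < t, mask_id, tokens)

def gated_direction_reward(direction_score, affinity, target, threshold=0.5):
    """Hypothetical reward: credit movement toward the target state
    (+1 = activation, -1 = inactivation) only if predicted affinity clears a threshold."""
    if affinity < threshold:
        return 0.0
    return float(target * direction_score)
```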
- Towards A Generative Protein Evolution Machine with DPLM-Evo
This work proposes DPLM-Evo, which extends discrete diffusion in protein language models from mask-based substitution alone to explicit modeling of substitution, insertion, and deletion as evolutionary edits. It maps variable-length observed sequences into a latent alignment space of upsampled length and pairs this with a context-aware evolutionary noise kernel. This enables variable-length evolutionary generation and trajectory-based protein post-editing and optimization, and it achieves SOTA on single-sequence variant effect prediction in ProteinGym.
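The classic dynamic-programming alignment below makes substitutions, insertions, and deletions explicit for two sequences of different length. It only illustrates the edit types DPLM-Evo models. The paper's latent alignment space and noise kernel are learned and are not this algorithm.

```python
def edit_script(src, dst):
    """Minimal Levenshtein alignment of src -> dst as a list of
    (op, src_char, dst_char) with op in {'match', 'sub', 'ins', 'del'}."""
    n, m = len(src), len(dst)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # delete everything
    for j in range(m + 1):
        D[0][j] = j  # insert everything
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == dst[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1, D[i - 1][j - 1] + cost)
    # Backtrace from the bottom-right cell to recover one optimal edit script.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (src[i - 1] != dst[j - 1]):
            op = "match" if src[i - 1] == dst[j - 1] else "sub"
            ops.append((op, src[i - 1], dst[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append(("del", src[i - 1], "-"))
            i -= 1
        else:
            ops.append(("ins", "-", dst[j - 1]))
            j -= 1
    return ops[::-1]
```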
- Towards Universal Gene Regulatory Network Inference: Unlocking Generalizable Regulatory Knowledge in Single-cell Foundation Models
This work finds that single-cell foundation models (scFMs) contain rich gene regulatory knowledge that their reconstruction-based pretraining obscures. It introduces two probes, Virtual Value Perturbation and Gradient Trajectory, that distill pairwise gene features from frozen scFMs. These features generalize across genes and datasets. On the BEELINE benchmark, AUPRC improves from ~0.5 to 0.8–0.97, which the authors present as a new paradigm of "Universal GRN Inference (UGRN)."
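Here is a finite-difference sketch of the idea behind a virtual value perturbation probe: nudge one gene's input value and record how the frozen model's outputs for all genes change. `model_fn` and `delta` are hypothetical. A gradient-based probe would replace the finite difference with the model's Jacobian.

```python
import numpy as np

def virtual_perturbation_effects(model_fn, x, delta=1e-3):
    """effects[i, j] = change in the model's output for gene j per unit
    nudge of gene i's input value (forward finite difference)."""
    base = model_fn(x)
    n = x.shape[0]
    effects = np.zeros((n, base.shape[0]))
    for i in range(n):
        x_pert = x.copy()
        x_pert[i] += delta  # perturb one gene's input value
        effects[i] = (model_fn(x_pert) - base) / delta
    return effects
```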