🩺 Medical NLP
🧠 NeurIPS2025 · 6 paper notes
📌 Same area in other venues: 💬 ACL2026 (8) · 🔬 ICLR2026 (5) · 🤖 AAAI2026 (2)
🔥 Top topics: Medical Imaging ×3 · Multimodal/VLM ×2
- CGBench: Benchmarking Language Model Scientific Reasoning for Clinical Genetics Research
This paper introduces CGBench, a clinical genetics benchmark grounded in ClinGen expert annotations, designed to evaluate how well LLMs reason over scientific literature from both the variant curation and gene curation perspectives. The benchmark comprises three tasks: evidence scoring, evidence verification, and experimental evidence extraction. The evaluation finds that reasoning models perform best on fine-grained tasks but underperform non-reasoning models on high-level judgments.
- HealthSLM-Bench: Benchmarking Small Language Models for Mobile and Wearable Healthcare Monitoring
This paper presents the first benchmark to systematically evaluate small language models (SLMs, 1–4B parameters) on mobile and wearable health monitoring tasks. It covers zero-shot, few-shot, and instruction fine-tuning paradigms, and validates on-device deployment on an iPhone.
- LLM-Assisted Emergency Triage Benchmark: Bridging Hospital-Rich and MCI-Like Field Simulation
This work constructs an open, LLM-assisted emergency triage benchmark based on MIMIC-IV-ED, defining two evaluation scenarios—hospital-rich and mass casualty incident (MCI)-like field simulation—and providing baseline models along with SHAP-based interpretability analysis to promote reproducibility and accessibility in triage prediction research.
- MedMKG: Benchmarking Medical Knowledge Exploitation with Multimodal Knowledge Graph
This paper constructs MedMKG, a medical multimodal knowledge graph that integrates MIMIC-CXR imaging data with UMLS clinical concepts, proposes a Neighbor-aware Filtering (NaF) algorithm for image selection, and conducts comprehensive benchmarking of 24 baseline methods across three tasks: link prediction, text-image retrieval, and VQA.
- Mind the Gap: Aligning Knowledge Bases with User Needs to Enhance Mental Health Retrieval
This paper proposes a knowledge base augmentation framework grounded in "demand gap" analysis. By overlaying real user data (forum posts) onto existing mental health resource repositories to identify content voids, the framework applies targeted augmentation strategies to achieve near-full-corpus RAG retrieval quality with minimal document additions.
- MTBBench: A Multimodal Sequential Clinical Decision-Making Benchmark in Oncology
This paper introduces MTBBench, the first clinical benchmark to cover three dimensions simultaneously: multimodality, longitudinal temporal sequencing, and interactive agent workflows. It simulates the decision-making process of Molecular Tumor Boards (MTBs) to evaluate and enhance the multimodal longitudinal reasoning capabilities of AI agents in precision oncology.