
⚛️ Physics & Scientific Computing

🧪 ICML2025 · 20 paper notes

📌 Same area in other venues: 📷 CVPR2026 (2) · 🔬 ICLR2026 (69) · 🧪 ICML2026 (33) · 🤖 AAAI2026 (15) · 🧠 NeurIPS2025 (57) · 📹 ICCV2025 (2)

🔥 Top topics: Few-/Zero-Shot Learning ×2 · LLM ×2

Causal-PIK: Causality-based Physical Reasoning with a Physics-Informed Kernel: Proposes Causal-PIK, which encodes physical causal similarity into a Physics-Informed Kernel used for Bayesian optimization. This lets agents find optimal actions in physical reasoning tasks within very few attempts, and the method outperforms state-of-the-art approaches on the Virtual Tools and PHYRE benchmarks.
Causal Discovery of Latent Variables in Galactic Archaeology: Using the Rank-based Latent Causal Discovery (RLCD) algorithm, this study recovers two physically meaningful latent variables (birth radius and guiding radius) from only five observable stellar properties in a purely data-driven manner. This demonstrates the potential of causal discovery methods for identifying hidden physical quantities in astrophysics.
Closed-form Symbolic Solutions: A New Perspective on Solving Partial Differential Equations: This paper proposes the SymPDE framework, which uses deep reinforcement learning to search directly for closed-form symbolic solutions to PDEs, sidestepping the limited numerical accuracy and poor interpretability of PINNs. It achieves a 90% recovery rate on the Poisson and heat equations.
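
A closed-form candidate can be checked directly against the PDE it is supposed to solve. The sketch below is our own illustration, not SymPDE's reward function: it scores candidate solutions of the 1D heat equation u_t = u_xx by their finite-difference residual on a grid of interior points. The function names, grid, and step size `h` are hypothetical choices.

```python
import math

def heat_residual(u, x, t, h=1e-4):
    """Finite-difference residual u_t - u_xx of the 1D heat equation at (x, t)."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / (h * h)
    return u_t - u_xx

def max_residual(u, n=9):
    """Largest |residual| over an n-by-n grid of interior points in (0, 1) x (0, 1)."""
    pts = [(i + 1) / (n + 1) for i in range(n)]
    return max(abs(heat_residual(u, x, t)) for x in pts for t in pts)

# Illustrative candidates (hand-written, not produced by SymPDE).
exact = lambda x, t: math.exp(-math.pi ** 2 * t) * math.sin(math.pi * x)
wrong = lambda x, t: math.exp(-t) * math.sin(math.pi * x)

print(max_residual(exact))  # small: only finite-difference error remains
print(max_residual(wrong))  # large: this candidate does not satisfy u_t = u_xx
```

A symbolic search can use a residual like this (plus boundary and initial-condition terms) to rank candidates, since an exact closed form drives it to zero up to discretization error.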
Compact Matrix Quantum Group Equivariant Neural Networks: This paper extends group equivariant neural networks to compact matrix quantum groups, characterizing the weight matrices of such networks through Woronowicz's formulation of Tannaka-Krein duality. This provides a theoretical foundation for learning from data on noncommutative geometries.
Differentiable Stellar Atmospheres with Physics-Informed Neural Networks: This work proposes Kurucz-a1, a physics-informed neural network (PINN) that emulates 1D stellar atmosphere models under the Local Thermodynamic Equilibrium (LTE) assumption. It removes a key bottleneck in differentiable stellar spectroscopy, namely that atmospheric structure solvers are not differentiable, and outperforms the classic ATLAS-12 code in hydrostatic equilibrium and consistency with the solar spectrum.
Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems: The authors propose Erwin, a Transformer architecture built on a hierarchical ball tree. By restricting attention to fixed-size local balls, Erwin achieves linear time complexity. It captures multi-scale features through progressive coarsening and refinement together with cross-ball interaction, and achieves state-of-the-art performance across cosmology, molecular dynamics, PDE solving, and particle-based fluid dynamics.
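
Why fixed-size balls give linear cost: if n points are split into balls of m points and attention is computed only inside each ball, the number of query-key scores is (n/m)·m² = n·m, which is linear in n for fixed m. The sketch below is a simplified, single-level illustration under our own assumptions (median splits along the widest axis, one attention head, coordinates reused as features, no coarsening/refinement or cross-ball interaction). The function names are hypothetical and this is not Erwin's implementation.

```python
import math
import random

def ball_partition(points, idx, ball_size):
    """Recursively split at the median of the widest coordinate until each group
    holds ball_size points (assumes len(idx) == ball_size * 2**k)."""
    if len(idx) <= ball_size:
        return [idx]
    dims = len(points[0])
    spread = [max(points[i][d] for i in idx) - min(points[i][d] for i in idx) for d in range(dims)]
    axis = spread.index(max(spread))
    order = sorted(idx, key=lambda i: points[i][axis])
    half = len(order) // 2
    return ball_partition(points, order[:half], ball_size) + ball_partition(points, order[half:], ball_size)

def ball_attention(feats, balls):
    """Single-head scaled dot-product attention restricted to each ball.
    Returns the outputs and the number of query-key scores computed."""
    out = [None] * len(feats)
    scores = 0
    for ball in balls:
        for q in ball:
            dim = len(feats[q])
            logits = [sum(a * b for a, b in zip(feats[q], feats[k])) / math.sqrt(dim) for k in ball]
            scores += len(ball)
            top = max(logits)
            w = [math.exp(l - top) for l in logits]  # numerically stable softmax
            z = sum(w)
            out[q] = [sum(w[j] * feats[k][d] for j, k in enumerate(ball)) / z for d in range(dim)]
    return out, scores

random.seed(0)
n, ball_size = 64, 8
points = [(random.random(), random.random(), random.random()) for _ in range(n)]
feats = [list(p) for p in points]  # toy features: the coordinates themselves
balls = ball_partition(points, list(range(n)), ball_size)
out, scores = ball_attention(feats, balls)
print(len(balls), scores)  # 8 balls and 64 * 8 = 512 scores, versus 64 * 64 for full attention
```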
Finetuning Stellar Spectra Foundation Models with LoRA: This work is the first to apply LoRA to the stellar spectra foundation model SpecCLIP, efficiently adapting models pre-trained on LAMOST/Gaia XP data to DESI survey data with only about 100-200 labeled samples. It shows that LoRA is a lightweight and effective strategy for cross-survey spectral transfer.
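
For readers unfamiliar with LoRA: the pretrained weight W stays frozen and only a low-rank update (alpha/r)·BA is trained, with B initialized to zero so the adapted layer starts out identical to the pretrained one. The pure-Python sketch below shows one such layer; it is generic LoRA, not the SpecCLIP adaptation code, and the class name, initialization scale, and sizes are illustrative assumptions.

```python
import random

def matvec(M, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable update (alpha / r) * B @ A."""
    def __init__(self, W, r, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W                                                              # frozen
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
        self.B = [[0.0] * r for _ in range(d_out)]                              # d_out x r, zeros
        self.scale = alpha / r

    def __call__(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        """Only A and B are trained; W is not counted."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

W = [[((i * 7 + j * 3) % 5) - 2.0 for j in range(64)] for i in range(64)]  # stand-in "pretrained" weight
layer = LoRALinear(W, r=4, alpha=8)
x = [0.01 * j for j in range(64)]
print(layer(x) == matvec(W, x))  # True: B = 0, so the update is zero at initialization
print(layer.trainable_params())  # 512 trainable values vs 64 * 64 = 4096 frozen ones
```

The trainable count is r·(d_in + d_out) per adapted matrix, which is why only a small labeled set is needed to fit it.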
Gravity-Bench-v1: A Benchmark on Gravitational Physics Discovery for Agents: Proposes Gravity-Bench-v1, an interactive benchmark environment built on gravitational dynamics simulations, to evaluate the ability of AI agents to make scientific discoveries (including in out-of-distribution physics scenarios) under limited observation budgets. The results show that current models have significant shortcomings in observation planning and budget use.
Improving Memory Efficiency for Training KANs via Meta Learning: Proposes MetaKANs, which use a small meta-learner to generate the parameters of all learnable activation functions in a KAN. A KAN has roughly \((G+k+1)\) times as many trainable parameters as an MLP of the same width (\(G\) is the grid size and \(k\) the spline order); MetaKANs bring this down close to the MLP level (roughly 1/3 to 1/9 of the original KAN) while maintaining or even improving performance.
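
The parameter arithmetic can be made concrete. In the sketch below, the KAN and MLP counts follow the \((G+k+1)\)-per-edge ratio stated above; the MetaKAN count is a HYPOTHETICAL accounting under our own assumptions (each edge keeps a 1-dimensional embedding plus a base weight, and one shared meta-learner MLP maps embeddings to the \(G+k\) spline coefficients). It is not the paper's exact architecture.

```python
def mlp_params(widths):
    """Weights of an MLP with these layer widths (biases ignored)."""
    return sum(a * b for a, b in zip(widths, widths[1:]))

def kan_params(widths, G, k):
    """KAN with (G + k + 1) trainable values per edge, e.g. G + k spline
    coefficients plus one base weight."""
    return mlp_params(widths) * (G + k + 1)

def metakan_params(widths, G, k, emb_dim=1, hidden=32):
    """HYPOTHETICAL MetaKAN-style count: each edge stores an emb_dim embedding and a
    base weight; a shared meta-learner (emb_dim -> hidden -> G + k) generates the
    spline coefficients of every edge."""
    edges = mlp_params(widths)
    meta = emb_dim * hidden + hidden + hidden * (G + k) + (G + k)
    return edges * (emb_dim + 1) + meta

widths, G, k = [64, 64, 64], 5, 3
print(kan_params(widths, G, k), mlp_params(widths), metakan_params(widths, G, k))
# 73728 8192 16712
```

Because the meta-learner is shared, its cost is paid once, while the per-edge cost drops from \(G+k+1\) values to a couple of values.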
L2D: Large Language Models to Diffusion Finetuning: This paper proposes L2D, a finetuning method that treats a pretrained LLM as a single-step diffusion model and adds a parallel diffusion path to scale inference over multiple steps. Without modifying the original weights, it yields accuracy that increases monotonically with the number of inference steps, with consistent gains on math, coding, and reasoning tasks across four LLMs.
Liger: Linearizing Large Language Models to Gated Recurrent Structures: Liger converts pretrained Transformer LLMs into gated linear recurrent structures by reusing the Key projection matrix to build the gating mechanism, so no extra parameters are added. It recovers up to 93% of the original model's performance using only 0.02% of the pretraining tokens, while achieving linear-time inference with constant memory.
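
The recurrence behind gated linear attention fits in a few lines, and it shows where the constant memory comes from: the state is a fixed d_k × d_v matrix. In this sketch the scalar forget gate is computed from the key vector, echoing Liger's idea of deriving the gate from the existing Key projection; the specific mapping (a sigmoid of the mean key entry) is our assumption, not the paper's formula.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_linear_recurrence(qs, ks, vs, gate_fn):
    """Recurrent form  S_t = g_t * S_{t-1} + k_t v_t^T,  o_t = S_t^T q_t.
    The state S is d_k x d_v, so memory does not grow with sequence length."""
    d_k, d_v = len(ks[0]), len(vs[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    outs = []
    for q, k, v in zip(qs, ks, vs):
        g = gate_fn(k)  # scalar forget gate computed from the key alone (no new weights)
        S = [[g * S[i][j] + k[i] * v[j] for j in range(d_v)] for i in range(d_k)]
        outs.append([sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return outs

# HYPOTHETICAL key-derived gate: sigmoid of the mean key entry (not Liger's exact form).
key_gate = lambda k: sigmoid(sum(k) / len(k))

qs = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
ks = [[0.4, 0.1], [-0.2, 0.3], [0.5, 0.5]]
vs = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0], [0.0, 0.3, 0.7]]
print(gated_linear_recurrence(qs, ks, vs, key_gate))
```

Unrolling the recurrence gives o_t = Σ_{s≤t} (Π_{r=s+1..t} g_r)(q_t·k_s) v_s, i.e. causal linear attention with decay; with every gate fixed at 1 it reduces to plain causal linear attention.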
Maximal Update Parametrization and Zero-Shot Hyperparameter Transfer for Fourier Neural Operators: This work derives, for the first time, the Maximal Update Parametrization (μP) for the Fourier Neural Operator (FNO). This lets hyperparameters tuned on small models transfer zero-shot to billion-parameter FNOs, reducing the computational cost of hyperparameter tuning for Navier-Stokes problems to 0.30×.
Mixture-of-Expert Variational Autoencoders for Cross-Modality Embedding of Type Ia Supernova Data: Proposes a multimodal Mixture-of-Experts VAE (MMVAE) built on the Perceiver-IO architecture that jointly embeds the light curves and spectra of Type Ia supernovae. It enables probabilistic cross-modal generation of spectra from light curves, with reconstruction accuracy exceeding contrastive learning baselines.
OmniArch: Building Foundation Model For Scientific Computing: OmniArch is the first scientific computing foundation model jointly pre-trained on 1D, 2D, and 3D PDEs. It addresses multi-scale challenges with a Fourier encoder-decoder, handles couplings among multiple physical quantities with a Temporal Mask mechanism, and aligns physical priors through a PDE-Aligner, achieving state-of-the-art performance on 11 PDE families in PDEBench.
PAC Learning with Improvements: This paper proposes the "PAC Learning with Improvements" framework: when agents can genuinely improve their features by a distance of at most \(r\), conservative classifiers can achieve zero error, a goal that is unattainable in standard PAC learning. A finite VC dimension is shown to be neither necessary nor sufficient for learnability in this setting, revealing a fundamental separation between learning with improvements and both standard PAC learning and strategic classification.
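
A one-dimensional toy makes the zero-error claim concrete. The assumptions are ours, not the paper's general setting: agents sit on a line, the target is a threshold t, an agent improves only when classified negative and only by moving exactly to the decision boundary h (if it is within distance r), and error is measured on post-improvement labels. Under these rules any conservative threshold h in [t, t + r] gives zero error: agents classified positive sit at or above h ≥ t, and agents left negative sit below h - r ≤ t.

```python
def outcome_error(points, t, h, r):
    """Toy model: true label is 1 iff x >= t; the classifier predicts 1 iff x >= h.
    A negatively classified agent within distance r of h genuinely moves to h, so its
    true label becomes that of h. Returns the fraction of post-improvement errors."""
    errors = 0
    for x in points:
        if x < h and h - x <= r:
            x = h  # genuine improvement: features (and hence the true label) change
        pred = 1 if x >= h else 0
        true = 1 if x >= t else 0
        errors += pred != true
    return errors / len(points)

points = [i / 100 for i in range(101)]  # agents spread evenly over [0, 1]
t, r = 0.5, 0.1
print(outcome_error(points, t, h=0.55, r=r))  # 0.0: conservative threshold inside [t, t + r]
print(outcome_error(points, t, h=0.45, r=r))  # > 0: too permissive, some agents "improve" into false positives
print(outcome_error(points, t, h=0.70, r=r))  # > 0: too conservative, some true positives cannot reach h
```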
Rethink the Role of Deep Learning towards Large-scale Quantum Systems: This paper systematically compares traditional ML and DL on Quantum System Learning (QSL) tasks under a unified quantum resource budget. It finds that traditional ML methods (Lasso, Ridge, kernel methods) often match or even outperform DL, challenging the intuition that large-scale quantum systems require deep learning.
Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups: SOP is a framework that turns any differentiable model into a group-based Self-Attributing Neural Network (SANN). By learning feature groups end to end, it achieves state-of-the-art performance among SANNs. The paper also proves a lower bound on the error of feature-wise SANNs and shows that group-based SANNs can reach zero error.
Teaching LLMs to Speak Spectroscopy: Using only 16 GPU hours and adapting just 0.04% of the parameters via LoRA, the authors adapt LLaMA-3.1-8B to predict redshifts of astronomical objects such as galaxies from spectral data while retaining over 85% of its natural language capabilities. This demonstrates that general-purpose LLMs can be efficiently adapted to non-textual scientific modalities.
The Dark Side of the Forces: Assessing Non-Conservative Force Models for Atomistic Machine Learning: This work systematically assesses the catastrophic consequences of using non-conservative machine learning interatomic potentials (which predict forces directly instead of deriving them as the negative gradient of the potential energy) in geometry optimization and molecular dynamics. It then proposes a hybrid conservative/non-conservative model that uses a multiple-timestep (MTS) scheme to balance efficiency and physical correctness.
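
The property at stake is that a conservative force F = -∇E does zero net work around any closed path, so cycling through configurations cannot pump energy into the system; a field with a rotational component can. The toy 2D check below is our own illustration with made-up force fields, not an interatomic potential: it integrates the work around the unit circle with the midpoint rule.

```python
import math

def loop_work(force, n=2000):
    """Work done by a 2D force field around the unit circle (counter-clockwise),
    approximated with the midpoint rule. Zero for any conservative field."""
    work = 0.0
    for i in range(n):
        a0, a1 = 2 * math.pi * i / n, 2 * math.pi * (i + 1) / n
        am = 0.5 * (a0 + a1)
        fx, fy = force(math.cos(am), math.sin(am))
        dx, dy = math.cos(a1) - math.cos(a0), math.sin(a1) - math.sin(a0)
        work += fx * dx + fy * dy
    return work

# Conservative: F = -grad E with E(x, y) = x^2 + y^2.
conservative = lambda x, y: (-2 * x, -2 * y)
# Toy non-conservative field: the same gradient plus a small rotational term.
rotational = lambda x, y: (-2 * x - 0.1 * y, -2 * y + 0.1 * x)

print(loop_work(conservative))  # ~0 up to floating-point rounding
print(loop_work(rotational))    # ~0.628 (= 0.2 * pi): net work gained per loop
```

In a molecular dynamics run, that nonzero loop work shows up as systematic energy drift, which is one reason the paper's hybrid model keeps conservative forces in the MTS scheme.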
Universal Neural Optimal Transport: This work proposes Universal Neural Optimal Transport (UNOT), which uses Fourier Neural Operators to predict entropy-regularized optimal transport dual potentials across datasets and resolutions. Using these predictions to initialize the Sinkhorn algorithm yields speedups of up to 7.4×.
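
UNOT's speedup comes from where Sinkhorn starts. The sketch below is plain Sinkhorn scaling, not UNOT's neural operator: the target-side dual potential g can be passed in as an initialization (v = exp(g/eps)). As an idealized stand-in for an accurate learned prediction, we reuse the potential from a previous solve; the point grid, marginals, eps, and tolerance are illustrative assumptions.

```python
import math

def sinkhorn(a, b, C, eps, g0=None, tol=1e-9, max_iter=100_000):
    """Entropic OT via Sinkhorn scaling. g0 optionally initializes the target-side
    dual potential (v = exp(g / eps)). Returns (f, g, iterations used)."""
    n, m = len(a), len(b)
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    v = [math.exp(gj / eps) for gj in g0] if g0 is not None else [1.0] * m
    for it in range(1, max_iter + 1):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        # Column marginals are exact right after the v-update; check the row marginals.
        err = sum(abs(u[i] * sum(K[i][j] * v[j] for j in range(m)) - a[i]) for i in range(n))
        if err < tol:
            break
    f = [eps * math.log(ui) for ui in u]
    g = [eps * math.log(vj) for vj in v]
    return f, g, it

xs = [0.0, 0.25, 0.5, 0.75, 1.0]
C = [[(x - y) ** 2 for y in xs] for x in xs]
a = [0.3, 0.3, 0.2, 0.1, 0.1]
b = [0.1, 0.1, 0.2, 0.3, 0.3]
eps = 0.1

_, g_star, cold_iters = sinkhorn(a, b, C, eps)            # default all-ones start
_, _, warm_iters = sinkhorn(a, b, C, eps, g0=g_star)      # idealized "predicted" start
print(cold_iters, warm_iters)  # the warm start needs far fewer iterations
```

A learned predictor will not be exact, so in practice the gain sits between these two extremes; the reported up-to-7.4× figure refers to UNOT's own experiments.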