📂 Others

🧠 NeurIPS2025 · 150 paper notes

3DID: Direct 3D Inverse Design for Aerodynamics with Physics-Aware Optimization: This paper proposes the 3DID framework, which learns a unified physics-geometry triplane latent representation, performs objective-gradient-guided diffusion sampling, and applies a two-stage topology-preserving refinement strategy to conduct inverse design directly in the full 3D space starting from random noise. On vehicle aerodynamic shape optimization, 3DID reduces simulated drag (Sim-Drag) by 13.6% compared to the best baseline.
4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos: This paper proposes 4DGT — a 4D Gaussian-based Transformer model trained entirely on real-world monocular posed videos that reconstructs dynamic scenes in seconds via feed-forward inference, significantly outperforming comparable feed-forward networks while achieving accuracy on par with optimization-based methods.
A Differentiable Model of Supply-Chain Shocks: A JAX-based differentiable Agent-Based Model (ABM) of supply chains (~1,000 firms) that combines GPU parallelization and automatic differentiation to achieve Bayesian parameter calibration three orders of magnitude faster than conventional ABC, paving the way for shock-propagation modeling in global supply-chain networks.
A Generalized Label Shift Perspective for Cross-Domain Gaze Estimation: This paper formulates cross-domain gaze estimation (CDGE) as a generalized label shift (GLS) problem, demonstrating that existing domain-invariant representation learning methods are theoretically insufficient under label shift. It proposes continuous importance reweighting based on truncated Gaussian distributions and a Probability-aware Conditional Operator Discrepancy (PCOD) to jointly correct label shift and conditional shift, achieving an average error reduction of 12%–27% across multiple backbones.
A Sustainable AI Economy Needs Data Deals That Work for Generators: This paper introduces the concept of the "Economic Data Processing Inequality" — in the ML value chain, data progresses from raw form to model weights to synthetic outputs, with each step refining technical signals while systematically stripping economic rights from data generators. The authors empirically validate this phenomenon through analysis of 73 publicly available data transactions, diagnose three structural deficiencies (missing provenance, asymmetric bargaining power, non-dynamic pricing), and propose the EDVEX framework as a solution blueprint.
A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation: This paper rigorously proves the mechanism behind grokking from a purely optimization-theoretic perspective. Gradient flow with small weight decay exhibits two-phase dynamics in the \(\lambda\to 0\) limit: rapid convergence to the critical manifold \(\mathcal{M}\) of the training loss, followed by a Riemannian gradient flow along the manifold minimizing the \(\ell_2\) norm at timescale \(t\approx 1/\lambda\), thereby inducing delayed generalization.
A Unified Framework for Variable Selection in Model-Based Clustering with Missing Not at Random: Within a Gaussian mixture model clustering framework, this paper jointly addresses variable selection (distinguishing signal, redundant, and noise variables) and MNAR missing data modeling. A two-stage strategy—LASSO-penalized ranking followed by BIC-based role assignment—combined with spectral-distance adaptive penalty weights enables efficient inference in high-dimensional settings. Identifiability and asymptotic selection consistency are established theoretically.
Active Measurement: Efficient Estimation at Scale: This paper proposes the Active Measurement framework, which uses AI model predictions as an importance sampling proposal distribution and achieves unbiased estimation of scientific aggregate quantities through iterative human annotation and model updates, complemented by a novel combination weighting scheme and a conditional variance estimator for constructing reliable confidence intervals.
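The central idea, using model predictions as an importance-sampling proposal, can be illustrated with a minimal sketch. Everything here is an illustrative assumption rather than the paper's procedure: `true_counts` stands in for human annotations, `predicted` for model scores, and the estimator is the plain importance-sampling average, not the paper's combination weighting scheme or variance estimator.

```python
import random

def importance_estimate(true_counts, predicted, n_samples, seed=0):
    """Estimate sum(true_counts) while "annotating" only sampled items.

    Items are drawn with probability proportional to `predicted`
    (the proposal q), and each annotated value is reweighted by 1/q.
    """
    rng = random.Random(seed)
    total_pred = sum(predicted)
    # Proposal distribution; it must be positive wherever true_counts is positive.
    q = [p / total_pred for p in predicted]
    idx = rng.choices(range(len(q)), weights=q, k=n_samples)
    # Each term true_counts[i] / q[i] is an unbiased estimate of the total.
    return sum(true_counts[i] / q[i] for i in idx) / n_samples

true_counts = [0, 3, 1, 0, 6]        # hypothetical annotations, total = 10
perfect = [0.0, 3.0, 1.0, 0.0, 6.0]  # predictions proportional to the truth
rough = [1.0, 2.0, 2.0, 1.0, 4.0]    # imperfect but strictly positive predictions

exact = importance_estimate(true_counts, perfect, n_samples=5)
approx = importance_estimate(true_counts, rough, n_samples=20000)
```

When the predictions are exactly proportional to the truth, every reweighted term equals the total, so the estimate has zero variance. With rough predictions the estimate remains unbiased but becomes noisy, which is why better models reduce the annotation budget.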
AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking: AcuRank maintains a probability distribution over document relevance via a Bayesian TrueSkill model, and at each iteration selectively reranks only documents whose positions remain uncertain. This yields a reranking framework that adaptively allocates computation according to query difficulty, surpassing fixed-computation baselines on multiple benchmarks with fewer LLM calls.
Adaptive Data Analysis for Growing Data: This paper establishes the first generalization bounds for adaptive analysis over dynamically growing data, permitting analysts to schedule queries adaptively based on current dataset size, and achieving increasingly tight guarantees as data accumulates via time-varying empirical accuracy bounds and differential privacy mechanisms.
Addressing Mark Imbalance in Integration-free Neural Marked Temporal Point Processes: This paper is the first to systematically reveal the severe impact of mark distribution imbalance on prediction performance in marked temporal point processes (MTPP). It proposes a mark-first-then-time prediction strategy, designs a thresholding method to calibrate the predicted probabilities of rare marks, and develops the integration-free IFNMTPP model to efficiently support mark probability estimation and time sampling.
Adjoint Schrödinger Bridge Sampler: This paper proposes the Adjoint Schrödinger Bridge Sampler (ASBS), which reinterprets the Schrödinger Bridge problem as a stochastic optimal control (SOC) problem. This eliminates the memoryless condition required by prior diffusion samplers, supports arbitrary source distributions (e.g., Gaussian, harmonic priors), and employs a scalable matching objective without importance weight estimation. ASBS consistently outperforms prior methods on multi-particle energy functions and molecular conformation generation.
Adjusted Count Quantification Learning on Graphs: This paper extends the classical Adjusted Classify & Count (ACC) quantification method to graph-structured data, proposing two techniques — Structural Importance Sampling (SIS) and Neighborhood-aware ACC (N-ACC) — to address structural covariate shift and non-homophilous edges in graph quantification, respectively.
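For reference, the classical binary ACC correction that this paper builds on can be sketched as follows. This is the standard textbook formula, not the SIS or N-ACC variants; the function and argument names are illustrative.

```python
def adjusted_classify_and_count(predicted_positive_rate, tpr, fpr):
    """Binary ACC: solve observed = p * tpr + (1 - p) * fpr for the prevalence p."""
    if tpr == fpr:
        raise ValueError("classifier is uninformative: tpr equals fpr")
    p = (predicted_positive_rate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))  # clip to a valid prevalence

# A classifier with tpr = 0.8 and fpr = 0.1 applied to data with 30% positives
# labels 0.3 * 0.8 + 0.7 * 0.1 = 31% of the items as positive.
observed = 0.3 * 0.8 + 0.7 * 0.1
estimate = adjusted_classify_and_count(observed, tpr=0.8, fpr=0.1)
```

The correction recovers the 30% prevalence from the biased 31% raw count, provided the `tpr` and `fpr` measured on validation data still hold on the target data.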
ADPretrain: Advancing Industrial Anomaly Detection via Anomaly Representation Pretraining: This work presents ADPretrain, the first dedicated representation pretraining framework for industrial anomaly detection. By learning residual feature representations via angle-oriented and norm-oriented contrastive losses on the large-scale RealIAD dataset, the pretrained features consistently improve five mainstream embedding-based AD methods across five datasets and five backbone networks when substituted for the original features.
Alias-Free ViT: Fractional Shift Invariance via Linear Attention: This paper proposes the Alias-Free Vision Transformer (AFT), which combines anti-aliasing signal processing techniques with shift-equivariant linear cross-covariance attention, achieving near-perfect consistency (~99%) under fractional (sub-pixel) shifts for the first time, with negligible degradation in ImageNet classification accuracy.
An Empirical Investigation of Neural ODEs and Symbolic Regression for Dynamical Systems: This paper systematically investigates the extrapolation capability of Neural ODEs (NODEs) on noisy synthetic data, and explores a pipeline that employs NODEs as a data augmentation tool combined with symbolic regression (SR) to recover governing equations from limited data. Results demonstrate that this combined approach can recover two of three governing equations—and a strong approximation of the third—using only 10% of the simulation data.
EPHAD: An Evidence-Based Post-Hoc Adjustment Framework for Anomaly Detection Under Data Contamination: EPHAD proposes a test-time post-processing framework that corrects the output of anomaly detection models trained on contaminated data via Bayesian-style fusion with external evidence (e.g., CLIP, LOF) through exponential tilting. The framework requires no access to the training pipeline and consistently improves detection performance of contaminated models across 8 visual and 26 tabular AD datasets.
Are Pixel-Wise Metrics Reliable for Sparse-View Computed Tomography Reconstruction?: This paper reveals that pixel-level metrics such as PSNR and SSIM fail to capture anatomical structural completeness in sparse-view CT reconstruction (correlation only 0.16–0.30), and proposes anatomy-aware metrics (NSD/clDice) based on automated segmentation alongside the CARE framework—which incorporates segmentation-guided loss into diffusion model training—achieving 32% improvement in structural completeness for large organs and 36% for vessels.
AutoSciDACT: Automated Scientific Discovery through Contrastive Embedding and Hypothesis Testing: This work proposes the AutoSciDACT pipeline, which first employs supervised contrastive learning to compress high-dimensional scientific data into a 4-dimensional embedding space, then applies NPLM (New Physics Learning Machine) likelihood-ratio testing to statistically quantify distributional deviations in the embedding space. The pipeline achieves \(\geq 3\sigma\) discovery at signal injection ratios of \(\leq 1\%\) across astronomical, particle physics, pathology, image, and synthetic datasets.
Brain-Like Processing Pathways Form in Models With Heterogeneous Experts: Heterogeneous experts in Mixture-of-Experts models do not spontaneously form processing pathways. This paper proposes three brain-inspired inductive biases — routing cost, task-performance scaling, and expert dropout — that enable the model to develop a Mixture-of-Pathways architecture analogous to the brain's dynamic cortical–subcortical pathways.
Computable Universal Online Learning: This paper introduces computability constraints into the universal online learning framework, proving that "mathematically learnable" does not imply "learnable by a computer program," and provides precise characterizations of computable learning under both agnostic and proper variants.
ConTextTab: A Semantics-Aware Tabular In-Context Learner: ConTextTab integrates semantic embeddings (text encodings of column names and categorical values) into a table-native ICL architecture, and pretrains on large-scale real-world tabular data (T4, ~2.18M tables). It achieves a new SOTA on the semantics-rich CARTE benchmark while remaining competitive with existing methods on non-semantic benchmarks.
Contextual Dynamic Pricing with Heterogeneous Buyers: This paper presents the first systematic study of contextual dynamic pricing with heterogeneous buyers of \(K_\star\) unknown types. It proposes an Optimistic Posterior Sampling (OPS)-based algorithm achieving an \(\tilde{O}(K_\star\sqrt{dT})\) regret bound (optimal in \(d\) and \(T\)), and further introduces ZoomV—a variance-aware adaptive discretization algorithm—achieving the optimal \(\tilde{O}(\sqrt{K_\star T})\) regret in the non-contextual setting.
Continuous Thought Machines: This paper proposes the Continuous Thought Machine (CTM), which generates neuron-level temporal dynamics via privately parameterized Neuron-Level Models (NLMs) and employs a neural synchrony matrix as the core latent representation. The model demonstrates complex reasoning, adaptive computation, and interpretable attention behavior on tasks including maze solving, ImageNet classification, and parity checking.
Coreset for Robust Geometric Median: Eliminating Size Dependency on Outliers: This paper is the first to eliminate the dependency of the robust geometric median coreset size on the number of outliers \(m\): under the condition \(n \geq 4m\), it achieves an optimal coreset size of \(\tilde{\Theta}(\varepsilon^{-1/2} + \frac{m}{n}\varepsilon^{-1})\) for \(d=1\), and \(\tilde{O}(\varepsilon^{-2}\min\{\varepsilon^{-2}, d\})\) in high dimensions. The core technical contribution is a novel non-componentwise error analysis.
Coresets for Clustering Under Stochastic Noise: This paper presents the first systematic study of \((k,z)\)-clustering coreset construction under noisy data. It proposes a novel surrogate error metric \(\mathsf{Err}_\alpha\) to replace the traditional \(\mathsf{Err}\), achieving a \(\text{poly}(k)\)-fold reduction in coreset size and a \(\text{poly}(k)\)-fold tightening of quality guarantees under mild data assumptions, along with a noise-aware cluster-wise sampling algorithm.
Deep Continuous-Time State-Space Models for Marked Event Sequences: S2P2 unifies linear Hawkes processes with deep state space models by stacking multiple latent linear Hawkes (LLH) layers with nonlinear activations, yielding a highly expressive continuous-time MTPP model. It leverages parallel scanning to achieve linear complexity and sub-linear runtime, improving predictive likelihood by an average of 33% across 8 real-world datasets.
Deep Learning for Continuous-Time Stochastic Control with Jumps: Two model-based deep learning algorithms (GPI-PINN and GPI-CBU) are proposed to solve finite-horizon continuous-time stochastic control problems with jumps. By iteratively training a policy network and a value network, the approach avoids discretization and simulation of state dynamics, and demonstrates strong performance in high-dimensional settings.
Deep Legendre Transform: DLT exploits the implicit Fenchel representation of convex conjugates, \(f^*(\nabla f(x)) = \langle x, \nabla f(x) \rangle - f(x)\), to reformulate conjugate computation as a standard regression problem, thereby avoiding max/min-max optimization. The method also admits a posteriori error estimation, and when combined with KAN, yields exact closed-form solutions.
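The identity can be checked numerically on a function whose conjugate is known in closed form. The sketch below uses the illustrative choice \(f(x) = e^x\), which is not taken from the paper and for which \(f^*(y) = y\log y - y\) when \(y > 0\). It builds the regression pairs the identity suggests and compares them with the closed form.

```python
import math

def f(x):
    return math.exp(x)

def grad_f(x):
    return math.exp(x)

def f_conjugate(y):
    # Closed-form convex conjugate of exp, valid for y > 0.
    return y * math.log(y) - y

def fenchel_target(x):
    # Right-hand side of the identity: <x, grad f(x)> - f(x).
    return x * grad_f(x) - f(x)

# Regression pairs: input y = grad f(x), target = <x, grad f(x)> - f(x).
xs = [-2.0, -0.5, 0.0, 1.0, 2.5]
pairs = [(grad_f(x), fenchel_target(x)) for x in xs]

# Largest disagreement between the targets and the known conjugate.
max_gap = max(abs(f_conjugate(y) - target) for y, target in pairs)
```

Because the targets are computed without any inner maximization, a regression model fitted to such pairs approximates \(f^*\) on the range of \(\nabla f\).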
Dense Associative Memory with Epanechnikov Energy: This paper proposes a log-sum-ReLU (LSR) energy function based on the Epanechnikov kernel as a replacement for the conventional log-sum-exp (LSE) energy in Dense Associative Memory. For the first time, it achieves the coexistence of exact retrieval of all stored patterns and the emergence of novel creative local minima, while preserving exponential memory capacity.
Depth-Bounds for Neural Networks via the Braid Arrangement: This paper proves that, under \(\mathcal{B}_d^0\)-conforming constraints, exactly representing \(\max\{0, x_1, \ldots, x_d\}\) with a ReLU network requires \(\Omega(\log \log d)\) layers—the first non-constant depth lower bound without weight restrictions. It also shows that rank-(3,2) maxout networks can compute the maximum of 7 values, demonstrating that the standard upper bound is not tight.
Depth-Supervised Fusion Network for Seamless-Free Image Stitching: DSFN proposes a seamless image stitching method with depth consistency constraints: a depth-aware two-stage transformation estimation addresses large-parallax alignment, soft-seam region diffusion enables natural blending, and a re-parameterization strategy improves efficiency. The method comprehensively surpasses the state of the art on the UDIS-D and IVSD datasets.
Directional Non-Commutative Monoidal Structures for Compositional Embeddings in Machine Learning: This paper proposes an algebraic framework based on directional non-commutative monoid operators, providing a unified mathematical foundation for multi-dimensional compositional embeddings and unifying SSM recurrence, Transformer self-attention, and RoPE positional encoding as special cases.
Distributionally Robust Feature Selection: This paper proposes a model-agnostic distributionally robust feature selection method that achieves a continuous relaxation of discrete selection by injecting controlled Gaussian noise into covariates, and optimizes the conditional variance of the Bayes-optimal predictor, so that the selected feature subset enables high-quality downstream models to be trained simultaneously across multiple subpopulations.
Double Descent Meets Out-of-Distribution Detection: Theoretical Insights and Empirical Analysis: This paper is the first to reveal a double descent phenomenon in post-hoc OOD detection—OOD detection performance exhibits a valley near the interpolation threshold as model width increases, then recovers—provides a theoretical explanation via random matrix theory, and proposes an NC1 criterion based on Neural Collapse to identify the optimal model complexity regime.
DPA: A One-Stop Metric to Measure Bias Amplification in Classification Datasets: This paper proposes Directional Predictability Amplification (DPA), a predictability-based metric for measuring bias amplification. It is the only one-stop metric that simultaneously satisfies directionality, applicability to both balanced and imbalanced datasets, and correct identification of positive and negative bias amplification, by measuring the relative change between model bias and dataset bias.
Efficient Kernelized Learning in Polyhedral Games Beyond Full-Information: From Colonel Blotto to Congestion Games: This paper proposes a kernelization-based framework for designing computationally efficient no-regret learning algorithms for polyhedral games (Colonel Blotto, graphic matroid congestion games, and network congestion games) under partial-information feedback, significantly improving the runtime complexity for learning coarse correlated equilibria (CCE).
Efficient Parametric SVD of Koopman Operator for Stochastic Dynamical Systems: This paper proposes a low-rank approximation (LoRA)-based objective to learn the top-k singular functions of the Koopman operator for stochastic dynamical systems, entirely avoiding the numerically unstable matrix decomposition operations present in VAMPnet/DPNet, with naturally unbiased gradients.
Emergency Response Measures for Catastrophic AI Risk: This paper systematically analyzes how Frontier Safety Policies (FSPs) can be integrated into the first two stages of China's four-phase emergency response framework (prevention–early warning–response–recovery), employing dangerous capability evaluations, tiered thresholds, and pre-established safety measures to address catastrophic AI risks. The analysis is further contextualized through comparisons with international practices such as the EU AI Act and California SB 53.
Evolutionary Learning in Spatial Agent-Based Models for Physical Climate Risk Assessment: This paper proposes an Agent-Based Model (ABM) that integrates geospatial climate hazard data with evolutionary learning mechanisms. Using a simplified economic network comprising a three-tier commodity–manufacturing–retail supply chain, the model simulates economic responses from 2025 to 2100 under RCP8.5 flood projections. Results demonstrate that evolutionary adaptation enables firms to maintain significantly higher levels of production, capital, liquidity, and employment under climate stress, while revealing supply chain systemic risks that traditional asset-level assessments fail to capture.
Evolutionary Prediction Games: This paper proposes the "Evolutionary Prediction Games" framework, applying evolutionary game theory to analyze feedback loops between prediction algorithms and user populations. It shows that ideal learners lead to competitive exclusion (survival of the fittest), whereas practical learners (with finite data, surrogate losses, or overparameterization) can instead foster stable coexistence and mutualism among groups.
Exact Learning of Arithmetic with Differentiable Agents: This paper proposes the Differentiable Finite-State Transducer (DFST), a Turing-complete and end-to-end differentiable model family that operates on a 2D symbol grid. Trained via Policy-Trajectory Observations (PTOs) derived from expert arithmetic computations, DFST achieves perfect generalization to 3,850-digit binary addition and 2,450-digit decimal addition using only 20 training samples of up to 3-digit addition, with zero observed errors.
FACE: Faithful Automatic Concept Extraction: This paper proposes FACE, a framework that incorporates a KL divergence regularization term into non-negative matrix factorization (NMF) to constrain reconstructed activations to remain consistent with the original model's predictions, thereby extracting concept explanations that are truly faithful to the model's decision process. FACE comprehensively outperforms CRAFT and ICE on ImageNet, COCO, and CelebA.
Faithful Group Shapley Value: This paper proposes the Faithful Group Shapley Value (FGSV), the unique group-level data valuation method satisfying five axioms including "faithfulness," which effectively defends against the "shell company attack" (artificially inflating valuation by splitting subgroups), and introduces an efficient approximation algorithm with \(O(n \cdot \text{Poly}(\log n))\) complexity.
Finite-Time Analysis of Stochastic Nonconvex Nonsmooth Optimization on the Riemannian Manifolds: This paper proposes the Riemannian Online to NonConvex (RO2NC) algorithm and its zeroth-order variant ZO-RO2NC, establishing for the first time a finite-time sample complexity guarantee of \(O(\delta^{-1}\epsilon^{-3})\) for fully nonsmooth nonconvex stochastic optimization on Riemannian manifolds, matching the optimal result in Euclidean space.
FlashMD: Long-Stride, Universal Prediction of Molecular Dynamics: FlashMD is proposed as a GNN-based framework that directly predicts the positional and momentum evolution of molecular dynamics trajectories with long strides, achieving time steps 1–2 orders of magnitude larger than those of conventional MD integrators. The architecture incorporates Hamiltonian dynamics constraints and generalizes to arbitrary thermodynamic ensembles and universal chemical systems.
FlowMoE: A Scalable Pipeline Scheduling Framework for Distributed MoE Training: FlowMoE is a unified pipeline scheduling framework that integrates MHA computation, gating, expert computation, and A2A communication into a single pipeline. A priority-driven all-reduce tensor chunking mechanism maximizes communication–computation overlap, achieving 1.13×–1.82× speedup, 10–39% energy reduction, and 7–32% memory savings across multiple real-world MoE models.
Fostering the Ecosystem of AI for Social Impact Requires Expanding and Strengthening Evaluation Standards: This paper argues that the academic ecosystem of AI for Social Impact (AISI) requires a dual-track reform: broadening the definition of "impact" to recognize contributions beyond deployment or methodological novelty, while simultaneously demanding causal-inference-level rigor in evaluating deployed systems.
Frequency-Aware Token Reduction for Efficient Vision Transformer: This paper proposes frequency-aware token reduction from a frequency-domain perspective, partitioning tokens into high-frequency (HF) and low-frequency (LF) groups. HF tokens are selectively retained while LF tokens are aggregated into DC tokens, simultaneously alleviating rank collapse and reducing computational cost in ViTs. The method outperforms existing SOTA across multiple models at a 30% token reduction ratio.
FSNet: Feasibility-Seeking Neural Network for Constrained Optimization with Guarantees: This paper proposes FSNet, a framework that integrates differentiable feasibility-seeking steps into neural networks. By minimizing constraint violations via unconstrained optimization, FSNet guarantees constraint satisfaction while supporting end-to-end training. It significantly outperforms traditional solvers in speed across convex/non-convex and smooth/non-smooth problems while maintaining feasibility.
Gaussian Process Upper Confidence Bound Achieves Nearly-Optimal Regret in Noise-Free Gaussian Process Bandits: This paper proves that the GP-UCB algorithm achieves nearly-optimal regret bounds in noise-free GP bandit problems, including the first \(O(1)\) constant cumulative regret under the SE kernel and the Matérn kernel (with \(\nu > d\)), thereby closing the long-standing gap between the theory of GP-UCB and its empirical performance.
Generalized Linear Mode Connectivity for Transformers: This paper proposes a unified symmetry framework (a four-level hierarchy of permutation, semi-permutation, orthogonal, and invertible transformations) to achieve zero or near-zero barrier linear mode connectivity (LMC) on Vision Transformers and GPT-2 for the first time, and further extends the framework to multi-model merging and heterogeneous-width alignment.
Graph Alignment via Birkhoff Relaxation: This paper provides the first theoretical guarantees for the Birkhoff relaxation of the graph alignment problem (relaxing the permutation matrix constraint to doubly stochastic matrices), proving a phase transition in the Gaussian Wigner model: when \(\sigma = o(n^{-1})\), the relaxed solution approximates the true permutation; when \(\sigma = \Omega(n^{-0.5})\), the relaxed solution is far from the true permutation.
Harnessing Feature Resonance under Arbitrary Target Alignment for Out-of-Distribution Node Detection: This paper discovers the Feature Resonance phenomenon—when optimizing the representations of known in-distribution (ID) nodes, unknown ID nodes undergo significantly larger representational changes than OOD nodes, and this phenomenon is label-agnostic. Based on this observation, the authors propose RSL, a graph OOD node detection framework that requires no multi-class labels, achieving state-of-the-art performance across 13 datasets.
Hessian-guided Perturbed Wasserstein Gradient Flows for Escaping Saddle Points: This paper proposes the Perturbed Wasserstein Gradient Flow (PWGF) algorithm, which injects noise perturbations via Hessian-guided Gaussian processes to enable efficient saddle point escape and second-order optimality in probability measure optimization.
How Many Domains Suffice for Domain Generalization? A Tight Characterization via the Domain Shattering Dimension: This paper introduces the Domain Shattering Dimension (Gdim), a novel combinatorial measure that tightly characterizes the number of domains required for domain generalization (i.e., the domain sample complexity), and establishes its relationship to the classical VC dimension as \(\Theta(d \log(1/\alpha))\).
Hybrid-Balance GFlowNet for Solving Vehicle Routing Problems: This paper proposes the Hybrid-Balance GFlowNet (HBG) framework, which for the first time introduces Detailed Balance (DB) in the VRP setting and unifies it with Trajectory Balance (TB), along with a depot-guided inference strategy. HBG consistently and significantly improves two existing GFlowNet-based solvers (AGFN and GFACS) on CVRP and TSP benchmarks.
Impact of Layer Norm on Memorization and Generalization in Transformers: This work systematically reveals the fundamentally distinct roles of LayerNorm in Pre-LN and Post-LN Transformers: in Pre-LN, LN is essential for learning and its removal disrupts generalization; in Post-LN, LN drives memorization and its removal suppresses memorization while recovering true labels.
Improved Approximation Algorithms for Chromatic and Pseudometric-Weighted Correlation Clustering: For two important generalizations of Correlation Clustering—Chromatic CC and pseudometric-weighted CC—this paper achieves a 2.15-approximation and a tight 10/3-approximation, respectively, via LP relaxation and carefully designed rounding functions, significantly improving upon the previous best results of 2.5 and 6.
Improving Decision Trees through the Lens of Parameterized Local Search: This paper analyzes local search operations for decision tree optimization through the lens of parameterized complexity, identifies the sources of computational hardness, and proves that the combination of the number of features and domain size yields fixed-parameter tractability (FPT), accompanied by a proof-of-concept implementation.
Improving Forecasts of Suicide Attempts for Patients with Little Data: This paper proposes the Latent Similarity Gaussian Process (LSGP), which embeds patients into a continuous latent space to capture heterogeneity, enabling data-scarce patients to "borrow" predictive trends from similar patients, thereby improving suicide attempt prediction based on EMA data.
Inferring Stochastic Dynamics with Growth from Cross-Sectional Data: This paper proposes Unbalanced Probabilistic Flow Inference (UPFI), which jointly infers the drift, diffusion, and growth rate of stochastic dynamical systems from cross-sectional data via a Lagrangian formulation of the Fokker-Planck equation, constituting the first method to accurately handle scenarios involving cell proliferation and death.
Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination: For noiseless linear regression under the oblivious contamination model, this paper formally proves that any efficient Statistical Query algorithm requires VSTAT complexity at least \(\tilde{\Omega}(d^{1/2}/\alpha^2)\), providing evidence that the quadratic dependence on \(1/\alpha\) constitutes an essential computational lower bound for efficient algorithms.
Infrequent Exploration in Linear Bandits: This paper proposes the INFEX framework, which executes a baseline algorithm (e.g., LinUCB/LinTS) at designated exploration steps according to a given schedule and selects arms greedily at all other time steps. It is proven that as long as the number of exploration steps grows as \(\omega(\log T)\), INFEX achieves the same poly-logarithmic regret as full-time exploration while substantially reducing computational overhead (80%–99% of time steps are greedy).
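The scheduling idea can be sketched without a full bandit implementation. The perfect-square schedule below is a hypothetical choice, not one taken from the paper: it yields about \(\sqrt{T}\) exploration steps, which grows faster than \(\log T\). The `explore` and `greedy` callbacks are placeholders for a baseline such as LinUCB and for the greedy arm choice.

```python
import math

def is_exploration_step(t):
    """Hypothetical schedule: explore when t is a perfect square (t >= 1)."""
    r = math.isqrt(t)
    return r * r == t

def run_schedule(horizon, explore, greedy):
    """Dispatch each step to the baseline (explore) or to the greedy rule."""
    log = []
    for t in range(1, horizon + 1):
        log.append(explore(t) if is_exploration_step(t) else greedy(t))
    return log

log = run_schedule(10_000, explore=lambda t: "explore", greedy=lambda t: "greedy")
n_explore = log.count("explore")            # perfect squares up to 10,000
greedy_fraction = 1 - n_explore / len(log)  # share of steps that skip the baseline
```

With a horizon of 10,000 this schedule calls the baseline on 100 steps, so 99% of the steps are greedy.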
Johnson-Lindenstrauss Lemma Beyond Euclidean Geometry: This paper extends the Johnson-Lindenstrauss (JL) lemma from Euclidean space to general symmetric hollow dissimilarity matrices, proposing two complementary approaches — pseudo-Euclidean JL and generalized power distance JL — where the approximation error scales proportionally with the degree of deviation from Euclidean geometry.
Kernel Conditional Tests from Learning-Theoretic Bounds: A unified framework is proposed for converting confidence bounds of learning algorithms into conditional hypothesis tests. Built upon kernel ridge regression, the framework yields conditional two-sample tests with finite-sample guarantees and, for the first time, supports non-i.i.d. data and online sampling scenarios.
Lagrangian neural ODEs: Measuring the existence of a Lagrangian with Helmholtz metrics: This paper proposes Helmholtz metrics — differentiable metrics derived from the Helmholtz conditions — to quantify how closely a given ODE approximates the Euler-Lagrange equations. These metrics are incorporated as regularization terms into second-order Neural ODE training, forming Lagrangian Neural ODEs that guide the model toward true physical laws with zero additional inference overhead.
Learning-Augmented Online Bipartite Fractional Matching: This paper proposes two learning-augmented algorithms (LAB and PAW) for online bipartite fractional matching. Given a potentially inaccurate advice matching, both algorithms are the first to Pareto-dominate the naïve CoinFlip strategy across the entire robustness spectrum.
Learning-Augmented Streaming Algorithms for Correlation Clustering: This paper proposes the first learning-augmented streaming algorithms for Correlation Clustering. By leveraging pairwise distance predictions, the proposed methods achieve a better-than-3 approximation ratio on complete graphs (\(\tilde{O}(n)\) space) and an \(O(\log|E^-|)\) approximation ratio on general graphs (\(\tilde{O}(n)\) space), significantly improving the space–approximation tradeoff over existing prediction-free algorithms.
Learning (Approximately) Equivariant Networks via Constrained Optimization: This paper proposes ACE (Adaptive Constrained Equivariance), a framework that formulates equivariant neural network training as a constrained optimization problem. Via dual methods, ACE automatically and progressively transitions from a flexible non-equivariant model to an equivariant one, adapting to both fully and partially symmetric data without manual hyperparameter tuning.
Learning Dense Hand Contact Estimation from Imbalanced Data: This paper proposes the HACO framework, which addresses class imbalance via Balanced Contact Sampling (BCS) and spatial imbalance via a Vertex-level Class-Balanced Loss (VCB Loss). HACO is the first dense hand contact estimation model trained across 14 datasets (655K images) and achieves state-of-the-art performance across diverse interaction scenarios.
Learning Dynamics of RNNs in Closed-Loop Environments: This paper establishes a mathematical theory revealing that RNNs exhibit fundamentally different learning dynamics under closed-loop (agent–environment interaction) versus open-loop (supervised learning) training. Closed-loop learning follows a three-phase process driven by the competition between short-term policy improvement and long-term stability.
Learning non-equilibrium diffusions with Schrödinger bridges: from exactly solvable to simulation-free: This paper generalizes the Schrödinger bridge problem (SBP) from Brownian motion reference processes to multivariate Ornstein-Uhlenbeck (mvOU) reference processes, derives exact solutions for the Gaussian case, and proposes the simulation-free mvOU-OTFM algorithm for general distributions.
Learning to Condition: A Neural Heuristic for Scalable MPE Inference: This paper proposes Learning to Condition (L2C), which trains an attention network to learn dual scores — optimality and simplification — for variable-value pairs from solver search trajectories, guiding conditioning decisions in MPE inference over probabilistic graphical models (PGMs). L2C substantially reduces the search space on high-treewidth models while maintaining or improving solution quality.
Look-Ahead Reasoning on Learning Platforms: This paper formalizes level-\(k\) look-ahead reasoning in user–algorithm interactions on learning platforms. It proves that individually selfish higher-order reasoning only accelerates convergence without altering the equilibrium (i.e., no long-term gain), while the benefit of collective coordination is determined by the alignment between the learner's and users' utility functions. A theoretical framework is provided to characterize upper bounds on coordination gains.
MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision: MAS-ZERO is the first inference-time automatic MAS design framework. Through a meta-agent that iteratively designs, critiques, and refines MAS configurations (including task decomposition and sub-MAS assignment), it requires no validation set or training, and outperforms both manual and automatic MAS baselines on reasoning (+16.69%), programming (+16.66%), and search agent (+5.45%) tasks while maintaining a Pareto-optimal accuracy–cost trade-off.
MaxSup: Overcoming Representation Collapse in Label Smoothing: By decomposing the loss function of Label Smoothing (LS), this paper identifies an "error amplification term" that exacerbates misclassification, leading to intra-class feature collapse. The proposed Max Suppression (MaxSup) method redirects the penalty target from the ground-truth logit to the top-1 logit, eliminating the error amplification effect while preserving beneficial regularization.
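The decomposition mentioned in the MaxSup entry can be checked numerically. The sketch below is an illustration, not the authors' code: it assumes the label-smoothing target `(1 - alpha) * onehot + alpha / K`, for which the loss equals cross-entropy plus `alpha * (z_gt - mean(z))`, and it assumes the MaxSup penalty takes the analogous form with the top-1 logit in place of the ground-truth logit. All function names are hypothetical.

```python
import math

def log_sum_exp(z):
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z))

def cross_entropy(z, y):
    """Standard cross-entropy for logits z and ground-truth index y."""
    return log_sum_exp(z) - z[y]

def label_smoothing_loss(z, y, alpha):
    """Cross-entropy against the smoothed target (1 - alpha) * onehot + alpha / K."""
    k = len(z)
    lse = log_sum_exp(z)
    loss = 0.0
    for i, v in enumerate(z):
        target = (1.0 - alpha) * (1.0 if i == y else 0.0) + alpha / k
        loss -= target * (v - lse)
    return loss

def ls_penalty(z, y, alpha):
    """Regularizer implied by label smoothing: acts on the ground-truth logit."""
    return alpha * (z[y] - sum(z) / len(z))

def maxsup_penalty(z, alpha):
    """Assumed MaxSup-style regularizer: acts on the top-1 logit instead."""
    return alpha * (max(z) - sum(z) / len(z))

def maxsup_loss(z, y, alpha):
    return cross_entropy(z, y) + maxsup_penalty(z, alpha)

logits = [2.0, 1.0, -1.0]   # the top-1 prediction is class 0
alpha = 0.1

# Label smoothing decomposes exactly into cross-entropy plus the penalty term.
gap = label_smoothing_loss(logits, 1, alpha) - (
    cross_entropy(logits, 1) + ls_penalty(logits, 1, alpha)
)
```

When the prediction is correct (ground truth is class 0 here) the two penalties coincide; they differ only on misclassified samples, where the label-smoothing term pushes down the ground-truth logit and the MaxSup-style term pushes down the wrongly dominant one.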
MEGState: Phoneme Decoding from Magnetoencephalography Signals: This paper proposes MEGState, an architecture combining multi-resolution convolution and sensor-wise state space models (SSMs) for decoding phonemes from magnetoencephalography (MEG) signals, achieving substantial improvements over baseline methods on the LibriBrain dataset.
Meta-learning three-factor plasticity rules for structured credit assignment with sparse feedback: This paper proposes a meta-learning framework that automatically discovers local neo-Hebbian synaptic plasticity rules via outer-loop gradient optimization, enabling recurrent neural networks to perform structured credit assignment using only sparse, delayed reward signals, thereby providing new insights into the learning mechanisms of biological neural networks.
MetaFind: Scene-Aware 3D Asset Retrieval for Coherent Metaverse Scene Generation: MetaFind is a scene-aware tri-modal (text + image + point cloud) 3D asset retrieval framework that encodes scene layout information via an SE(3)-equivariant spatial-semantic graph neural network (ESSGNN), enabling iterative asset retrieval with style consistency and spatial coherence for metaverse scene generation.
MiCADangelo: Fine-Grained Reconstruction of Constrained CAD Models from 3D Scans: MiCADangelo emulates the reverse engineering workflow of human CAD designers: it extracts 2D patterns via multi-plane cross-section analysis, predicts constrained parametric sketches, and optimizes extrusion parameters, achieving for the first time complete parametric model reconstruction with sketch constraints in 3D CAD reverse engineering.
Military AI Needs Technically-Informed Regulation to Safeguard AI Research and its Applications: This paper proposes a behavior-oriented definition and regulatory framework for AI-powered Lethal Autonomous Weapons Systems (AI-LAWS). It identifies systems requiring enhanced regulation through two technical criteria, puts forward five concrete policy recommendations, and calls on AI researchers to participate actively throughout the full lifecycle of military AI governance.
Modeling Cell Dynamics and Interactions with Unbalanced Mean Field Schrödinger Bridge: This paper proposes the Unbalanced Mean Field Schrödinger Bridge (UMFSB) framework and the CytoBridge deep learning algorithm, which simultaneously model unbalanced stochastic cell dynamics and cell-cell interactions from sparse temporal snapshot data.
Modeling Neural Activity with Conditionally Linear Dynamical Systems: This paper proposes Conditionally Linear Dynamical Systems (CLDS), where Gaussian process priors allow the parameters of a linear dynamical system to vary nonlinearly as a function of observed experimental covariates, preserving the interpretability and efficient inference of linear models while capturing the nonlinear dynamics prevalent in neural circuits.
MutualVPR: A Mutual Learning Framework for Resolving Supervision Inconsistencies via Adaptive Clustering: This paper proposes MutualVPR, a mutual learning framework that dynamically assigns scene category labels through feature-driven adaptive K-means clustering, addressing the supervision inconsistency problem in classification-based VPR methods caused by viewpoint variation and occlusion.
Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model: This paper extends Neural Collapse (NC) theory to ordinal regression (OR) tasks based on cumulative link models (CLM). Under the unconstrained feature model (UFM) framework, three hallmark properties of Ordinal Neural Collapse (ONC) are formally proven: within-class mean collapse (ONC1), feature collapse onto a one-dimensional subspace (ONC2), and ordered arrangement of latent variables by class (ONC3). In the zero-regularization limit, a concise geometric relationship between latent variables and thresholds is revealed.
Neural Network for Simulating Radio Emission from Extensive Air Showers: A simple fully connected neural network is employed to replace computationally expensive CoREAS Monte Carlo simulations, enabling fast prediction of radio pulses from extensive air showers (EAS) while achieving \(X_{\text{max}}\) reconstruction resolution comparable to conventional simulations.
Non-Clairvoyant Scheduling with Progress Bars: This paper introduces a "progress bar" information model as an interpolation framework between clairvoyant and non-clairvoyant scheduling. It designs scheduling algorithms with optimal consistency–robustness tradeoffs for both adversarial and stochastic progress bars, while advancing the theoretical frontier of learning-augmented scheduling.
Nonlinearly Preconditioned Gradient Methods: Momentum and Stochastic Analysis: Under the anisotropic descent inequality framework, this paper introduces heavy ball momentum into nonlinearly preconditioned gradient methods and analyzes the convergence properties of their stochastic variants under multiple noise assumptions, thereby unifying the theoretical analysis of gradient clipping and normalized gradient methods.
Normalization in Attention Dynamics: This paper unifies various normalization schemes (Post-LN, Pre-LN, Mix-LN, Peri-LN, nGPT, sqrt-scaling) under a single framework of velocity modulation in an interacting particle system on the sphere. It theoretically characterizes how each scheme affects token clustering dynamics and representation collapse, identifying Peri-LN as the theoretically optimal choice.
Obliviator Reveals the Cost of Nonlinear Guardedness in Concept Erasure: This paper proposes Obliviator — a post-processing concept erasure method based on HSIC minimization in RKHS — that iteratively deforms the feature space through a two-step optimization procedure. It is the first method to achieve complete guardedness against nonlinear adversaries, while quantifying the utility-erasure trade-off of nonlinear guardedness. Obliviator substantially outperforms existing methods across multiple PLMs and datasets.
On a Geometry of Interbrain Networks: This opinion piece proposes introducing discrete graph curvature (Forman-Ricci and Ollivier-Ricci curvature) into interbrain network analysis within hyperscanning research. It leverages the entropy of curvature distributions to detect network phase transitions and uses curvature values to infer interbrain information routing strategies, moving beyond the descriptive limitations of conventional correlation-based metrics.
On Agnostic PAC Learning in the Small Error Regime: In the small error regime of agnostic PAC learning (\(\tau \approx d/m\)), this paper constructs a computationally efficient learner based on ERM aggregation that achieves an error upper bound of \(c \cdot \tau + O(\sqrt{\tau d/m} + d/m)\) with \(c \leq 2.1\), matching known lower bounds and advancing the precise complexity characterization of agnostic learning.
On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling: This paper reveals that under standard parameterization (SP), the cross-entropy loss causes the previously monolithic "unstable" regime to split into two distinct sub-regimes: catastrophic instability and controlled divergence. In the controlled divergence regime (\(\eta_n = \Theta(n^{-1/2})\)), logits diverge while gradients and activations remain stable, thereby establishing the first practically useful infinite-width limit for SP that admits feature learning.
On Topological Descriptors for Graph Products: This paper systematically investigates the expressive power of topological descriptors — Euler Characteristic (EC) and Persistent Homology (PH) — computed on (box) products of graphs under various filtrations. It proves that PH descriptors on graph products are strictly more expressive than those computed on individual graphs, whereas EC does not enjoy this property, and proposes an efficient algorithm for computing PH on product graphs.
On Universality Classes of Equivariant Networks: This paper proves that the separation power of equivariant neural networks (i.e., their ability to distinguish symmetry-inequivalent inputs) is insufficient to fully characterize their approximation capacity—models with identical separation power may possess strictly different approximation abilities. The paper provides a complete characterization of universality classes for shallow invariant networks and establishes sufficient conditions for universality failure.
One Sample is Enough to Make Conformal Prediction Robust: This paper proposes RCP1 (Robust Conformal Prediction with One sample), which certifies the conformal procedure itself rather than individual conformity scores. Requiring only a single randomly perturbed forward pass at inference, RCP1 yields smaller robust prediction sets than state-of-the-art methods that require 100 forward passes.
Optimism Without Regularization: Constant Regret in Zero-Sum Games: This paper provides the first proof that Optimistic Fictitious Play without regularization achieves \(O(1)\) constant regret in \(2\times2\) zero-sum games, matching the optimal rate of regularized Optimistic FTRL. It further establishes an \(\Omega(\sqrt{T})\) regret lower bound for Alternating Fictitious Play, separating the capabilities of optimism and alternation in the unregularized setting.
Optimized Learned Count-Min Sketch: This paper proposes OptLCMS, which partitions the score space and analytically solves CMS parameters via KKT conditions while optimizing thresholds through dynamic programming, substantially accelerating the construction process and providing theoretical guarantees on the probability of intolerable error.
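For context on the OptLCMS entry, here is a minimal sketch of a count-min sketch whose items are routed to separate tables by a learned score. It is not the paper's construction: the thresholds and table shapes are fixed by hand here (the paper derives them analytically and by dynamic programming), and `CountMinSketch`, `PartitionedSketch`, and the example scores are hypothetical.

```python
import bisect
import hashlib

class CountMinSketch:
    """Plain count-min sketch: depth hash rows, each with width counters."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One salted hash per row; the salt makes the rows behave independently.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only add counts, so the minimum over rows never underestimates.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

class PartitionedSketch:
    """Routes each item to one sketch according to its score partition."""

    def __init__(self, thresholds, shapes):
        # thresholds: sorted score boundaries; shapes: one (width, depth) per partition.
        assert len(shapes) == len(thresholds) + 1
        self.thresholds = thresholds
        self.sketches = [CountMinSketch(w, d) for w, d in shapes]

    def _route(self, score):
        return self.sketches[bisect.bisect_right(self.thresholds, score)]

    def add(self, item, score, count=1):
        self._route(score).add(item, count)

    def estimate(self, item, score):
        return self._route(score).estimate(item)

# Hypothetical scores from a learned frequency predictor.
scores = {"the": 0.9, "of": 0.8, "zebra": 0.1, "quark": 0.2}
true_counts = {"the": 50, "of": 30, "zebra": 2, "quark": 1}

sketch = PartitionedSketch(thresholds=[0.5], shapes=[(64, 3), (16, 3)])
for word, n in true_counts.items():
    sketch.add(word, scores[word], n)
```

Separating high-score items from the long tail keeps heavy hitters from inflating the estimates of rare items, which is the intuition behind learned sketches in general.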
OrbitZoo: Real Orbital Systems Challenges for Reinforcement Learning: This paper presents OrbitZoo, a multi-agent RL environment built on the industrial-grade astrodynamics library Orekit. It integrates high-fidelity orbital dynamics (including atmospheric drag, solar radiation pressure, and third-body effects), a PettingZoo multi-agent interface, and real-time 3D visualization. Validation against real Starlink ephemerides yields a mean MAPE of only 0.16%.
OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata: OrthoLoC establishes the first large-scale UAV 6-DoF localization benchmark dataset based on orthographic geodata (DOP+DSM), comprising 16,425 real UAV images across 47 regions in Germany and the United States. It further introduces AdHoP (Adaptive Homography Preprocessing), a matching enhancement technique that improves matching performance by 95% and reduces translation error by 63% without modifying the underlying feature matcher.
Out-of-distribution Generalisation is Hard: Evidence from ARC-like Tasks: By constructing ARC-like tasks with well-defined OOD metrics, this paper demonstrates that standard neural networks (MLP/CNN/Transformer) fail to achieve compositional OOD generalisation. Moreover, even architectures designed with correct inductive biases that attain near-perfect OOD performance may still learn incorrect compositional features.
Overfitting in Adaptive Robust Optimization: This paper establishes an analogy between policy fragility in Adaptive Robust Optimization (ARO) and overfitting in machine learning: adaptive policies perform well within the uncertainty set but may fail outside it. The paper proposes constraint-specific uncertainty set sizing as a "regularization" mechanism to balance robustness and adaptability.
Plasticity as the Mirror of Empowerment: This paper proposes Generalized Directed Information (GDI) as an information-theoretic tool for measuring agent plasticity, revealing that plasticity is the "mirror" of empowerment — both use the same measure but in opposite directions — and proves a strict tension bound between the two.
Value-Guided Search - Efficient Chain-of-Thought Reasoning: This paper proposes Value-Guided Search (VGS), which employs a token-level value model to guide block-level beam search without requiring predefined "steps." VGS achieves a +14.5% relative accuracy improvement over majority voting on competition mathematics while reducing inference computation by 30%, outperforming existing PRM-based approaches.
Position: There Is No Free Bayesian Uncertainty Quantification: This paper challenges the validity of Bayesian uncertainty quantification (UQ) from a frequentist perspective, reinterprets Bayesian updating as an optimization problem over model ensembles, and proposes a PAC-framework-based calibration algorithm for constructing prediction intervals with frequentist guarantees.
Prediction-Powered Semi-Supervised Learning with Online Power Tuning: This paper extends the Prediction-Powered Inference (PPI) framework to the training phase of semi-supervised learning. It proposes an unbiased gradient estimator and designs an online AdaGrad algorithm to dynamically tune the interpolation parameter \(\lambda\) between pseudo-labels and true labels, achieving convergence rates matching the optimal fixed \(\lambda\) while maintaining unbiasedness.
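The interpolation between pseudo-labels and true labels can be illustrated on a one-parameter least-squares model. This is a sketch under assumptions, not the paper's estimator: it uses the usual prediction-powered correction (labeled gradient, plus `lam` times the pseudo-label gradient on unlabeled data, minus `lam` times the pseudo-label gradient on labeled data), keeps `lam` fixed rather than tuning it online, and uses hypothetical names throughout.

```python
def grad_sq_loss(w, xs, ys):
    """Gradient of the mean of 0.5 * (w * x - y)^2 with respect to w."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def ppi_gradient(w, x_lab, y_lab, x_unlab, predict, lam):
    """Prediction-powered gradient with interpolation parameter lam.

    The two pseudo-label terms share the same expectation when labeled and
    unlabeled inputs come from the same distribution, so they cancel on
    average and the estimate stays unbiased for any lam.
    """
    pseudo_lab = [predict(x) for x in x_lab]
    pseudo_unlab = [predict(x) for x in x_unlab]
    g_true = grad_sq_loss(w, x_lab, y_lab)
    g_pseudo_unlab = grad_sq_loss(w, x_unlab, pseudo_unlab)
    g_pseudo_lab = grad_sq_loss(w, x_lab, pseudo_lab)
    return g_true + lam * (g_pseudo_unlab - g_pseudo_lab)

def predict(x):
    # Hypothetical, imperfect pseudo-labeler (the true slope below is 2.0).
    return 1.8 * x

x_lab = [0.0, 1.0, 2.0]
y_lab = [0.1, 2.0, 3.9]
x_unlab = [0.5, 1.5, 2.5, 3.0]

g_labeled_only = ppi_gradient(0.0, x_lab, y_lab, x_unlab, predict, lam=0.0)
g_mixed = ppi_gradient(0.0, x_lab, y_lab, x_unlab, predict, lam=0.5)
```

With `lam = 0` the estimator reduces to ordinary supervised learning; larger values lean more on the pseudo-labels, which lowers variance only when the pseudo-labeler is accurate.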
Private Evolution Converges: This paper provides the first convergence guarantee for the Private Evolution (PE) synthetic data generation algorithm that does not rely on unrealistic assumptions, proving that under appropriate hyperparameter settings, the \((\varepsilon,\delta)\)-DP synthetic dataset output by PE achieves a 1-Wasserstein distance of \(\tilde{O}(d(n\varepsilon)^{-1/d})\).
Product Distribution Learning with Imperfect Advice: This paper studies the problem of learning product distributions over the Boolean hypercube given an imperfect advice distribution, and proposes an efficient algorithm that achieves sub-linear dependence on dimension \(d\) in sample complexity when the advice is of sufficient quality.
Radar: Benchmarking Language Models on Imperfect Tabular Data: This paper introduces the Radar benchmark, which systematically evaluates language models' data-aware reasoning on imperfect tabular data by injecting five categories of data artifacts (missing values, bad values, outliers, formatting inconsistencies, and logical inconsistencies) into real-world tables. The benchmark reveals that even frontier models suffer substantial performance degradation upon the introduction of data artifacts.
Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians: This paper adopts a dynamical systems perspective grounded in Jacobian analysis to move beyond the symmetry constraints imposed by traditional energy-function frameworks. It reveals the critical role of normalization layers in suppressing the spectral norm and oscillatory components of self-attention, identifies that high-performing recurrent self-attention models exhibit Lyapunov exponents approaching zero (a criticality regime), and proposes a spectral regularization method that substantially improves inference performance.
Redundancy-Aware Test-Time Graph Out-of-Distribution Detection: This paper proposes RedOUT, a framework that constructs coding trees via structural entropy minimization to eliminate redundant information in graph structures. Combined with the Redundancy-aware Graph Information Bottleneck (ReGIB) principle, RedOUT effectively distinguishes in-distribution (ID) from out-of-distribution (OOD) graph samples at test time without modifying pretrained model parameters, achieving an average AUC of 87.46% across 10 dataset pairs.
Regression Trees Know Calculus: This paper shows that piecewise-constant regression trees carry latent gradient information. By treating the difference in child-node means as a finite-difference analogue, it efficiently extracts gradient estimates, which brings differential tools such as Active Subspaces (AS) and Integrated Gradients (IG) into tree models and improves both their interpretability and their predictive performance.
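The finite-difference idea can be illustrated with a single split on one feature. This is a minimal sketch, not the paper's estimator: dividing the difference in child means by the distance between the children's mean feature values is an assumption made here for illustration, and `fit_stump` and `stump_slope` are hypothetical names.

```python
def mean(values):
    return sum(values) / len(values)

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one feature by minimizing squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        left, right = pairs[:i], pairs[i:]
        mean_left = mean([y for _, y in left])
        mean_right = mean([y for _, y in right])
        sse = sum((y - mean_left) ** 2 for _, y in left)
        sse += sum((y - mean_right) ** 2 for _, y in right)
        if best is None or sse < best[0]:
            best = (sse, left, right)
    return best[1], best[2]

def stump_slope(xs, ys):
    """Finite-difference slope estimate read off the two child nodes."""
    left, right = fit_stump(xs, ys)
    dy = mean([y for _, y in right]) - mean([y for _, y in left])
    dx = mean([x for x, _ in right]) - mean([x for x, _ in left])
    return dy / dx

xs = [i / 10 for i in range(11)]
ys = [3.0 * x + 1.0 for x in xs]   # linear target with slope 3
slope = stump_slope(xs, ys)
```

For a linear target the child means differ by exactly the slope times the distance between the child centers, so the ratio recovers the slope regardless of where the split lands.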
Reliable Active Learning from Unreliable Labels via Neural Collapse Geometry: This paper proposes NCAL-R, which leverages the Neural Collapse (NC) geometry emerging in the terminal training phase of deep networks. Two scoring metrics—Class Mean Alignment Perturbation (CMAP) and Feature Fluctuation (FF)—are designed for sample selection, making active learning more reliable under label noise and distribution shift. The method consistently outperforms conventional AL baselines on ImageNet-100 and CIFAR-100.
ReSearch: Learning to Reason with Search: ReSearch embeds search operations as first-class primitives within reasoning chains and leverages GRPO reinforcement learning to automatically learn when and how to search, without any supervision on intermediate reasoning steps, achieving an average relative improvement of 15.81% over baselines on multi-hop QA benchmarks.
ResNets Are Deeper Than You Think: This paper proves that residual networks and feedforward networks occupy distinct function spaces (i.e., ResNets are not a simple reparameterization of feedforward networks), and demonstrates through post-training partial linearization experiments that variable-depth architectures (ResNet-like) consistently outperform fixed-depth architectures even after controlling for trainability differences, suggesting that residual connections provide inductive biases beyond optimization.
Rethinking PCA Through Duality: This paper revisits PCA through the Difference-of-Convex (DC) framework, establishing kernelization and out-of-sample extension capabilities, revealing that simultaneous iteration is a special case of DCA, and proposing a kernelizable dual formulation for robust \(\ell_1\)-PCA.
Revisiting Agnostic Boosting: This paper proposes a new agnostic boosting algorithm that substantially improves the sample complexity of prior work under very general assumptions, and establishes nearly matching lower bounds, thereby resolving the sample complexity of agnostic boosting up to logarithmic factors.
RNNs Perform Task Computations by Dynamically Warping Neural Representations: This paper proposes a Riemannian geometric framework that pulls back the metric from the RNN state space onto the input manifold, demonstrating that RNNs perform computation by dynamically warping their representations of task variables—compressing task-irrelevant inputs and stretching space near decision boundaries. Crucially, this warping is not a byproduct of computation but constitutes computation itself.
Robust Sampling for Active Statistical Inference: This paper proposes a robust sampling strategy based on budget-preserving paths that optimally interpolates between uniform sampling and active sampling, ensuring the resulting estimator's variance is never worse than either baseline. This addresses the performance degradation caused by inaccurate uncertainty estimation in active statistical inference.
SAD Neural Networks: Divergent Gradient Flows and Asymptotic Optimality via o-minimal Structures: Using mathematical tools from o-minimal structures, this paper establishes a dichotomy for gradient flows in fully connected networks with common smooth activation functions (sigmoid, tanh, softplus, GELU, etc.): the flow either converges to a critical point or diverges to infinity with the loss converging to an asymptotic critical value. In particular, for polynomial target functions, the paper proves that the loss cannot be exactly zero but can be made arbitrarily close to zero, which necessarily causes parameter divergence.
Sample-Adaptivity Tradeoff in On-Demand Sampling: This paper systematically studies the tradeoff between sample complexity and adaptive rounds in on-demand sampling. In the realizable setting, it proves that the optimal sample complexity of \(r\)-round algorithms is \(dk^{\Theta(1/r)}/\varepsilon\). In the agnostic setting, it proposes the LazyHedge algorithm that achieves near-optimal sample complexity in only \(\widetilde{O}(\sqrt{k})\) rounds, and introduces the OODS abstract framework to establish nearly tight round complexity lower bounds.
Scalable GPU-Accelerated Euler Characteristic Curves: Optimization and Differentiable Learning for PyTorch: This paper proposes an ECC CUDA kernel optimized for modern Ampere GPUs, achieving 16–2000× speedup over prior GPU implementations, and introduces a differentiable PyTorch layer supporting end-to-end topological feature learning on dense grid images via DECT-style sigmoid relaxation.
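As a reference for what an Euler characteristic curve measures on a grid image, here is a plain Python sketch. It is not the paper's CUDA kernel or its differentiable layer, and it assumes a sublevel-set filtration in which each pixel is a closed square, so pixels touching at a corner count as connected. The function names are hypothetical.

```python
def euler_characteristic(image, threshold):
    """V - E + F of the union of closed pixels whose value is <= threshold."""
    vertices, edges, faces = set(), set(), 0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value > threshold:
                continue
            faces += 1
            vertices.update([(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)])
            # Edges are stored as (start, end) with a fixed orientation so that
            # an edge shared by two pixels is counted once.
            edges.update([
                ((r, c), (r, c + 1)),          # top
                ((r + 1, c), (r + 1, c + 1)),  # bottom
                ((r, c), (r + 1, c)),          # left
                ((r, c + 1), (r + 1, c + 1)),  # right
            ])
    return len(vertices) - len(edges) + faces

def euler_characteristic_curve(image, thresholds):
    """Euler characteristic of each sublevel set, one value per threshold."""
    return [euler_characteristic(image, t) for t in thresholds]

# A ring of low values around one high center pixel.
ring = [
    [1, 1, 1],
    [1, 5, 1],
    [1, 1, 1],
]
curve = euler_characteristic_curve(ring, [0, 1, 5])  # [0, 0, 1]
```

At threshold 1 the sublevel set is a ring (one component, one hole, characteristic 0); at threshold 5 the hole is filled and the characteristic becomes 1.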
Scalable Inference of Functional Neural Connectivity at Submillisecond Timescales: This paper generalizes the conventional discrete-time Poisson GLM to a continuous-time Poisson point process framework. Two approaches—Monte Carlo sampling and second-order polynomial approximation—are proposed to bypass the intractable integral in the likelihood. Combined with orthogonal generalized Laguerre basis functions, the method achieves minute-scale training on recordings spanning hundreds of neurons and thousands of seconds, enabling synaptic connectivity inference at submillisecond resolution.
Semi-infinite Nonconvex Constrained Min-Max Optimization: For nonconvex min-max optimization problems with infinitely many nonconvex constraints, this paper proposes the iDB-PD (Inexact Dynamic Barrier Primal-Dual) algorithm. Under the Łojasiewicz regularity condition, it establishes the first global non-asymptotic convergence guarantees: stationarity \(\mathcal{O}(\epsilon^{-3})\), feasibility \(\mathcal{O}(\epsilon^{-6\theta})\), and complementary slackness \(\mathcal{O}(\epsilon^{-3\theta/(1-\theta)})\).
Semi-supervised Graph Anomaly Detection via Robust Homophily Learning: This paper proposes RHO (Robust Homophily Learning), which addresses the homophily diversity of normal nodes in semi-supervised graph anomaly detection via an adaptive frequency response filter (AdaFreq) and a Graph Normality Alignment (GNA) module, outperforming existing methods on 8 real-world datasets.
Sharpness-Aware Minimization with Z-Score Gradient Filtering: This paper proposes Z-Score Filtered SAM (ZSAM), which applies per-layer Z-Score statistical filtering to gradient vectors, retaining only the most statistically significant gradient components for the perturbation ascent step. This guides the optimizer toward flat minima more effectively, achieving consistent improvements in test accuracy across multiple datasets and architectures.
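The per-layer filtering step can be sketched as follows. This is an illustration under assumptions rather than the ZSAM implementation: the entry does not say how the cutoff is chosen, so the sketch uses a fixed `z_threshold`, the population standard deviation, and zeroing of the filtered components; all names are hypothetical. The perturbation itself follows the standard SAM form `rho * g / ||g||`.

```python
import math
import statistics

def zscore_filter(layer_grad, z_threshold):
    """Keep gradient components whose |z-score| within the layer is large enough."""
    mu = statistics.fmean(layer_grad)
    sigma = statistics.pstdev(layer_grad)
    if sigma == 0.0:
        return list(layer_grad)  # all components identical: nothing to filter
    return [g if abs((g - mu) / sigma) >= z_threshold else 0.0 for g in layer_grad]

def sam_perturbation(grads_by_layer, rho, z_threshold):
    """SAM ascent direction built from the filtered gradients only."""
    filtered = {
        name: zscore_filter(grad, z_threshold) for name, grad in grads_by_layer.items()
    }
    norm = math.sqrt(sum(v * v for grad in filtered.values() for v in grad))
    if norm == 0.0:
        return {name: [0.0] * len(grad) for name, grad in filtered.items()}
    return {name: [rho * v / norm for v in grad] for name, grad in filtered.items()}

grads = {"fc1": [0.1, -0.1, 0.1, -0.1, 5.0]}
filtered_fc1 = zscore_filter(grads["fc1"], z_threshold=1.0)
eps = sam_perturbation(grads, rho=0.05, z_threshold=1.0)
```

In this toy layer only the outlying component survives the filter, so the whole perturbation budget `rho` is spent along that coordinate.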
Sheaf Cohomology of Linear Predictive Coding Networks: This paper formalizes linear predictive coding (PC) networks as cellular sheaves, proves that PC inference is equivalent to diffusion under the sheaf Laplacian, and employs the Hodge decomposition to factorize supervisory signals into eliminable errors (removed via inference) and irreducible errors (characterized by the cohomology of cyclic topology). This framework precisely explains why certain cyclic weight initializations lead to learning stagnation.
Sign-In to the Lottery: Reparameterized Sparse Training from Scratch: This paper identifies the root cause of poor performance in pruning-at-initialization (PaI) sparse training as its inability to learn correct parameter signs, something dense-to-sparse methods manage to do. To address this, the authors propose Sign-In reparameterization (\(\theta = m \odot w\)), which introduces an internal degree of freedom to facilitate sign flipping. The approach is theoretically shown to resolve a class of sign-flipping scenarios complementary to those addressed by overparameterization, and empirically yields substantial improvements in sparse-from-scratch training.
SPACE: SPike-Aware Consistency Enhancement for Test-Time Adaptation in Spiking Neural Networks: This paper proposes SPACE, the first source-free single-sample test-time adaptation (TTA) method specifically designed for spiking neural networks (SNNs). By maximizing the consistency of spike-based feature maps across augmented views of a test sample, SPACE achieves robust adaptation across multiple datasets and architectures.
Stable Matching with Ties: Approximation Ratios and Learning: This paper studies two-sided matching markets with tied preferences, introduces the Optimal Stable Share (OSS) ratio to measure fairness, proves that the OSS-ratio under distributions over stable matchings is \(\Omega(N)\) while under general matching distributions it is \(O(\log N)\) (asymptotically tight), and extends the offline approximation results to a bandit learning setting.
Statistical Inference for Gradient Boosting Regression: This paper proposes a unified statistical inference framework for gradient boosting regression. By integrating dropout and parallel training into the Boulevard regularization scheme, the authors establish corresponding central limit theorems, enabling built-in confidence intervals, prediction intervals, and hypothesis tests for variable importance. A key finding is that increasing the dropout rate and the number of parallel trees substantially improves signal recovery—by up to \(2\times\) and \(4\times\), respectively.
Statistical Inference Under Performativity: This paper establishes the first complete end-to-end statistical inference framework for performative prediction, deriving a central limit theorem and data-driven covariance estimation for repeated risk minimization (RRM) algorithms, and extending prediction-powered inference (PPI) to the dynamic performative setting to obtain tighter confidence intervals.
Structure-Aware Spectral Sparsification via Uniform Edge Sampling: This paper proves that on graphs with sufficiently strong clustering structure (structure ratio \(\Upsilon(k)\) large enough), uniform edge sampling suffices to preserve the spectral subspace structure required for spectral clustering, without expensive effective resistance precomputation — providing the first provable guarantee that uniform sampling preserves such structure.
The Computational Complexity of Counting Linear Regions in ReLU Neural Networks: This paper systematically identifies six mutually non-equivalent definitions of "linear regions" in ReLU networks, proves that counting linear regions is #P-hard under all six definitions (even for single-hidden-layer networks), and establishes strong inapproximability results together with polynomial-space upper bounds for deeper networks.
The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets: This paper studies the parameter complexity of robust memorization in ReLU networks — i.e., the number of parameters required to interpolate an arbitrary dataset while maintaining consistent predictions within a \(\mu\)-neighborhood of each training sample — and establishes tighter upper and lower bounds across the full range \((0,1)\) of the robustness ratio \(\rho = \mu/\epsilon\).
The Parameterized Complexity of Computing the VC-Dimension: This paper systematically investigates the parameterized complexity of computing the VC dimension, establishing that the naive exhaustive algorithm is asymptotically optimal under ETH, presenting an FPT 1-additive approximation algorithm parameterized by maximum degree, an exact \(2^{O(\text{tw} \cdot \log \text{tw})} \cdot |V|\) algorithm parameterized by treewidth, and a complete characterization of the tractability landscape across all standard structural parameters.
The Persistence of Neural Collapse Despite Low-Rank Bias: This paper theoretically demonstrates that Deep Neural Collapse (DNC) is globally suboptimal in deep unconstrained feature models due to the low-rank bias induced by L2 regularization, while providing the first theoretical explanation for the persistent empirical occurrence of DNC — its solution-space dimensionality grows faster with network width than that of low-rank solutions.
The Structural Complexity of Matrix-Vector Multiplication: This paper proves that for Boolean matrices \(\mathbf{M} \in \{0,1\}^{m \times n}\) with corrupted VC-dimension \(d\), matrix-vector multiplication can be performed in \(\widetilde{O}(nm^{1-1/d}+m)\) time. This is the first truly sub-quadratic upper bound for structured matrices, refuting the applicability of the OMv conjecture on structured inputs, and yields the first high-accuracy sub-quadratic algorithms for dynamic Laplacian solving, effective resistance, triangle detection, and related problems.
Tight Bounds On the Distortion of Randomized and Deterministic Distributed Voting: This paper studies metric distortion in the distributed voting model. For four cost objectives (\(\text{avg-avg}\), \(\text{avg-max}\), \(\text{max-avg}\), \(\text{max-max}\)), it establishes improved tight or near-tight bounds under both deterministic and randomized mechanisms, providing an almost complete characterization of distortion in this model.
Training the Untrainable: Introducing Inductive Bias via Representational Alignment: This paper proposes Guidance, a method that transfers the architectural inductive bias of one network (the guide) to another otherwise "untrainable" network (the target) via layer-wise representational alignment (CKA), enabling FCNs to perform image classification and RNNs to approach Transformer-level language modeling performance.
Transfer Learning for Benign Overfitting in High-Dimensional Linear Regression: This paper proposes a two-step Transfer MNI (TM) method that enhances generalization of benign overfitting in overparameterized high-dimensional linear regression via a "preserve target signal + transfer source knowledge in the null space" mechanism. Non-asymptotic excess risk bounds are derived under both model shift and covariate shift, and a "free lunch" covariate shift regime is identified.
Ultrametric Cluster Hierarchies: I Want 'em All!: This paper proves that for any reasonable cluster hierarchy tree, one can efficiently find the optimal solution to any center-based clustering objective (e.g., k-means), and that these solutions are themselves hierarchical — thereby unlocking a large family of equally meaningful hierarchical structures from a single tree.
Uncertainty Estimation by Flexible Evidential Deep Learning: This paper proposes \(\mathcal{F}\)-EDL, which generalizes the Dirichlet distribution in EDL to a Flexible Dirichlet (FD) distribution for modeling class probability distributions. This approach significantly enhances the generalization of uncertainty estimation under complex scenarios such as noise, long-tail distributions, and distribution shift, while preserving the efficiency of a single forward pass.
Uncertainty Quantification for Reduced-Order Surrogate Models Applied to Cloud Microphysics: This paper proposes the first post-hoc, model-agnostic uncertainty quantification framework for latent-space reduced-order models. By applying conformal prediction to the reconstruction, latent dynamics, and end-to-end prediction components independently, it constructs distribution-free prediction intervals and reveals component-level uncertainty propagation in cloud microphysics ROMs — showing that structural errors in the autoencoder, rather than dynamics errors, dominate end-to-end prediction uncertainty.
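For readers unfamiliar with conformal prediction, the following is a minimal sketch of the split-conformal recipe for a single scalar output. It assumes absolute residuals on a held-out calibration set as the nonconformity score. The function names are hypothetical, and the paper's per-component scores may differ.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample quantile of calibration scores for coverage 1 - alpha."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the score to use
    if k > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(scores)[k - 1]


def prediction_interval(point_prediction, q):
    """Symmetric interval around a point prediction with half-width q."""
    return point_prediction - q, point_prediction + q


# Calibration residuals |y - y_hat| from a held-out set (made-up numbers).
residuals = [3.0, 1.0, 2.0]
q = conformal_quantile(residuals, alpha=0.5)
print(prediction_interval(10.0, q))
```

The same recipe can in principle be applied separately to reconstruction error, latent-dynamics error, and end-to-end error, which is how component-level intervals like those described above could be compared.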
UniFormer: Unified and Efficient Transformer for Reasoning Across General and Custom Computing: This paper proposes UniFormer, a unified and efficient Transformer architecture for cross-platform deployment on both GPUs and FPGAs. Through a dual-branch attention mechanism consisting of global linear attention and local block attention, UniFormer achieves high parallelism and compute-memory fusion.
Variational Regularized Unbalanced Optimal Transport: Single Network, Least Action: This paper proposes Var-RUOT, which incorporates the necessary optimality conditions of the Regularized Unbalanced Optimal Transport (RUOT) problem into the parameterization and loss design, enabling the solution of RUOT by learning a single scalar field. The approach yields solutions with lower action and improves training stability. The paper also analyzes the effect of growth penalty functions on biological priors.
WebThinker: Empowering Reasoning Models with Deep Research Capabilities: WebThinker equips large reasoning models (LRMs) with autonomous web search and navigation capabilities. Through a Think-Search-Draft strategy, it seamlessly interleaves reasoning, information gathering, and report generation. After reinforcement learning optimization, it surpasses o1 and Gemini on complex reasoning and scientific report generation tasks.
Weight Weaving: Parameter Pooling for Data-Free Model Merging: This paper proposes Weight Weaving, a plug-and-play data-free model merging enhancement method that eliminates the dependency on evaluation data by pooling model parameters (e.g., via averaging or random selection) over the scaling factor search space. Across three scenarios — multi-task learning, continual learning, and domain generalization — the method achieves an average accuracy improvement of up to 15.9 percentage points.
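The pooling idea can be sketched as follows, assuming a task-arithmetic style merge in which task vectors are added to a base model with a scaling factor. Instead of selecting one scaling factor with evaluation data, the sketch merges once per candidate factor and averages the results. The dictionary-of-lists parameter format and the function names are assumptions for illustration, and averaging is only one of the pooling functions the note mentions.

```python
from statistics import fmean


def merge_with_scale(base, task_vectors, scale):
    """Task-arithmetic style merge: base + scale * (sum of task vectors)."""
    merged = {}
    for name, weights in base.items():
        merged[name] = [
            w + scale * sum(tv[name][i] for tv in task_vectors)
            for i, w in enumerate(weights)
        ]
    return merged


def pool_by_averaging(base, task_vectors, scales):
    """Average the merged parameters over every candidate scaling factor."""
    candidates = [merge_with_scale(base, task_vectors, s) for s in scales]
    return {
        name: [fmean(c[name][i] for c in candidates) for i in range(len(weights))]
        for name, weights in base.items()
    }


base = {"w": [0.0, 0.0]}
task_vectors = [{"w": [1.0, 2.0]}, {"w": [3.0, -2.0]}]
print(pool_by_averaging(base, task_vectors, scales=[0.0, 0.5, 1.0]))
```

With this particular merge rule, averaging is equivalent to using the mean scaling factor; other merge rules or pooling functions such as random selection would not reduce to a single factor in this way.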
Zebra: Towards Zero-Shot Cross-Subject Generalization for Universal Brain Visual Decoding: This paper proposes Zebra, the first zero-shot brain visual decoding framework, which disentangles fMRI representations into subject-invariant and semantic-specific components via adversarial training and residual decomposition, enabling visual reconstruction that generalizes to new subjects without fine-tuning on them.