📂 Others

🧠 NeurIPS2025 · 118 paper notes

🔥 Top topics: Adversarial Robustness ×5 · Alignment/RLHF ×3 · Diffusion Models ×2 · Reasoning ×2

A Differentiable Model of Supply-Chain Shocks

This paper presents a JAX-based differentiable agent-based model (ABM) of supply chains (~1,000 firms) that combines GPU parallelization and automatic differentiation to perform Bayesian parameter calibration three orders of magnitude faster than conventional Approximate Bayesian Computation (ABC), paving the way for shock-propagation modeling in global supply-chain networks.

A Sustainable AI Economy Needs Data Deals That Work for Generators

This paper introduces the concept of the "Economic Data Processing Inequality" — in the ML value chain, data progresses from raw form to model weights to synthetic outputs, with each step refining technical signals while systematically stripping economic rights from data generators. The authors empirically validate this phenomenon through analysis of 73 publicly available data transactions, diagnose three structural deficiencies (missing provenance, asymmetric bargaining power, non-dynamic pricing), and propose the EDVEX framework as a solution blueprint.

A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation

This paper rigorously proves the mechanism behind grokking from a purely optimization-theoretic perspective. Gradient flow with small weight decay exhibits two-phase dynamics in the \(\lambda\to 0\) limit: rapid convergence to the critical manifold \(\mathcal{M}\) of the training loss, followed by a Riemannian gradient flow along the manifold minimizing the \(\ell_2\) norm at timescale \(t\approx 1/\lambda\), thereby inducing delayed generalization.

A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values

This paper proposes a unified framework that subsumes KernelSHAP, LeverageSHAP, and related Shapley value estimators under a randomized sketching perspective, provides the first non-asymptotic theoretical guarantees for KernelSHAP, and extends these methods to high-dimensional datasets such as CIFAR-10 via algorithmic improvements including Poisson approximation.

Active Measurement: Efficient Estimation at Scale

This paper proposes the Active Measurement framework, which uses AI model predictions as an importance sampling proposal distribution and achieves unbiased estimation of scientific aggregate quantities through iterative human annotation and model updates, complemented by a novel combination weighting scheme and a conditional variance estimator for constructing reliable confidence intervals.
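
The importance-sampling idea behind this summary can be illustrated with a minimal single-round sketch. It is not the paper's full procedure: there are no iterative model updates, no combination weighting, and no variance estimator. The names `estimate_total` and `annotate` and the toy data are hypothetical. The sketch assumes the model's scores are strictly positive wherever the true value is nonzero, which the reweighted average needs in order to be unbiased.

```python
import random


def estimate_total(annotate, predictions, n_samples, rng):
    """Importance-sampling estimate of sum_i annotate(i).

    predictions: strictly positive model scores, used as the proposal.
    annotate:    stand-in for a human label on item i (hypothetical).
    """
    total_pred = sum(predictions)
    proposal = [p / total_pred for p in predictions]
    # Draw items with probability proportional to the model's prediction.
    sampled = rng.choices(range(len(predictions)), weights=proposal, k=n_samples)
    # Reweight each annotated value by 1 / q_i so the average is unbiased.
    return sum(annotate(i) / proposal[i] for i in sampled) / n_samples


# Toy data: the model's scores are exactly proportional to the truth,
# so every sampled term equals the true total and the variance is zero.
truth = [3, 1, 4, 1, 5, 9, 2, 6]
scores = [2.0 * t for t in truth]
estimate = estimate_total(lambda i: truth[i], scores, n_samples=20, rng=random.Random(0))
```

The better the predictions track the true values, the lower the variance of the estimate, which is why a good AI model reduces the number of human annotations needed.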

Addressing Mark Imbalance in Integration-free Neural Marked Temporal Point Processes

This paper is the first to systematically reveal the severe impact of mark distribution imbalance on prediction performance in marked temporal point processes (MTPP). It proposes a mark-first-then-time prediction strategy, designs a thresholding method to calibrate the predicted probabilities of rare marks, and develops the integration-free IFNMTPP model to efficiently support mark probability estimation and time sampling.

Adjoint Schrödinger Bridge Sampler

This paper proposes the Adjoint Schrödinger Bridge Sampler (ASBS), which reinterprets the Schrödinger Bridge problem as a stochastic optimal control (SOC) problem. This eliminates the memoryless condition required by prior diffusion samplers, supports arbitrary source distributions (e.g., Gaussian, harmonic priors), and employs a scalable matching objective without importance weight estimation. ASBS consistently outperforms prior methods on multi-particle energy functions and molecular conformation generation.

Adjusted Count Quantification Learning on Graphs

This paper extends the classical Adjusted Classify & Count (ACC) quantification method to graph-structured data, proposing two techniques — Structural Importance Sampling (SIS) and Neighborhood-aware ACC (N-ACC) — to address structural covariate shift and non-homophilous edges in graph quantification, respectively.
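
For context, the classical binary ACC correction that this work starts from can be sketched as below. This shows only the standard non-graph formula, not the SIS or N-ACC techniques from the paper. The function name and the example numbers are hypothetical.

```python
def adjusted_classify_and_count(predictions, tpr, fpr):
    """Binary ACC: correct the raw predicted-positive rate using the
    classifier's true-positive and false-positive rates."""
    if tpr == fpr:
        raise ValueError("tpr and fpr must differ for the correction to be defined")
    raw_rate = sum(predictions) / len(predictions)  # plain classify & count
    adjusted = (raw_rate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, adjusted))  # clip to a valid prevalence


# Hypothetical example: true prevalence 0.3, classifier TPR 0.8 and FPR 0.1,
# so the expected raw positive rate is 0.3 * 0.8 + 0.7 * 0.1 = 0.31.
preds = [1] * 310 + [0] * 690
estimate = adjusted_classify_and_count(preds, tpr=0.8, fpr=0.1)
```

Here the raw count overestimates the prevalence (0.31), while the adjusted estimate recovers a value of about 0.3. The correction assumes that TPR and FPR measured on labeled data still hold on the test data, which is the assumption that covariate shift can violate.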

Aggregation Hides OOD Generalization Failures from Spurious Correlations

This paper reveals the "aggregation masking" phenomenon in OOD generalization benchmarks: while aggregate evaluation exhibits accuracy-on-the-line (AoTL)—a positive correlation between ID and OOD accuracy—the proposed OODSelect method can identify large, semantically coherent subsets (up to 75%) from the same OOD data on which higher ID accuracy corresponds to lower OOD accuracy (Pearson R as low as −0.92), demonstrating that the harm of spurious correlations is systematically concealed by aggregate evaluation.

Alias-Free ViT: Fractional Shift Invariance via Linear Attention

This paper proposes the Alias-Free Vision Transformer (AFT), which combines anti-aliasing signal processing techniques with shift-equivariant linear cross-covariance attention, achieving near-perfect consistency (~99%) under fractional (sub-pixel) shifts for the first time, with negligible degradation in ImageNet classification accuracy.

Asymmetric Duos: Sidekicks Improve Uncertainty

Asymmetric Duos (AD) pair a large model with a small "sidekick", combining their predictions via temperature-weighted logit averaging, and approach the uncertainty-estimation quality of a 5-member deep ensemble at only 10–20% additional FLOPs. RN50 AD (5% FLOPs overhead) approaches an \(m=5\) deep ensemble (400% FLOPs overhead) on AUROC/AURC/SAC@98.
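
A rough sketch of what combining two models' logits with per-model temperatures might look like is given below. The exact parameterization is an assumption: the sketch sums temperature-scaled logits and applies a softmax, and in practice the temperatures would presumably be fitted on held-out data. The function names and all numbers are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def duo_predict(logits_large, logits_small, temp_large=1.0, temp_small=2.0):
    """Combine two models' logits, each scaled by its own temperature
    (assumed form), then normalize into class probabilities."""
    combined = [
        zl / temp_large + zs / temp_small
        for zl, zs in zip(logits_large, logits_small)
    ]
    return softmax(combined)


# Hypothetical logits for a 3-class problem.
probs = duo_predict([2.0, 1.0, 0.1], [0.5, 1.5, 0.0])
```

A larger temperature on the sidekick reduces its influence, so the large model dominates the prediction while the small model still adjusts the confidence.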

Bispectral OT: Dataset Comparison using Symmetry-Aware Optimal Transport

This paper proposes Bispectral Optimal Transport (BOT), which changes the cost matrix in discrete optimal transport from raw pixel distances to bispectrum (group Fourier invariant) distances, enabling transport plans to eliminate group-action-induced variation (e.g., rotation) while preserving signal structure. On rotation-augmented MNIST and related datasets, class-preservation accuracy improves from 33% to 84%.

Brain-Like Processing Pathways Form in Models With Heterogeneous Experts

Heterogeneous experts in Mixture-of-Experts models do not spontaneously form processing pathways. This paper proposes three brain-inspired inductive biases — routing cost, task-performance scaling, and expert dropout — that enable the model to develop a Mixture-of-Pathways architecture analogous to the brain's dynamic cortical–subcortical pathways.

Contextual Dynamic Pricing with Heterogeneous Buyers

This paper presents the first systematic study of contextual dynamic pricing with heterogeneous buyers of \(K_\star\) unknown types. It proposes an Optimistic Posterior Sampling (OPS)-based algorithm achieving an \(\tilde{O}(K_\star\sqrt{dT})\) regret bound (optimal in \(d\) and \(T\)), and further introduces ZoomV—a variance-aware adaptive discretization algorithm—achieving the optimal \(\tilde{O}(\sqrt{K_\star T})\) regret in the non-contextual setting.

Continuous Thought Machines

This paper proposes the Continuous Thought Machine (CTM), which generates neuron-level temporal dynamics via privately parameterized Neuron-Level Models (NLMs) and employs a neural synchrony matrix as the core latent representation. The model demonstrates complex reasoning, adaptive computation, and interpretable attention behavior on tasks including maze solving, ImageNet classification, and parity checking.

Coreset for Robust Geometric Median: Eliminating Size Dependency on Outliers

This paper is the first to eliminate the dependency of the robust geometric median coreset size on the number of outliers \(m\): under the condition \(n \geq 4m\), it achieves an optimal coreset size of \(\tilde{\Theta}(\varepsilon^{-1/2} + \frac{m}{n}\varepsilon^{-1})\) for \(d=1\), and \(\tilde{O}(\varepsilon^{-2}\min\{\varepsilon^{-2}, d\})\) in high dimensions. The core technical contribution is a novel non-componentwise error analysis.

Coresets for Clustering Under Stochastic Noise

This paper presents the first systematic study of \((k,z)\)-clustering coreset construction under noisy data. It proposes a novel surrogate error metric \(\mathsf{Err}_\alpha\) to replace the traditional \(\mathsf{Err}\), achieving a \(\text{poly}(k)\)-fold reduction in coreset size and a \(\text{poly}(k)\)-fold tightening of quality guarantees under mild data assumptions, along with a noise-aware cluster-wise sampling algorithm.

Deep Continuous-Time State-Space Models for Marked Event Sequences

S2P2 unifies linear Hawkes processes with deep state space models by stacking multiple latent linear Hawkes (LLH) layers with nonlinear activations, yielding a highly expressive continuous-time MTPP model. It leverages parallel scanning to achieve linear complexity and sub-linear runtime, improving predictive likelihood by an average of 33% across 8 real-world datasets.

Deep Learning for Continuous-Time Stochastic Control with Jumps

Two model-based deep learning algorithms (GPI-PINN and GPI-CBU) are proposed to solve finite-horizon continuous-time stochastic control problems with jumps. By iteratively training a policy network and a value network, the approach avoids discretization and simulation of state dynamics, and demonstrates strong performance in high-dimensional settings.

Deep Legendre Transform

DLT exploits the implicit Fenchel representation of convex conjugates, \(f^*(\nabla f(x)) = \langle x, \nabla f(x) \rangle - f(x)\), to reformulate conjugate computation as a standard regression problem, thereby avoiding max/min-max optimization. The method also admits a posteriori error estimation, and when combined with KAN, yields exact closed-form solutions.
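
The identity above turns conjugate computation into supervised regression: sample points \(x\), use \(\nabla f(x)\) as the input and \(\langle x, \nabla f(x) \rangle - f(x)\) as the target. The sketch below builds such pairs for an assumed toy quadratic whose conjugate is known in closed form, and checks that the targets match it. No network is trained, and the function names and the choice of \(f\) are illustrative only.

```python
import random

A = [1.0, 2.0, 4.0]  # positive curvatures of an assumed toy quadratic


def f(x):
    """f(x) = sum_i 0.5 * a_i * x_i^2 (convex because every a_i > 0)."""
    return sum(0.5 * a * xi * xi for a, xi in zip(A, x))


def grad_f(x):
    return [a * xi for a, xi in zip(A, x)]


def conjugate_closed_form(y):
    """Known conjugate of the quadratic: f*(y) = sum_i 0.5 * y_i^2 / a_i."""
    return sum(0.5 * yi * yi / a for a, yi in zip(A, y))


def regression_pair(x):
    """Input y = grad f(x); target <x, y> - f(x), which equals f*(y)."""
    y = grad_f(x)
    target = sum(xi * yi for xi, yi in zip(x, y)) - f(x)
    return y, target


rng = random.Random(0)
pairs = [regression_pair([rng.uniform(-2, 2) for _ in A]) for _ in range(5)]
# Largest gap between regression targets and the closed-form conjugate.
max_gap = max(abs(t - conjugate_closed_form(y)) for y, t in pairs)
```

Because the targets are exact values of \(f^*\), a regressor fitted to these pairs approximates the conjugate without solving an inner maximization for each query point.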

Dense Associative Memory with Epanechnikov Energy

This paper proposes a log-sum-ReLU (LSR) energy function based on the Epanechnikov kernel as a replacement for the conventional log-sum-exp (LSE) energy in Dense Associative Memory. For the first time, it achieves the coexistence of exact retrieval of all stored patterns and the emergence of novel creative local minima, while preserving exponential memory capacity.

Depth-Bounds for Neural Networks via the Braid Arrangement

This paper proves that, under \(\mathcal{B}_d^0\)-conforming constraints, exactly representing \(\max\{0, x_1, \ldots, x_d\}\) with a ReLU network requires \(\Omega(\log \log d)\) layers—the first non-constant depth lower bound without weight restrictions. It also shows that rank-(3,2) maxout networks can compute the maximum of 7 values, demonstrating that the standard upper bound is not tight.

Depth-Supervised Fusion Network for Seamless-Free Image Stitching

DSFN proposes a seamless image stitching method with depth consistency constraints: a depth-aware two-stage transformation estimation addresses large-parallax alignment, soft-seam region diffusion enables natural blending, and a re-parameterization strategy improves efficiency. The method comprehensively surpasses the state of the art on the UDIS-D and IVSD datasets.

Directional Non-Commutative Monoidal Structures for Compositional Embeddings in Machine Learning

This paper proposes an algebraic framework based on directional non-commutative monoid operators, providing a unified mathematical foundation for multi-dimensional compositional embeddings and unifying SSM recurrence, Transformer self-attention, and RoPE positional encoding as special cases.

Distributionally Robust Feature Selection

This paper proposes a model-agnostic distributionally robust feature selection method that achieves a continuous relaxation of discrete selection by injecting controlled Gaussian noise into covariates, and optimizes the conditional variance of the Bayes-optimal predictor, so that the selected feature subset enables high-quality downstream models to be trained simultaneously across multiple subpopulations.

DPA: A One-Stop Metric to Measure Bias Amplification in Classification Datasets

This paper proposes Directional Predictability Amplification (DPA), a predictability-based metric for measuring bias amplification. It is the only one-stop metric that simultaneously satisfies directionality, applicability to both balanced and imbalanced datasets, and correct identification of positive and negative bias amplification, by measuring the relative change between model bias and dataset bias.

egoEMOTION: Egocentric Vision and Physiological Signals for Emotion and Personality Recognition in Real-World Tasks

This paper introduces egoEMOTION — the first dataset combining egocentric vision (Meta Project Aria glasses) with physiological signals for emotion and personality recognition. It encompasses 43 participants, 50+ hours of recordings, and 16 tasks, and demonstrates that egocentric vision signals (particularly eye-tracking features) outperform conventional physiological signals for emotion prediction in real-world scenarios.

Emergency Response Measures for Catastrophic AI Risk

This paper systematically analyzes how Frontier Safety Policies (FSPs) can be integrated into the first two stages of China's four-phase emergency response framework (prevention–early warning–response–recovery), employing dangerous capability evaluations, tiered thresholds, and pre-established safety measures to address catastrophic AI risks. The analysis is further contextualized through comparisons with international practices such as the EU AI Act and California SB 53.

Estimation of Stochastic Optimal Transport Maps

This paper introduces a transport error metric \(\mathcal{E}_p\) for stochastic OT maps, decomposed into an optimality gap and a feasibility gap. Under minimal assumptions that require neither the existence nor uniqueness of a Brenier map, a computationally efficient rounding estimator is constructed that achieves a near-optimal convergence rate of \(\tilde{O}(n^{-1/(d+2p)})\). The framework is further extended to Hölder-continuous kernels and adversarially corrupted data, establishing the first general theory for OT map estimation.

Evolutionary Learning in Spatial Agent-Based Models for Physical Climate Risk Assessment

This paper proposes an Agent-Based Model (ABM) that integrates geospatial climate hazard data with evolutionary learning mechanisms. Using a simplified economic network comprising a three-tier commodity–manufacturing–retail supply chain, the model simulates economic responses from 2025 to 2100 under RCP8.5 flood projections. Results demonstrate that evolutionary adaptation enables firms to maintain significantly higher levels of production, capital, liquidity, and employment under climate stress, while revealing supply chain systemic risks that traditional asset-level assessments fail to capture.

Evolutionary Prediction Games

This paper proposes the "Evolutionary Prediction Games" framework, applying evolutionary game theory to analyze feedback loops between prediction algorithms and user populations. It shows that ideal learners lead to competitive exclusion (survival of the fittest), whereas practical learners (with finite data, surrogate losses, or overparameterization) can instead foster stable coexistence and mutualism among groups.

Exact Learning of Arithmetic with Differentiable Agents

This paper proposes the Differentiable Finite-State Transducer (DFST), a Turing-complete and end-to-end differentiable model family that operates on a 2D symbol grid. Trained via Policy-Trajectory Observations (PTOs) derived from expert arithmetic computations, DFST achieves perfect generalization to 3,850-digit binary addition and 2,450-digit decimal addition using only 20 training samples of up to 3-digit addition, with zero observed errors.

Exploiting Task Relationships in Continual Learning via Transferability-Aware Task Embeddings

This paper proposes H-embedding, a transferability-aware task embedding based on H-score, and integrates it into a hypernetwork framework. By explicitly modeling inter-task relationships in the embedding space to guide parameter generation, the method achieves state-of-the-art final accuracy in a rehearsal-free setting.

FACE: Faithful Automatic Concept Extraction

This paper proposes FACE, a framework that incorporates a KL divergence regularization term into non-negative matrix factorization (NMF) to constrain reconstructed activations to remain consistent with the original model's predictions, thereby extracting concept explanations that are truly faithful to the model's decision process. FACE comprehensively outperforms CRAFT and ICE on ImageNet, COCO, and CelebA.

Faithful Group Shapley Value

This paper proposes the Faithful Group Shapley Value (FGSV), the unique group-level data valuation method satisfying five axioms including "faithfulness," which effectively defends against the "shell company attack" (artificially inflating valuation by splitting subgroups), and introduces an efficient approximation algorithm with \(O(n \cdot \text{Poly}(\log n))\) complexity.

Fixed-Point RNNs: Interpolating from Diagonal to Dense

This paper proposes the Fixed-Point RNN framework, which parameterizes dense linear RNNs as fixed points of diagonal linear RNNs. By varying the number of iterations, the model dynamically interpolates between diagonal (efficient) and dense (expressive) regimes, achieving state-of-the-art results simultaneously on state-tracking (\(A_5\)/\(S_5\)) and copying tasks for the first time.

Fostering the Ecosystem of AI for Social Impact Requires Expanding and Strengthening Evaluation Standards

This paper argues that the academic ecosystem of AI for Social Impact (AISI) requires a dual-track reform: broadening the definition of "impact" to recognize contributions beyond deployment or methodological novelty, while simultaneously demanding causal-inference-level rigor in evaluating deployed systems.

Frequency-Aware Token Reduction for Efficient Vision Transformer

This paper proposes frequency-aware token reduction from a frequency-domain perspective, partitioning tokens into high-frequency (HF) and low-frequency (LF) groups. HF tokens are selectively retained while LF tokens are aggregated into DC tokens, simultaneously alleviating rank collapse and reducing computational cost in ViTs. The method outperforms existing SOTA across multiple models at a 30% token reduction ratio.

FSNet: Feasibility-Seeking Neural Network for Constrained Optimization with Guarantees

This paper proposes FSNet, a framework that integrates differentiable feasibility-seeking steps into neural networks. By minimizing constraint violations via unconstrained optimization, FSNet guarantees constraint satisfaction while supporting end-to-end training. It significantly outperforms traditional solvers in speed across convex/non-convex and smooth/non-smooth problems while maintaining feasibility.

Gaussian Process Upper Confidence Bound Achieves Nearly-Optimal Regret in Noise-Free Gaussian Process Bandits

This paper proves that the GP-UCB algorithm achieves nearly-optimal regret bounds in noise-free GP bandit problems, including the first \(O(1)\) constant cumulative regret under the SE kernel and the Matérn kernel (with \(d > \nu\)), thereby closing the long-standing gap between the theory of GP-UCB and its empirical performance.

Generalized Linear Mode Connectivity for Transformers

This paper proposes a unified symmetry framework (a four-level hierarchy of permutation, semi-permutation, orthogonal, and invertible transformations) to achieve zero or near-zero barrier linear mode connectivity (LMC) on Vision Transformers and GPT-2 for the first time, and further extends the framework to multi-model merging and heterogeneous-width alignment.

Graph Alignment via Birkhoff Relaxation

This paper provides the first theoretical guarantees for the Birkhoff relaxation (the tightest convex relaxation of QAP) under the Gaussian Wigner model: when \(\sigma = o(n^{-1})\), the relaxed solution approximates the true permutation; when \(\sigma = \Omega(n^{-0.5})\), the relaxed solution is far from the true permutation, revealing a phase transition phenomenon.

Hessian-guided Perturbed Wasserstein Gradient Flows for Escaping Saddle Points

This paper proposes the Perturbed Wasserstein Gradient Flow (PWGF) algorithm, which injects noise perturbations via Hessian-guided Gaussian processes to enable efficient saddle point escape and second-order optimality in probability measure optimization.

Hybrid-Balance GFlowNet for Solving Vehicle Routing Problems

This paper proposes the Hybrid-Balance GFlowNet (HBG) framework, which for the first time introduces Detailed Balance (DB) in the VRP setting and unifies it with Trajectory Balance (TB), along with a depot-guided inference strategy. HBG consistently and significantly improves two existing GFlowNet-based solvers (AGFN and GFACS) on CVRP and TSP benchmarks.

Impact of Layer Norm on Memorization and Generalization in Transformers

This work systematically reveals the fundamentally distinct roles of LayerNorm in Pre-LN and Post-LN Transformers: in Pre-LN, LN is essential for learning and its removal disrupts generalization; in Post-LN, LN drives memorization and its removal suppresses memorization while recovering true labels.

Improving Decision Trees through the Lens of Parameterized Local Search

This paper analyzes local search operations for decision tree optimization through the lens of parameterized complexity, identifies the sources of computational hardness, and proves that the combination of the number of features and domain size yields fixed-parameter tractability (FPT), accompanied by a proof-of-concept implementation.

Improving Forecasts of Suicide Attempts for Patients with Little Data

This paper proposes the Latent Similarity Gaussian Process (LSGP), which embeds patients into a continuous latent space to capture heterogeneity, enabling data-scarce patients to "borrow" predictive trends from similar patients, thereby improving suicide attempt prediction based on EMA data.

Incomplete Multi-view Clustering via Hierarchical Semantic Alignment and Cooperative Completion

This paper proposes the HSACC framework, which employs a two-level semantic space design (low-level mutual information consistency + high-level adaptive weighted fusion) combined with cooperatively optimized implicit missing-view recovery, achieving significant improvements over existing incomplete multi-view clustering methods on five benchmark datasets.

InFlux: A Benchmark for Self-Calibration of Dynamic Intrinsics of Video Cameras

This paper introduces InFlux, the first real-world video benchmark with per-frame ground-truth dynamic camera intrinsics (386 videos, 143K+ annotated frames). Accurate annotations are achieved via a lookup table (LUT) mapping lens metadata to intrinsic parameters. The benchmark reveals that existing intrinsic prediction methods perform poorly under dynamic intrinsic settings.

Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination

For noiseless linear regression under the oblivious contamination model, this paper formally proves that any efficient Statistical Query algorithm requires VSTAT complexity at least \(\tilde{\Omega}(d^{1/2}/\alpha^2)\), providing evidence that the quadratic dependence on \(1/\alpha\) constitutes an essential computational lower bound for efficient algorithms.

Johnson-Lindenstrauss Lemma Beyond Euclidean Geometry

This paper extends the Johnson-Lindenstrauss (JL) lemma from Euclidean space to general symmetric hollow dissimilarity matrices, proposing two complementary approaches — pseudo-Euclidean JL and generalized power distance JL — where the approximation error scales proportionally with the degree of deviation from Euclidean geometry.

Lagrangian neural ODEs: Measuring the existence of a Lagrangian with Helmholtz metrics

This paper proposes Helmholtz metrics — differentiable metrics derived from the Helmholtz conditions — to quantify how closely a given ODE approximates the Euler-Lagrange equations. These metrics are incorporated as regularization terms into second-order Neural ODE training, forming Lagrangian Neural ODEs that guide the model toward true physical laws with zero additional inference overhead.

Learning (Approximately) Equivariant Networks via Constrained Optimization

This paper proposes ACE (Adaptive Constrained Equivariance), a framework that formulates equivariant neural network training as a constrained optimization problem. Via dual methods, ACE automatically and progressively transitions from a flexible non-equivariant model to an equivariant one, adapting to both fully and partially symmetric data without manual hyperparameter tuning.

Learning Dynamics of RNNs in Closed-Loop Environments

This paper establishes a mathematical theory revealing that RNNs exhibit fundamentally different learning dynamics under closed-loop (agent–environment interaction) versus open-loop (supervised learning) training. Closed-loop learning follows a three-phase process driven by the competition between short-term policy improvement and long-term stability.

Learning non-equilibrium diffusions with Schrödinger bridges: from exactly solvable to simulation-free

This paper generalizes the Schrödinger bridge problem (SBP) from Brownian motion reference processes to multivariate Ornstein-Uhlenbeck (mvOU) reference processes, derives exact solutions for the Gaussian case, and proposes the simulation-free mvOU-OTFM algorithm for general distributions.

Learning to Condition: A Neural Heuristic for Scalable MPE Inference

This paper proposes Learning to Condition (L2C), which trains an attention network to learn dual scores — optimality and simplification — for variable-value pairs from solver search trajectories, guiding conditioning decisions in MPE inference over probabilistic graphical models (PGMs). L2C substantially reduces the search space on high-treewidth models while maintaining or improving solution quality.

Look-Ahead Reasoning on Learning Platforms

This paper formalizes level-\(k\) look-ahead reasoning in user–algorithm interactions on learning platforms. It proves that individually selfish higher-order reasoning only accelerates convergence without altering the equilibrium (i.e., no long-term gain), while the benefit of collective coordination is determined by the alignment between the learner's and users' utility functions. A theoretical framework is provided to characterize upper bounds on coordination gains.

Manipulating Feature Visualizations with Gradient Slingshots

This paper proposes Gradient Slingshots (GS), a method that "carves" a quadratic activation landscape in the out-of-distribution (OOD) input region of a model, directing the gradient-based optimization of Feature Visualization (FV) toward an arbitrary target image. The approach causes FV to converge to a predefined spurious image while leaving the model's architecture, classification accuracy, and internal feature representations largely intact, thereby exposing a serious vulnerability of FV as a model auditing tool.

MaxSup: Overcoming Representation Collapse in Label Smoothing

By decomposing the loss function of Label Smoothing (LS), this paper identifies an "error amplification term" that exacerbates misclassification, leading to intra-class feature collapse. The proposed Max Suppression (MaxSup) method redirects the penalty target from the ground-truth logit to the top-1 logit, eliminating the error amplification effect while preserving beneficial regularization.
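
The difference between the two penalty targets can be sketched as follows. The sketch assumes the smoothing-induced regularizer takes the form \(\alpha\,(z_{\text{target}} - \bar{z})\), where \(\bar{z}\) is the mean logit; the paper's exact decomposition may differ in detail. The function names and example logits are hypothetical.

```python
def ls_penalty(logits, target, alpha=0.1):
    """Assumed label-smoothing regularizer: penalizes the ground-truth logit
    relative to the mean logit, even when the prediction is wrong."""
    mean_logit = sum(logits) / len(logits)
    return alpha * (logits[target] - mean_logit)


def maxsup_penalty(logits, alpha=0.1):
    """Assumed MaxSup regularizer: penalizes the top-1 logit instead."""
    mean_logit = sum(logits) / len(logits)
    return alpha * (max(logits) - mean_logit)


# Misclassified example: the ground truth is class 0, but class 1 dominates.
wrong = [1.0, 3.0, 0.5]
ls_wrong = ls_penalty(wrong, target=0)
ms_wrong = maxsup_penalty(wrong)

# Correctly classified example: both penalties act on the same logit.
right = [3.0, 1.0, 0.5]
ls_right = ls_penalty(right, target=0)
ms_right = maxsup_penalty(right)
```

Under this assumed form, minimizing the label-smoothing term on a misclassified sample keeps pushing the ground-truth logit down, whereas the MaxSup term pushes down the wrongly dominant logit. On correctly classified samples the two coincide.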

Merlin L48 Spectrogram Dataset

This paper introduces the L48 dataset, a fine-grained spectrogram multi-label classification benchmark derived from real-world bird recordings that naturally exhibits the Single Positive Multi-Label (SPML) setting. The dataset exposes critical shortcomings of existing SPML methods under realistic conditions, and the paper proposes an intra-recording consistency regularization scheme to improve performance.

Meta-learning three-factor plasticity rules for structured credit assignment with sparse feedback

This paper proposes a meta-learning framework that automatically discovers local neo-Hebbian synaptic plasticity rules via outer-loop gradient optimization, enabling recurrent neural networks to perform structured credit assignment using only sparse, delayed reward signals, thereby providing new insights into the learning mechanisms of biological neural networks.

MetaFind: Scene-Aware 3D Asset Retrieval for Coherent Metaverse Scene Generation

MetaFind is a scene-aware tri-modal (text + image + point cloud) 3D asset retrieval framework that encodes scene layout information via an SE(3)-equivariant spatial-semantic graph neural network (ESSGNN), enabling iterative asset retrieval with style consistency and spatial coherence for metaverse scene generation.

Military AI Needs Technically-Informed Regulation to Safeguard AI Research and its Applications

This paper proposes a behavior-oriented definition and regulatory framework for AI-powered Lethal Autonomous Weapons Systems (AI-LAWS). It identifies systems requiring enhanced regulation through two technical criteria, puts forward five concrete policy recommendations, and calls on AI researchers to participate actively throughout the full lifecycle of military AI governance.

Model Context Protocol for Vision Systems: Audit, Security, and Protocol Extensions

This paper presents the first protocol-level audit of MCP deployment in vision systems, analyzing 91 public MCP servers and finding that 78% exhibit schema inconsistencies and 89% lack runtime validation. It further proposes protocol extensions including semantic schemas, visual memory, and runtime validators.

Modeling Cell Dynamics and Interactions with Unbalanced Mean Field Schrödinger Bridge

This paper proposes the Unbalanced Mean Field Schrödinger Bridge (UMFSB) framework and the CytoBridge deep learning algorithm, which simultaneously model unbalanced stochastic cell dynamics and cell-cell interactions from sparse temporal snapshot data.

Modeling Neural Activity with Conditionally Linear Dynamical Systems

This paper proposes Conditionally Linear Dynamical Systems (CLDS), where Gaussian process priors allow the parameters of a linear dynamical system to vary nonlinearly as a function of observed experimental covariates, preserving the interpretability and efficient inference of linear models while capturing the nonlinear dynamics prevalent in neural circuits.

MutualVPR: A Mutual Learning Framework for Resolving Supervision Inconsistencies via Adaptive Clustering

This paper proposes MutualVPR, a mutual learning framework that dynamically assigns scene category labels through feature-driven adaptive K-means clustering, addressing the supervision inconsistency problem in classification-based VPR methods caused by viewpoint variation and occlusion.

Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model

This paper extends Neural Collapse (NC) theory to ordinal regression (OR) tasks based on cumulative link models (CLM). Under the unconstrained feature model (UFM) framework, three hallmark properties of Ordinal Neural Collapse (ONC) are formally proven: within-class mean collapse (ONC1), feature collapse onto a one-dimensional subspace (ONC2), and ordered arrangement of latent variables by class (ONC3). In the zero-regularization limit, a concise geometric relationship between latent variables and thresholds is revealed.

Normalization in Attention Dynamics

This paper unifies various normalization schemes (Post-LN, Pre-LN, Mix-LN, Peri-LN, nGPT, sqrt-scaling) under a single framework of velocity modulation in an interacting particle system on the sphere. It theoretically characterizes how each scheme affects token clustering dynamics and representation collapse, identifying Peri-LN as the theoretically optimal choice.

Obliviator Reveals the Cost of Nonlinear Guardedness in Concept Erasure

This paper proposes Obliviator — a post-processing concept erasure method based on HSIC minimization in RKHS — that iteratively deforms the feature space through a two-step optimization procedure. It is the first method to achieve complete guardedness against nonlinear adversaries, while quantifying the utility-erasure trade-off of nonlinear guardedness. Obliviator substantially outperforms existing methods across multiple PLMs and datasets.

On a Geometry of Interbrain Networks

This opinion piece proposes introducing discrete graph curvature (Forman-Ricci and Ollivier-Ricci curvature) into interbrain network analysis within hyperscanning research. It leverages the entropy of curvature distributions to detect network phase transitions and uses curvature values to infer interbrain information routing strategies, moving beyond the descriptive limitations of conventional correlation-based metrics.
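
A minimal sketch of the two ingredients mentioned above, assuming the simplest combinatorial form of Forman-Ricci curvature for unweighted graphs, \(F(u,v) = 4 - \deg(u) - \deg(v)\), which ignores triangles and edge weights. The function names and the toy graphs are hypothetical; an interbrain analysis would use weighted networks and possibly Ollivier-Ricci curvature instead.

```python
import math
from collections import Counter

def forman_curvature(edges):
    """Forman-Ricci curvature of each edge in an unweighted graph.

    Uses the simplest form F(u, v) = 4 - deg(u) - deg(v).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {(u, v): 4 - degree[u] - degree[v] for u, v in edges}

def curvature_entropy(curvatures):
    """Shannon entropy (in nats) of the empirical curvature distribution."""
    counts = Counter(curvatures.values())
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Star graph: hub 0 joined to leaves 1, 2, 3 (every edge looks the same).
star_curv = forman_curvature([(0, 1), (0, 2), (0, 3)])
star_entropy = curvature_entropy(star_curv)

# Path graph: end edges and the middle edge have different curvature.
path_curv = forman_curvature([(0, 1), (1, 2), (2, 3)])
path_entropy = curvature_entropy(path_curv)
```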

On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling

This paper reveals that under standard parameterization (SP), the cross-entropy loss causes the previously monolithic "unstable" regime to split into two distinct sub-regimes: catastrophic instability and controlled divergence. In the controlled divergence regime (\(\eta_n = \Theta(n^{-1/2})\)), logits diverge while gradients and activations remain stable, thereby establishing the first practically useful infinite-width limit for SP that admits feature learning.

On Topological Descriptors for Graph Products

This paper systematically investigates the expressive power of topological descriptors — Euler Characteristic (EC) and Persistent Homology (PH) — computed on (box) products of graphs under various filtrations. It proves that PH descriptors on graph products are strictly more expressive than those computed on individual graphs, whereas EC does not enjoy this property, and proposes an efficient algorithm for computing PH on product graphs.
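
For readers unfamiliar with the box product, the following sketch builds it for two small graphs and computes the Euler characteristic of the result viewed as a 1-dimensional complex (vertices minus edges). It is unfiltered and purely illustrative: the function names are hypothetical, and it implements neither the paper's filtrations nor its PH algorithm.

```python
from itertools import product

def box_product(vertices_g, edges_g, vertices_h, edges_h):
    """Box (Cartesian) product of two simple undirected graphs."""
    vertices = list(product(vertices_g, vertices_h))
    edges = []
    # Move along an edge of G while staying fixed in H.
    for a, b in edges_g:
        for h in vertices_h:
            edges.append(((a, h), (b, h)))
    # Move along an edge of H while staying fixed in G.
    for g in vertices_g:
        for c, d in edges_h:
            edges.append(((g, c), (g, d)))
    return vertices, edges

def euler_characteristic(vertices, edges):
    """Euler characteristic of a graph viewed as a 1-dimensional complex."""
    return len(vertices) - len(edges)

# The product of two single-edge paths is a 4-cycle.
path_v, path_e = [0, 1], [(0, 1)]
prod_v, prod_e = box_product(path_v, path_e, path_v, path_e)
ec = euler_characteristic(prod_v, prod_e)
```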

On Universality Classes of Equivariant Networks

This paper proves that the separation power of equivariant neural networks (i.e., their ability to distinguish symmetry-inequivalent inputs) is insufficient to fully characterize their approximation capacity—models with identical separation power may possess strictly different approximation abilities. The paper provides a complete characterization of universality classes for shallow invariant networks and establishes sufficient conditions for universality failure.

Optimized Learned Count-Min Sketch

This paper proposes OptLCMS, which partitions the score space and analytically solves CMS parameters via KKT conditions while optimizing thresholds through dynamic programming, substantially accelerating the construction process and providing theoretical guarantees on the probability of intolerable error.
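
For context, here is a minimal sketch of the plain Count-Min Sketch that learned variants build on. It is not OptLCMS: there is no score-space partitioning, KKT-based parameter selection, or threshold optimization, and the salted SHA-256 hashing scheme is an assumption chosen only to keep the example self-contained.

```python
import hashlib

class CountMinSketch:
    """Plain Count-Min Sketch with `depth` hash rows of `width` counters."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

sketch = CountMinSketch(width=64, depth=4)
for word in ["a", "b", "a", "c", "a"]:
    sketch.add(word)
```

The estimate never undercounts an item; the `width` and `depth` parameters control how much it can overcount, and these are the kind of parameters a method like the one above tunes.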

Out-of-distribution Generalisation is Hard: Evidence from ARC-like Tasks

By constructing ARC-like tasks with well-defined OOD metrics, this paper demonstrates that standard neural networks (MLP/CNN/Transformer) fail to achieve compositional OOD generalisation. Moreover, even architectures designed with correct inductive biases that attain near-perfect OOD performance may still learn incorrect compositional features.

Overfitting in Adaptive Robust Optimization

This paper establishes an analogy between policy fragility in Adaptive Robust Optimization (ARO) and overfitting in machine learning: adaptive policies perform well within the uncertainty set but may fail outside it. The paper proposes constraint-specific uncertainty set sizing as a "regularization" mechanism to balance robustness and adaptability.

Plasticity as the Mirror of Empowerment

This paper proposes Generalized Directed Information (GDI) as an information-theoretic tool for measuring agent plasticity, revealing that plasticity is the "mirror" of empowerment — both use the same measure but in opposite directions — and proves a strict tension bound between the two.

Position: There Is No Free Bayesian Uncertainty Quantification

This paper challenges the validity of Bayesian uncertainty quantification (UQ) from a frequentist perspective, reinterprets Bayesian updating as an optimization problem over model ensembles, and proposes a PAC-framework-based calibration algorithm for constructing prediction intervals with frequentist guarantees.

Private Evolution Converges

This paper provides the first convergence guarantee for the Private Evolution (PE) synthetic data generation algorithm that does not rely on unrealistic assumptions, proving that under appropriate hyperparameter settings, the \((\varepsilon,\delta)\)-DP synthetic dataset output by PE achieves a 1-Wasserstein distance of \(\tilde{O}(d(n\varepsilon)^{-1/d})\).
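
To make the error metric concrete, the sketch below computes the 1-Wasserstein distance between two equal-size empirical samples in one dimension, where it reduces to the mean absolute difference of sorted values. The one-dimensional, equal-size setting and the function name are simplifying assumptions; the guarantee above concerns \(d\)-dimensional data.

```python
def wasserstein1_1d(sample_a, sample_b):
    """1-Wasserstein distance between two equal-size 1-D empirical samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch assumes equal sample sizes")
    # In 1-D the optimal coupling matches points in sorted order.
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(x - y) for x, y in pairs) / len(sample_a)

# A sample and a copy of it shifted by one unit.
distance = wasserstein1_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0])
```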

Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning

This paper addresses the Combined Algorithm Selection and Hyperparameter Optimization (CASH) problem in AutoML. Through data-driven analysis, it reveals that HPO reward distributions are bounded and left-skewed, and proposes MaxUCB—a bandit algorithm specifically tailored to this distributional property—achieving both theoretical and empirical improvements over existing methods.

Radar: Benchmarking Language Models on Imperfect Tabular Data

This paper introduces the Radar benchmark, which systematically evaluates language models' data-aware reasoning on imperfect tabular data by injecting five categories of data artifacts (missing values, bad values, outliers, formatting inconsistencies, and logical inconsistencies) into real-world tables. The benchmark reveals that even frontier models suffer substantial performance degradation upon the introduction of data artifacts.

RDB2G-Bench: A Comprehensive Benchmark for Automatic Graph Modeling of Relational Databases

This paper proposes RDB2G-Bench — the first benchmark framework for evaluating relational-database-to-graph modeling methods, comprising 5 real-world RDBs, 12 prediction tasks, approximately 50,000 precomputed graph model–performance pairs, and a systematic comparison of 10 automatic graph modeling approaches.

Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians

This paper adopts a dynamical systems perspective grounded in Jacobian analysis to move beyond the symmetry constraints imposed by traditional energy-function frameworks. It reveals the critical role of normalization layers in suppressing the spectral norm and oscillatory components of self-attention, identifies that high-performing recurrent self-attention models exhibit Lyapunov exponents approaching zero (a criticality regime), and proposes a spectral regularization method that substantially improves inference performance.

Regression Trees Know Calculus

This paper reveals gradient information latent in piecewise-constant regression trees: by treating the difference in child-node means as a finite-difference analogue, it efficiently extracts gradient estimates. This imports differential tools such as Active Subspaces (AS) and Integrated Gradients (IG) into tree models, improving both their interpretability and their predictive performance.
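
The finite-difference idea can be sketched on a single split in one dimension: the difference in child-node means of the response, divided by the difference in child-node means of the input, approximates the slope. The function name `stump_gradient` and the choice of normalizing by the child means of `x` are assumptions for illustration; the paper's estimator may be defined differently.

```python
def stump_gradient(xs, ys, split):
    """Slope estimate from one split of a piecewise-constant regression tree.

    The difference in child-node means of y is divided by the difference
    in child-node means of x, a finite-difference analogue of dy/dx.
    """
    left = [(x, y) for x, y in zip(xs, ys) if x <= split]
    right = [(x, y) for x, y in zip(xs, ys) if x > split]
    mean = lambda values: sum(values) / len(values)
    dy = mean([y for _, y in right]) - mean([y for _, y in left])
    dx = mean([x for x, _ in right]) - mean([x for x, _ in left])
    return dy / dx

# Noise-free linear response with true slope 3.
xs = [i / 10 for i in range(10)]
ys = [3.0 * x + 1.0 for x in xs]
slope = stump_gradient(xs, ys, split=0.45)
```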

Reliable Active Learning from Unreliable Labels via Neural Collapse Geometry

This paper proposes NCAL-R, which leverages the Neural Collapse (NC) geometry emerging in the terminal training phase of deep networks. Two scoring metrics—Class Mean Alignment Perturbation (CMAP) and Feature Fluctuation (FF)—are designed for sample selection, making active learning more reliable under label noise and distribution shift. The method consistently outperforms conventional AL baselines on ImageNet-100 and CIFAR-100.

ResNets Are Deeper Than You Think

This paper proves that residual networks and feedforward networks occupy distinct function spaces (i.e., ResNets are not a simple reparameterization of feedforward networks). Post-training partial linearization experiments further show that variable-depth architectures (ResNet-like) consistently outperform fixed-depth architectures even after controlling for trainability differences, suggesting that residual connections provide inductive biases beyond optimization.

Rethinking Losses for Diffusion Bridge Samplers

This paper identifies theoretical flaws in the widely used Log Variance (LV) loss for diffusion bridge samplers—namely, that it violates the data processing inequality and its gradients are not equivalent to those of the reverse KL (rKL)—and proposes computing rKL gradients via the log-derivative trick (rKL-LD). The proposed approach consistently outperforms LV loss across multiple benchmarks while exhibiting more stable training and reduced sensitivity to hyperparameters.

Rethinking PCA Through Duality

This paper revisits PCA through the Difference-of-Convex (DC) framework, establishing kernelization and out-of-sample extension capabilities, revealing that simultaneous iteration is a special case of DCA, and proposing a kernelizable dual formulation for robust \(\ell_1\)-PCA.

Revisiting Bi-Linear State Transitions in Recurrent Neural Networks

This paper systematically revisits bilinear state transitions in RNNs—i.e., multiplicative interactions between the hidden state and the input—and theoretically proves that bilinear RNNs can simulate arbitrary finite-state machines. By removing additive terms, these models form a natural expressivity hierarchy ranging from diagonal to full-rank structures, revealing that popular linear RNNs such as Mamba occupy the lowest tier of this hierarchy.
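
A small sketch of how a purely bilinear update (no additive terms) can simulate a finite-state machine, using the two-state parity automaton as the example. States and inputs are one-hot encoded and each input symbol selects a transition matrix; the tensor layout `W[k][i][j]` and the function names are assumptions for illustration rather than the paper's construction.

```python
def bilinear_step(h, x, W):
    """One purely bilinear update: h_new[i] = sum_{j,k} W[k][i][j] * x[k] * h[j]."""
    n = len(h)
    return [sum(W[k][i][j] * x[k] * h[j]
                for k in range(len(x)) for j in range(n))
            for i in range(n)]

# Parity automaton: state 0 = even number of ones seen, state 1 = odd.
W = [
    [[1, 0], [0, 1]],  # input symbol 0: identity transition
    [[0, 1], [1, 0]],  # input symbol 1: swap the two states
]

def run_parity(bits):
    h = [1, 0]  # one-hot encoding of the start state "even"
    for bit in bits:
        x = [1 - bit, bit]  # one-hot encoding of the input symbol
        h = bilinear_step(h, x, W)
    return h.index(1)

parity = run_parity([1, 0, 1, 1])
```

The swap matrix is not diagonal, which gives some intuition for why diagonal transition structures sit lower in the expressivity hierarchy.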

RNNs Perform Task Computations by Dynamically Warping Neural Representations

This paper proposes a Riemannian geometric framework that pulls back the metric from the RNN state space onto the input manifold, demonstrating that RNNs perform computation by dynamically warping their representations of task variables—compressing task-irrelevant inputs and stretching space near decision boundaries. Crucially, this warping is not a byproduct of computation but constitutes computation itself.

Robust Sampling for Active Statistical Inference

This paper proposes a robust sampling strategy based on budget-preserving paths that optimally interpolates between uniform sampling and active sampling, ensuring the resulting estimator's variance is never worse than either baseline. This addresses the performance degradation caused by inaccurate uncertainty estimation in active statistical inference.

SAD Neural Networks: Divergent Gradient Flows and Asymptotic Optimality via o-minimal Structures

Using mathematical tools from o-minimal structures, this paper establishes a dichotomy for gradient flows in fully connected networks with common smooth activation functions (sigmoid, tanh, softplus, GELU, etc.): the flow either converges to a critical point or diverges to infinity with the loss converging to an asymptotic critical value. In particular, for polynomial target functions, the paper proves that the loss cannot be exactly zero but can be made arbitrarily close to zero, which necessarily causes parameter divergence.

Scalable GPU-Accelerated Euler Characteristic Curves: Optimization and Differentiable Learning for PyTorch

This paper proposes an ECC CUDA kernel optimized for modern Ampere GPUs, achieving 16–2000× speedup over prior GPU implementations, and introduces a differentiable PyTorch layer supporting end-to-end topological feature learning on dense grid images via DECT-style sigmoid relaxation.
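
As a plain CPU reference for what an Euler characteristic curve measures on a grid image, the sketch below treats each pixel as a closed unit square and computes vertices minus edges plus faces for the sublevel set at each threshold. It is a hypothetical pure-Python illustration, not the CUDA kernel or the differentiable sigmoid relaxation described above, and the cubical-complex convention is an assumption.

```python
def euler_characteristic(mask):
    """Euler characteristic of a binary image, each True pixel a closed square."""
    rows, cols = len(mask), len(mask[0])
    on = lambda r, c: 0 <= r < rows and 0 <= c < cols and mask[r][c]
    faces = sum(on(r, c) for r in range(rows) for c in range(cols))
    # A grid vertex (r, c) is present if any of its four surrounding pixels is on.
    vertices = sum(
        on(r - 1, c - 1) or on(r - 1, c) or on(r, c - 1) or on(r, c)
        for r in range(rows + 1) for c in range(cols + 1))
    # Horizontal edge on grid line r: present if the pixel above or below is on.
    h_edges = sum(on(r - 1, c) or on(r, c)
                  for r in range(rows + 1) for c in range(cols))
    # Vertical edge on grid line c: present if the pixel left or right is on.
    v_edges = sum(on(r, c - 1) or on(r, c)
                  for r in range(rows) for c in range(cols + 1))
    return vertices - (h_edges + v_edges) + faces

def euler_characteristic_curve(image, thresholds):
    """EC of the sublevel set {pixel <= t} for each threshold t."""
    return [euler_characteristic([[v <= t for v in row] for row in image])
            for t in thresholds]

# A ring of low values around one bright pixel.
image = [[1, 1, 1],
         [1, 9, 1],
         [1, 1, 1]]
ecc = euler_characteristic_curve(image, thresholds=[0, 1, 9])
```

In this toy image the ring that appears at threshold 1 has one component and one hole, and filling the centre at threshold 9 removes the hole.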

Scalable Inference of Functional Neural Connectivity at Submillisecond Timescales

This paper generalizes the conventional discrete-time Poisson GLM to a continuous-time Poisson point process framework. Two approaches—Monte Carlo sampling and second-order polynomial approximation—are proposed to bypass the intractable integral in the likelihood. Combined with orthogonal generalized Laguerre basis functions, the method achieves minute-scale training on recordings spanning hundreds of neurons and thousands of seconds, enabling synaptic connectivity inference at submillisecond resolution.
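
The Monte Carlo route around the intractable integral can be sketched as follows: the point-process log-likelihood is the sum of log-intensities at the event times minus the integral of the intensity, and the integral is estimated from uniform samples. The function name, the uniform sampling scheme, and the constant-rate example are assumptions for illustration; the paper's intensity model with Laguerre basis functions is not reproduced here.

```python
import math
import random

def log_likelihood_mc(spike_times, intensity, t_end, n_samples=10_000, seed=0):
    """Point-process log-likelihood with a Monte Carlo estimate of the integral.

    log L = sum_i log(intensity(t_i)) - integral_0^T intensity(t) dt
    """
    rng = random.Random(seed)
    log_term = sum(math.log(intensity(t)) for t in spike_times)
    # Uniform samples on [0, T] give an unbiased estimate of the integral.
    samples = [intensity(rng.uniform(0.0, t_end)) for _ in range(n_samples)]
    integral = t_end * sum(samples) / n_samples
    return log_term - integral

# Constant rate of 5 events per second over 2 seconds, 3 observed events.
ll = log_likelihood_mc([0.2, 0.9, 1.5], lambda t: 5.0, t_end=2.0)
```

No time binning appears anywhere in the computation, so event times keep their full resolution.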

Semi-infinite Nonconvex Constrained Min-Max Optimization

For nonconvex min-max optimization problems with infinitely many nonconvex constraints, this paper proposes the iDB-PD (Inexact Dynamic Barrier Primal-Dual) algorithm. Under the Łojasiewicz regularity condition, it establishes the first global non-asymptotic convergence guarantees: stationarity \(\mathcal{O}(\epsilon^{-3})\), feasibility \(\mathcal{O}(\epsilon^{-6\theta})\), and complementary slackness \(\mathcal{O}(\epsilon^{-3\theta/(1-\theta)})\).

Semi-Supervised Regression with Heteroscedastic Pseudo-Labels

This paper proposes an uncertainty-aware pseudo-label framework based on heteroscedastic modeling, which dynamically calibrates per-sample pseudo-label uncertainty via bilevel optimization to mitigate the negative impact of noisy pseudo-labels on regression models, achieving state-of-the-art performance on multiple SSR benchmarks.

Sharpness-Aware Minimization with Z-Score Gradient Filtering

This paper proposes Z-Score Filtered SAM (ZSAM), which applies per-layer Z-Score statistical filtering to gradient vectors, retaining only the most statistically significant gradient components for the perturbation ascent step. This guides the optimizer toward flat minima more effectively, achieving consistent improvements in test accuracy across multiple datasets and architectures.
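
A minimal sketch of per-layer Z-score filtering, assuming a simple rule that keeps components whose absolute Z-score within the layer meets a threshold and zeroes out the rest. The function name, the threshold value, and the use of the population standard deviation are assumptions; ZSAM's exact selection rule may differ. In a SAM-style optimizer the filtered vector would then define the ascent perturbation.

```python
import statistics

def zscore_filter(grad, threshold=1.0):
    """Zero out gradient components whose |z-score| within the layer is small."""
    mean = statistics.fmean(grad)
    std = statistics.pstdev(grad)
    if std == 0.0:
        return list(grad)  # all components identical: nothing to rank
    return [g if abs((g - mean) / std) >= threshold else 0.0 for g in grad]

# One layer's flattened gradient with a single dominant component.
layer_grad = [0.01, -0.02, 0.03, 2.5, -0.01, 0.0]
filtered = zscore_filter(layer_grad, threshold=1.0)
```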

Sheaf Cohomology of Linear Predictive Coding Networks

This paper formalizes linear predictive coding (PC) networks as cellular sheaves, proves that PC inference is equivalent to diffusion under the sheaf Laplacian, and employs the Hodge decomposition to factorize supervisory signals into eliminable errors (removed via inference) and irreducible errors (characterized by the cohomology of cyclic topology). This framework precisely explains why certain cyclic weight initializations lead to learning stagnation.

Sign-In to the Lottery: Reparameterized Sparse Training from Scratch

This paper identifies the root cause of poor performance in pruning-at-initialization (PaI) sparse training as its inability to learn correct parameter signs, something dense-to-sparse methods achieve. To address this, the authors propose Sign-In reparameterization (\(\theta = m \odot w\)), which introduces an internal degree of freedom to facilitate sign flipping. The approach is theoretically shown to resolve a class of sign-flipping scenarios complementary to those addressed by overparameterization, and empirically yields substantial improvements in sparse-from-scratch training.

SPACE: SPike-Aware Consistency Enhancement for Test-Time Adaptation in Spiking Neural Networks

This paper proposes SPACE, the first source-free single-sample test-time adaptation (TTA) method specifically designed for spiking neural networks (SNNs). By maximizing the consistency of spike-based feature maps across augmented views of a test sample, SPACE achieves robust adaptation across multiple datasets and architectures.

Stable Matching with Ties: Approximation Ratios and Learning

This paper studies two-sided matching markets with tied preferences, introduces the Optimal Stable Share (OSS) ratio to measure fairness, proves that the OSS-ratio under distributions over stable matchings is \(\Omega(N)\) while under general matching distributions it is \(O(\log N)\) (asymptotically tight), and extends the offline approximation results to a bandit learning setting.

Statistical Inference for Gradient Boosting Regression

This paper proposes a unified statistical inference framework for gradient boosting regression. By integrating dropout and parallel training into the Boulevard regularization scheme, the authors establish corresponding central limit theorems, enabling built-in confidence intervals, prediction intervals, and hypothesis tests for variable importance. A key finding is that increasing the dropout rate and the number of parallel trees substantially improves signal recovery—by up to \(2\times\) and \(4\times\), respectively.

Statistical Inference Under Performativity

This paper establishes the first complete end-to-end statistical inference framework for performative prediction, deriving a central limit theorem and data-driven covariance estimation for repeated risk minimization (RRM) algorithms, and extending prediction-powered inference (PPI) to the dynamic performative setting to obtain tighter confidence intervals.

Structure-Aware Spectral Sparsification via Uniform Edge Sampling

This paper proves that on graphs with sufficiently strong clustering structure (structure ratio \(\Upsilon(k)\) large enough), uniform edge sampling suffices to preserve the spectral subspace structure required for spectral clustering, without expensive effective resistance precomputation — providing the first provable guarantee that uniform sampling preserves such structure.

Test-Time Adaptation by Causal Trimming

This paper proposes TACT, a method that identifies non-causal directions in the representation space via data augmentation and PCA, then removes the projections of both test representations and class prototypes along these directions at test time. This reduces model reliance on non-causal features and significantly improves prediction performance under distribution shift.

The Computational Complexity of Counting Linear Regions in ReLU Neural Networks

This paper systematically identifies six mutually non-equivalent definitions of "linear regions" in ReLU networks, proves that counting linear regions is #P-hard under all six definitions (even for single-hidden-layer networks), and establishes strong inapproximability results together with polynomial-space upper bounds for deeper networks.

The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets

This paper studies the parameter complexity of robust memorization in ReLU networks — i.e., the number of parameters required to interpolate an arbitrary dataset while maintaining consistent predictions within a \(\mu\)-neighborhood of each training sample — and establishes tighter upper and lower bounds across the full range \((0,1)\) of the robustness ratio \(\rho = \mu/\epsilon\).

The Geometry of Cortical Computation: Manifold Disentanglement and Predictive Dynamics in VCNet

This paper proposes VCNet—a neural network architecture that simulates the macroscopic organization of the primate visual cortex—reinterpreting dual-stream separation (manifold disentanglement) and predictive coding (geodesic refinement) through the language of geometry and dynamical systems. At an extremely compact size of 0.04 MB, VCNet achieves 92.1% accuracy on Spots-10 (10% above a distilled DenseNet), and attains 74.4% on light field classification at 3.52 MB (surpassing MobileNetV2 by 2.3%).

The Persistence of Neural Collapse Despite Low-Rank Bias

This paper theoretically demonstrates that Deep Neural Collapse (DNC) is globally suboptimal in deep unconstrained feature models due to the low-rank bias induced by L2 regularization, while providing the first theoretical explanation for the persistent empirical occurrence of DNC — its solution-space dimensionality grows faster with network width than that of low-rank solutions.

Tight Bounds On the Distortion of Randomized and Deterministic Distributed Voting

This paper studies metric distortion in the distributed voting model. For four cost objectives (\(\text{avg-avg}\), \(\text{avg-max}\), \(\text{max-avg}\), \(\text{max-max}\)), it establishes improved tight or near-tight bounds under both deterministic and randomized mechanisms, providing an almost complete characterization of distortion in this model.

Tight Lower Bounds and Improved Convergence in Performative Prediction

Under the performative prediction framework, this paper provides the first tight convergence rate analysis for Repeated Risk Minimization (RRM) and proposes the Affine Risk Minimizers (ARM) algorithm class, which achieves convergence over a broader problem class by leveraging data from historical training snapshots.
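
To illustrate what RRM iterates look like, here is a toy one-dimensional location-shift problem: deploying a parameter \(\theta\) shifts the data mean to \(\mu + \varepsilon\theta\), and retraining under squared loss sets \(\theta\) to that mean. The setup, the function name `rrm`, and the parameter values are assumptions for illustration; the sketch shows only plain RRM, not the ARM algorithm class.

```python
def rrm(mu, eps, theta0=0.0, steps=50):
    """Repeated risk minimization on a toy location-shift problem.

    Deploying theta shifts the data mean to mu + eps * theta; the squared-loss
    minimizer on that data is the new mean, so each retraining step sets
    theta <- mu + eps * theta.
    """
    theta = theta0
    trajectory = [theta]
    for _ in range(steps):
        theta = mu + eps * theta  # exact risk minimizer under the induced mean
        trajectory.append(theta)
    return trajectory

traj = rrm(mu=1.0, eps=0.5)
stable_point = 1.0 / (1.0 - 0.5)  # fixed point mu / (1 - eps)
```

In this toy problem the distance to the stable point shrinks by a factor of `eps` at every retraining step, so the iteration converges only when the performative effect is weak enough (`eps` below 1).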

TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels

This paper presents TrackingWorld, a pipeline for dense 3D tracking of almost all pixels from monocular video. It lifts sparse 2D trajectories to dense ones via a tracking upsampler, iteratively tracks newly appearing objects across all frames, and employs an optimization-based framework to lift 2D trajectories into world-coordinate 3D space with explicit decoupling of camera motion and object motion.

Training the Untrainable: Introducing Inductive Bias via Representational Alignment

This paper proposes Guidance, a method that transfers the architectural inductive bias of one network (the guide) to another otherwise "untrainable" network (the target) via layer-wise representational alignment (CKA), enabling FCNs to perform image classification and RNNs to approach Transformer-level language modeling performance.

Ultrametric Cluster Hierarchies: I Want 'em All!

This paper proves that for any reasonable cluster hierarchy tree, one can efficiently find the optimal solution to any center-based clustering objective (e.g., k-means), and that these solutions are themselves hierarchical — thereby unlocking a large family of equally meaningful hierarchical structures from a single tree.

Uncertainty Estimation by Flexible Evidential Deep Learning

This paper proposes \(\mathcal{F}\)-EDL, which generalizes the Dirichlet distribution in EDL to a Flexible Dirichlet (FD) distribution for modeling class probability distributions. This approach significantly enhances the generalization of uncertainty estimation under complex scenarios such as noise, long-tail distributions, and distribution shift, while preserving the efficiency of a single forward pass.

UniFormer: Unified and Efficient Transformer for Reasoning Across General and Custom Computing

This paper proposes UniFormer, a unified and efficient Transformer architecture for cross-platform deployment on both GPUs and FPGAs. Through a dual-branch attention mechanism consisting of global linear attention and local block attention, UniFormer achieves high parallelism and compute-memory fusion.

What Does It Take to Build a Performant Selective Classifier?

This paper presents the first finite-sample decomposition of the selective classification gap, attributing it to five sources—Bayes noise, approximation error, ranking error, statistical noise, and implementation bias—and demonstrates that monotone calibration methods have limited effect on closing this gap.