🔬 ICLR2026 · 76 paper notes

A Federated Generalized Expectation-Maximization Algorithm for Mixture Models with an Unknown Number of Components

This paper proposes FedGEM, an algorithm in which clients perform local EM steps and construct uncertainty sets, while the server detects cluster overlap via set intersections and infers the global number of clusters. FedGEM is the first method to achieve federated clustering with an unknown global cluster count \(K\), and comes with probabilistic convergence guarantees.

A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization

This paper establishes a novel representer theorem for estimating triggering kernels of linear multivariate Hawkes processes within the RKHS framework, proving that the optimal estimator admits a finite representation as a linear combination of equivalent kernels evaluated at data points, with all dual coefficients analytically equal to 1. This eliminates the need to solve a dual optimization problem, enabling efficient and scalable nonparametric estimation.

A Scalable Inter-edge Correlation Modeling in CopulaGNN for Link Sign Prediction

This paper extends CopulaGNN from the node level to the edge level for link sign prediction on signed graphs. By constructing the correlation matrix as the Gramian of edge embeddings and reformulating the conditional distribution via the Woodbury identity, the proposed method achieves scalable modeling of inter-edge statistical dependencies.
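
The Woodbury step is easy to see in isolation: with a low-rank correlation matrix \(\Sigma = \sigma^2 I + EE^\top\) built as the Gramian of edge embeddings \(E \in \mathbb{R}^{m \times r}\), inversion drops from \(O(m^3)\) to an \(r \times r\) solve. A minimal NumPy sketch of the identity (the noise term \(\sigma^2\) and all shapes are illustrative assumptions, not the paper's parameterization):

```python
import numpy as np

rng = np.random.default_rng(0)
m, r = 500, 16                        # number of edges, embedding dim (illustrative)
E = rng.normal(size=(m, r))           # edge embeddings; Sigma = sigma2*I + E @ E.T
sigma2 = 0.5                          # assumed nugget/noise variance

# Naive dense inverse: O(m^3)
inv_naive = np.linalg.inv(sigma2 * np.eye(m) + E @ E.T)

# Woodbury: (s*I + E E^T)^{-1} = I/s - E (s*I_r + E^T E)^{-1} E^T / s
small = sigma2 * np.eye(r) + E.T @ E              # only an r x r solve
inv_woodbury = np.eye(m) / sigma2 - E @ np.linalg.solve(small, E.T) / sigma2

print(np.allclose(inv_naive, inv_woodbury))       # True
```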

A Single Architecture for Representing Invariance Under Any Space Group

A single architecture (Crystal Fourier Transformer) is proposed that adapts to the invariance requirements of any space group. By analytically deriving the constraints imposed by group operations on Fourier coefficients, the method constructs symmetry-adapted Fourier bases and achieves parameter sharing and zero-shot generalization across all 230 space groups via a dual graph representation of these constraints.

Active Learning for Decision Trees with Provable Guarantees

This paper establishes the first theoretical guarantees for active learning with decision trees: (1) the first explicit analysis of the disagreement coefficient for decision trees, yielding an \(O(\ln^{OPT}(n))\) upper bound; (2) the first active learning algorithm for binary classification achieving a multiplicative error guarantee of \((1+\epsilon)\). Combining these two results yields polylogarithmic label complexity in the dataset size.

Addressing Divergent Representations from Causal Interventions on Neural Networks

This paper systematically demonstrates that causal interventions (activation patching, DAS, SAEs, etc.) push model internal representations off their natural distribution. It theoretically distinguishes "benign shifts" from "harmful shifts," proposes the Counterfactual Latent (CL) loss to constrain intervened representations to remain near the natural manifold, and validates on a 7B LLM that this approach reduces divergence while preserving intervention accuracy.

Agnostics: Learning to Synthesize Code in Any Programming Language with a Universal RL Environment

This paper proposes Agnostics, a language-agnostic post-training pipeline that reformulates programming tasks as I/O behavioral specifications and trains LLMs with a universal verifier and GRPO-based reinforcement learning to generate code in any programming language. The approach enables a Qwen 4B model to match the performance of 16B–70B models on five low-resource languages: Lua, Julia, R, OCaml, and Fortran.

An Efficient, Provably Optimal Algorithm for the 0-1 Loss Linear Classification Problem

This paper proposes the Incremental Cell Enumeration (ICE) algorithm — the first standalone algorithm with rigorous correctness proofs — capable of exactly solving the global optimum of the 0-1 loss linear classification problem in \(O(N^{D+1})\) time, with extensions to polynomial hypersurface classification.

An Information-Theoretic Framework For Optimizing Experimental Design To Distinguish Probabilistic Neural Codes

This paper proposes the information gap, an information-theoretic measure that quantifies the ability of a given experimental design to distinguish between likelihood coding and posterior coding hypotheses. By deriving closed-form expressions for the cross-entropy performance difference between decoders under each hypothesis—shown to be the KL divergence between the true posterior and a task-marginalized surrogate posterior—the framework enables theory-driven optimal experimental design by maximizing this measure over the stimulus prior distribution.

ANO: Faster is Better in Noisy Landscapes

This paper proposes the Ano optimizer, which decouples the update direction from its magnitude — the direction is determined by the sign of the momentum for noise robustness, while the magnitude is determined by the instantaneous gradient absolute value (rather than the momentum magnitude) for responsiveness. Combined with an improved Yogi-style variance estimator, Ano significantly outperforms Adam/Lion/Adan in noisy and non-stationary environments (e.g., RL), while remaining competitive on standard tasks.
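
A minimal sketch of the decoupled update as described above, with a Yogi-style second moment; the exact coupling of the three ingredients and all hyperparameters here are assumptions for illustration, not the official Ano implementation:

```python
import numpy as np

def ano_step(param, grad, m, v, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Ano-style update (illustrative sketch).

    Direction: sign of the gradient momentum (robust to noise).
    Magnitude: instantaneous |grad| (responsive), scaled by a
    Yogi-style variance estimate.
    """
    m = beta1 * m + (1 - beta1) * grad                      # momentum -> direction only
    v = v - (1 - beta2) * np.sign(v - grad**2) * grad**2    # Yogi-style variance
    param = param - lr * np.sign(m) * np.abs(grad) / (np.sqrt(v) + eps)
    return param, m, v

# toy usage: minimize f(x) = x^2 under heavy gradient noise
x, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
rng = np.random.default_rng(0)
for _ in range(2000):
    g = 2 * x + rng.normal(scale=2.0, size=1)               # noisy gradient of x^2
    x, m, v = ano_step(x, g, m, v)
print(x)                                                    # hovers near 0
```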

AnyUp: Universal Feature Upsampling

AnyUp proposes the first encoder-agnostic learnable feature upsampling method. Through feature-agnostic convolutional layers and window attention mechanisms, it requires only a single training pass to perform high-quality upsampling of arbitrary visual features across arbitrary resolutions, achieving state-of-the-art performance on semantic segmentation, depth estimation, and related tasks.

Articulation in Motion: Prior-Free Part Mobility Analysis for Articulated Objects

This paper proposes AiM (Articulation in Motion), a framework that reconstructs articulated objects from interaction videos and initial-state scans without requiring prior knowledge of the number of parts. It achieves dynamic-static decoupling via a dual-Gaussian representation (Static GS + Deformable GS), combines sequential RANSAC for prior-free part segmentation and joint estimation, and incorporates an SDMD module to handle newly exposed static regions. On complex 6-part objects (Storage), AiM achieves 79.34% mean IoU, substantially outperforming the prior-dependent ArtGS (52.23%).

Bayesian Influence Functions for Hessian-Free Data Attribution

This paper proposes the Local Bayesian Influence Function (BIF), which replaces the intractable Hessian inversion in classical influence functions with a covariance estimate obtained via SGLD sampling, enabling architecture-agnostic data attribution for models with billions of parameters and achieving state-of-the-art performance on retraining experiments.

Beyond Linearity in Attention Projections: The Case for Nonlinear Queries

Motivated by the theoretical finding of algebraic redundancy in \(W_Q\), this work replaces the linear Query projection with a nonlinear residual form \(Q(X)=(X+f_\theta(X))/2\), matching the original parameter count while outperforming a baseline with 12.5% more parameters.
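
The change is confined to the query path of attention; keys and values keep their linear projections. A minimal NumPy sketch, where the two-layer ReLU MLP standing in for \(f_\theta\) is an illustrative assumption:

```python
import numpy as np

def nonlinear_query(X, W1, b1, W2, b2):
    """Residual nonlinear query: Q(X) = (X + f_theta(X)) / 2."""
    f = np.maximum(X @ W1 + b1, 0.0) @ W2 + b2     # f_theta: small ReLU MLP
    return 0.5 * (X + f)

rng = np.random.default_rng(0)
n, d = 10, 64
X = rng.normal(size=(n, d))
W1, W2 = rng.normal(size=(d, d)) / np.sqrt(d), rng.normal(size=(d, d)) / np.sqrt(d)
b1, b2 = np.zeros(d), np.zeros(d)

Q = nonlinear_query(X, W1, b1, W2, b2)             # replaces Q = X @ W_Q
# K and V keep their usual linear projections; the rest of attention is unchanged.
```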

CaDrift: A Time-dependent Causal Generator of Drifting Data Streams

This paper proposes CaDrift, a time-dependent synthetic data stream generation framework based on structural causal models (SCMs). It introduces temporal correlation via EWMA smoothing and autoregressive noise, and realizes controllable distributional drift, covariate drift, severe drift, and local drift by modifying causal mapping functions. CaDrift fills the gap left by existing data stream generators that lack both causal structure and temporal dependence.
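
Both temporal ingredients are standard and compact to sketch: EWMA smoothing of an exogenous driver, an AR(1) noise term, and a causal mapping that changes mid-stream. All coefficients and the drift schedule below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
T, alpha, rho = 1000, 0.1, 0.8         # horizon, EWMA rate, AR(1) coefficient
x, y, eps = np.zeros(T), np.zeros(T), np.zeros(T)
s = 0.0                                 # EWMA state

for t in range(1, T):
    s = alpha * rng.normal() + (1 - alpha) * s           # EWMA: correlated cause
    x[t] = s
    eps[t] = rho * eps[t - 1] + rng.normal(scale=0.1)    # autoregressive noise
    if t < T // 2:                      # abrupt drift: the causal map x -> y changes
        y[t] = 2.0 * x[t] + eps[t]
    else:
        y[t] = -x[t] + 0.5 + eps[t]
```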

cadrille: Multi-modal CAD Reconstruction with Reinforcement Learning

cadrille is the first multi-modal CAD reconstruction model capable of handling point cloud, multi-view image, and text inputs simultaneously. Through a three-stage training paradigm of VLM backbone + SFT + RL fine-tuning, it achieves state-of-the-art performance across 10 CAD reconstruction benchmarks, with RL fine-tuning reducing the invalid rate to near 0%.

Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings

This paper analyzes Instant-NGP's multi-resolution hash encoding (MHE) through the lens of physical systems, deriving a closed-form approximation of its point spread function (PSF). The analysis reveals that the effective resolution is governed by the geometric mean resolution \(N_{\text{avg}}\) rather than the finest resolution \(N_{\max}\), and that axis-aligned grids introduce spatial anisotropy. The paper further proposes Rotated MHE (R-MHE), a zero-overhead method that eliminates anisotropy by applying a distinct rotation to the input coordinates at each hash level.

Chart Deep Research in LVLMs via Parallel Relative Policy Optimization

This paper proposes PRPO (Parallel Relative Policy Optimization), which addresses GRPO's training bottlenecks under multi-dimensional reward interference and heterogeneous data gradient conflicts through two-level parallel decoupled optimization — across reward dimensions and data types. It also introduces MCDR-Bench, which leverages an "error uniqueness principle" to transform subjective generation evaluation into objective error identification, enabling quantitative assessment of chart deep research capabilities.

CHLU: The Causal Hamiltonian Learning Unit as a Symplectic Primitive for Deep Learning

CHLU is a computational learning primitive grounded in relativistic Hamiltonian mechanics and symplectic integration. By enforcing phase-space volume conservation and introducing a causal velocity upper bound, it addresses gradient explosion/vanishing in LSTMs and information dissipation in Neural ODEs, achieving infinite-horizon stability and thermodynamic generative capability.

Completing Missing Annotation: Multi-Agent Debate for Accurate and Scalable Relevance Assessment

This paper proposes DREAM — a multi-agent, multi-round debate framework with opposing-stance initialization for IR relevance annotation: cases with consensus are automatically labeled, while disagreements are escalated to human annotators (aided by debate history). DREAM achieves 95.2% balanced accuracy with only 3.5% human escalation. Based on this framework, the BRIDGE benchmark is constructed, uncovering 29,824 missing relevant annotations absent from existing benchmarks (428% of the original annotations), and correcting ranking bias in retrieval systems as well as retrieval-generation performance misalignment in RAG evaluation.

Compositional Diffusion with Guided Search for Long-Horizon Planning

This paper proposes CDGS (Compositional Diffusion with Guided Search), which embeds a population-based search mechanism—iterative resampling combined with likelihood-based pruning—into the diffusion denoising process to address the mode averaging problem arising from the composition of multimodal local distributions. CDGS enables sampling of globally consistent long-horizon plans from short-horizon models without long-horizon training data.

Condition Matters in Full-head 3D GANs

This paper identifies that view conditioning in full-head 3D GANs introduces severe directional bias—generation quality is substantially higher at the conditioned viewpoint than at others. To address this, the authors propose replacing view conditioning with view-invariant semantic features (frontal CLIP features) and introduce BalanceHead360, a synthetic dataset of 11.2 million 360° full-head images generated via Flux.1 Kontext, achieving for the first time high-fidelity, diverse full-head generation with consistent quality across all viewpoints.

Consistent Low-Rank Approximation

This paper formalizes and systematically studies the consistent low-rank approximation problem—maintaining a near-optimal rank-\(k\) approximation of a matrix whose rows arrive in a stream while minimizing the total variation (recourse) of the solution. It proves that \(O(k/\varepsilon \cdot \log(nd))\) recourse is achievable under additive error and \(O(k^{3/2}/\varepsilon^2 \cdot \mathrm{polylog})\) recourse under multiplicative \((1+\varepsilon)\) error, and establishes a lower bound of \(\Omega(k/\varepsilon \cdot \log(n/k))\).

Directional Sheaf Hypergraph Networks: Unifying Learning on Directed and Undirected Hypergraphs

This paper proposes Directional Sheaf Hypergraph Networks (DSHN), which combines Cellular Sheaf theory with the directional information of directed hypergraphs to construct a complex-valued Hermitian Laplacian operator. The proposed operator unifies and generalizes existing graph and hypergraph Laplacians, achieving 2%–20% relative accuracy improvements over baselines on 7 real-world datasets.

Distributed Algorithms for Euclidean Clustering

This paper constructs \((1+\varepsilon)\)-coresets for Euclidean \((k,z)\)-clustering in the distributed setting, achieving communication complexity that matches known lower bounds (up to polylogarithmic factors) in both the coordinator model and the blackboard model.

Distributionally Robust Classification for Multi-Source Unsupervised Domain Adaptation

This paper proposes a distributionally robust learning framework that jointly models uncertainty over both the target-domain covariate distribution and the conditional label distribution, achieving significant generalization improvements in UDA settings where target data is extremely scarce or spurious correlations exist in the source domain.

DA-AC: Distributions as Actions — A Unified RL Framework for Diverse Action Spaces

DA-AC proposes treating the parameters of an action distribution (e.g., softmax probabilities or Gaussian mean/variance) as the agent's output "actions," relocating the action sampling process to the environment side. This enables a unified deterministic policy gradient framework for discrete, continuous, and hybrid action spaces. The approach is theoretically proven to achieve strictly lower variance than likelihood-ratio (LR) and reparameterization (RP) estimators, and attains competitive or state-of-the-art performance across 40+ environments.

Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search

This paper proposes AIGB-Pearl, which introduces an offline trajectory evaluator and a KL-Lipschitz constrained score maximization scheme for generative auto-bidding. The framework enables generative models to safely surpass the performance ceiling imposed by static offline data under theoretical guarantees, achieving a significant GMV improvement of +3% on Taobao's real-world advertising system.

Entropic Confinement and Mode Connectivity in Overparameterized Neural Networks

This paper reveals that systematic growth of curvature along low-loss paths generates entropic barriers, such that even when the energy path is flat, SGD noise confines optimization dynamics to flat regions near minima—resolving the paradox of "mode-connected but dynamically isolated" solutions.

Evaluating GFlowNet from Partial Episodes for Stable and Flexible Policy-Based Training

This paper establishes a theoretical connection between state flow functions and policy value functions in GFlowNet, proposes the Subtrajectory Evaluation Balance (Sub-EB) objective for reliable value function learning, and enhances the stability and flexibility of policy-based GFlowNet training.

Exchangeability of GNN Representations with Applications to Graph Retrieval

This paper identifies that trained GNN node embeddings are exchangeable random variables along the feature dimension (i.e., \(p(\mathbf{X}) = p(\mathbf{X}\pi)\) holds for any dimensional permutation \(\pi\)), and exploits this property to approximate transportation-distance-based (EMD/Wasserstein) graph similarity as Euclidean distance via dimension-wise sorting. A unified locality-sensitive hashing (LSH) framework, GraphHash, is constructed upon this foundation, consistently outperforming baselines including FourierHashNet, DiskANN, IVF, CORGII, and SWWL in AUC on subgraph matching and graph edit distance (GED) retrieval tasks, scaling to corpus sizes of one million graphs.
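
The sorting trick rests on a classical fact: for two equal-size 1-D samples, the 2-Wasserstein distance equals the Euclidean distance between their sorted values (scaled by \(1/\sqrt{d}\)). Exchangeability makes the coordinate order arbitrary, so sorting canonicalizes it and standard Euclidean LSH applies afterwards. A minimal sketch, where pooling each graph into a single vector is an illustrative simplification of the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256
g1 = rng.normal(size=d)                 # pooled embedding of graph 1
g2 = rng.normal(loc=0.3, size=d)        # pooled embedding of graph 2

# 1-D 2-Wasserstein between coordinate distributions = distance of sorted values
w2 = np.linalg.norm(np.sort(g1) - np.sort(g2)) / np.sqrt(d)

# exchangeability: any coordinate permutation leaves the sorted form unchanged,
# so the sorted vectors are ready for plain Euclidean LSH
pi = rng.permutation(d)
assert np.allclose(np.sort(g1[pi]), np.sort(g1))
print(w2)
```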

Fast and Stable Riemannian Metrics on SPD Manifolds via Cholesky Product Geometry

This paper reveals a simple product structure on the Cholesky manifold and, building upon it, proposes two fast and numerically stable SPD metrics (PCM and BWCM) with closed-form expressions for all Riemannian operators, achieving simultaneous improvements in accuracy, efficiency, and numerical stability for SPD deep learning.

FastLSQ: Solving PDEs in One Shot via Fourier Features with Exact Analytical Derivatives

By exploiting the cyclic closed-form derivative structure of sinusoidal basis functions, this work presents a one-shot PDE solver that requires neither automatic differentiation nor iterative training. It achieves \(10^{-7}\) accuracy in 0.07s for linear PDEs and \(10^{-8}\)–\(10^{-9}\) accuracy in under 9s for nonlinear PDEs, running thousands of times faster than PINNs while being several orders of magnitude more accurate.
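
The mechanism fits in a dozen lines: sinusoidal features have closed-form derivatives, so a linear PDE plus boundary conditions becomes a single linear least-squares solve over the coefficients. A minimal sketch on a 1-D Poisson problem (the basis size, collocation grid, and test equation are illustrative, not the paper's setup):

```python
import numpy as np

# One-shot solve of u''(x) = -sin(3x), u(0) = u(pi) = 0; exact u(x) = sin(3x)/9
K, N = 20, 200
x = np.linspace(0.0, np.pi, N)
f = -np.sin(3 * x)

k = np.arange(1, K + 1)
Phi = np.sin(np.outer(x, k))            # features sin(kx)
Phi_xx = -(k**2) * Phi                  # exact second derivative: -k^2 sin(kx)

A = np.vstack([Phi_xx, Phi[[0, -1]]])   # PDE residual rows + boundary rows
b = np.concatenate([f, [0.0, 0.0]])
c, *_ = np.linalg.lstsq(A, b, rcond=None)   # the entire "training" step

u = Phi @ c
print(np.max(np.abs(u - np.sin(3 * x) / 9)))   # near machine precision
```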

Federated ADMM from Bayesian Duality

This paper derives a Bayesian dual structure for ADMM from a variational Bayes (VB) perspective, proving that classical ADMM is a special case of VB over isotropic Gaussian families. Two novel extensions are introduced: a Newton-like variant (one-round convergence on quadratic objectives) and an Adam-like variant (IVON-ADMM, achieving +7% accuracy in heterogeneous deep learning settings).

FIRE: Frobenius-Isometry Reinitialization for Balancing the Stability-Plasticity Tradeoff

This paper formalizes the stability-plasticity tradeoff in continual learning as a constrained optimization problem—minimizing weight deviation (stability) subject to an orthogonality constraint (plasticity)—yielding a closed-form solution to the orthogonal Procrustes problem, \(\tilde{W}^* = W(W^\top W)^{-1/2}\) (polar decomposition), implemented efficiently via Newton-Schulz iteration (<1% additional time). FIRE comprehensively outperforms baselines such as S&P across visual continual learning, LLM continual pre-training, and RL.
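
The Newton-Schulz step referenced above is the classical iteration for the orthogonal polar factor and needs only matrix products. A minimal sketch (the iteration count and Frobenius pre-normalization are standard choices, assumed here rather than taken from FIRE):

```python
import numpy as np

def polar_factor(W, iters=30):
    """W (W^T W)^{-1/2} via Newton-Schulz; converges for spectral norm < sqrt(3)."""
    X = W / np.linalg.norm(W)           # Frobenius normalization ensures convergence
    I = np.eye(W.shape[1])
    for _ in range(iters):
        X = 0.5 * X @ (3.0 * I - X.T @ X)
    return X

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
Q = polar_factor(W)                     # the reinitialized weight, in closed form
print(np.max(np.abs(Q.T @ Q - np.eye(64))))   # near zero: Q is numerically orthogonal
```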

From Movement to Cognitive Maps: RNNs Reveal How Locomotor Development Shapes Hippocampal Spatial Coding

By combining cluster analysis of infant rodent locomotor development with a shallow RNN predictive learning model, this work provides the first computational demonstration that developmental changes in movement statistics (crawling → walking → running → adult) drive the sequential emergence of spatially tuned hippocampal neurons (place cells, head direction cells, and conjunctive coding cells). The model quantitatively reproduces the developmental timeline observed in rat hippocampal recordings and predicts a progressive increase in conjunctive place-HD coding cells during development — a prediction subsequently validated in experimental data.

Harpoon: Generalised Manifold Guidance for Conditional Tabular Diffusion

This paper extends manifold theory from image to tabular diffusion models, proving that the gradient of any differentiable inference-time loss lies in the tangent space of the data manifold (beyond the square-error loss restriction). Based on this result, the proposed Harpoon method guides unconditional samples at inference time along the manifold to satisfy diverse tabular constraints.

HEEGNet: Hyperbolic Embeddings for EEG

This work presents the first systematic empirical validation that EEG data exhibits hyperbolicity (hierarchical structure), and proposes HEEGNet, a hybrid hyperbolic network architecture. The model combines a Euclidean encoder for spatiotemporal-spectral feature extraction with a hyperbolic encoder for capturing hierarchical relationships, augmented by a novel coarse-to-fine domain adaptation strategy (DSMDBN). HEEGNet achieves state-of-the-art performance across multiple cross-domain tasks spanning visual evoked potentials, emotion recognition, and intracranial EEG.

Hilbert-Guided Sparse Local Attention

By reordering 2D image tokens into a 1D sequence via Hilbert space-filling curves—which preserve spatial locality—this work substantially increases the empty-block ratio in local attention (from 87.5% to 96.9%), enabling 4× speedup for window attention and 18× for sliding-window attention via FlexAttention, with negligible accuracy loss.
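
The reordering itself takes only a few lines using the standard Hilbert index conversion; the FlexAttention kernel integration is not reproduced here, and the grid size below is an illustrative assumption:

```python
import numpy as np

def hilbert_index(n, x, y):
    """(x, y) -> distance along the Hilbert curve on an n x n grid (n a power of 2)."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                     # rotate/reflect the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

n = 32                                   # 32 x 32 token grid (illustrative)
coords = [(x, y) for y in range(n) for x in range(n)]
order = np.argsort([hilbert_index(n, x, y) for x, y in coords])

tokens = np.arange(n * n)                # stand-in for the raster-ordered tokens
tokens_hilbert = tokens[order]           # locality-preserving 1-D sequence:
# local attention windows over this order cover compact 2-D patches.
```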

Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime

This paper provides the first proof that mini-batch Adam exhibits a different implicit bias from its full-batch counterpart: a constructed dataset causes per-sample Adam to converge to an \(\ell_2\) maximum-margin classifier (whereas full-batch Adam converges to \(\ell_\infty\)), and a proxy algorithm, AdamProxy, is introduced to characterize data-adaptive Mahalanobis-norm margin maximization on general datasets.

In-Context Algebra

This paper introduces an in-context algebra task—where tokens serve as pure variables and each sequence randomly reassigns their meanings—and finds that Transformers in this setting no longer learn classical Fourier/geometric representations. Instead, three symbolic reasoning mechanisms emerge (commutative copying, identity element recognition, and closure-based cancellation), with these capabilities appearing sequentially as phase transitions during training.

Jackpot: Optimal Budgeted Rejection Sampling for Extreme Actor-Policy Mismatch RL

This paper proposes the Jackpot framework, which applies Optimal Budget Rejection Sampling (OBRS) to accept or reject rollout tokens at the token level within a controllable acceptance budget, and reweights the remaining samples. The method is theoretically proven to strictly reduce the KL divergence between the actor and policy under any budget. Combined with joint training and distillation of the rollout model, Jackpot enables a small model (e.g., Qwen3-1.7B) to serve as the rollout model for training a large model (e.g., Qwen3-8B), achieving performance close to the on-policy baseline.

Key and Value Weights Are Probably All You Need: On the Necessity of the Query, Key, and Value Weight Triplet in Self-Attention

This paper theoretically demonstrates that the Query/Key/Value weight triplet in Transformer self-attention is redundant — the Query weight matrix can be replaced by an identity matrix (reducing attention parameters by 25%). GPT-style models trained from scratch confirm that performance is preserved under appropriate hyperparameter adjustment, and training remains stable at 3× lower weight decay, suggesting implicit regularization.
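
Dropping \(W_Q\) is a one-line change to scaled dot-product attention. A minimal single-head sketch without masking (dimensions illustrative):

```python
import numpy as np

def attention_identity_q(X, W_K, W_V):
    """Self-attention with the query projection fixed to the identity: Q = X."""
    K, V = X @ W_K, X @ W_V
    scores = X @ K.T / np.sqrt(K.shape[1])        # Q = X, no W_Q
    P = np.exp(scores - scores.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)             # row-wise softmax
    return P @ V

rng = np.random.default_rng(0)
n, d = 8, 32
X = rng.normal(size=(n, d))
W_K = rng.normal(size=(d, d)) / np.sqrt(d)
W_V = rng.normal(size=(d, d)) / np.sqrt(d)
out = attention_identity_q(X, W_K, W_V)           # (n, d)
```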

Latent Equivariant Operators for Robust Object Recognition: Promises and Challenges

This paper proposes learning or predefining equivariant shift operators in latent space to handle group transformations such as rotation and translation. At inference time, transformation parameters are estimated via KNN search, and inputs are mapped back to a canonical pose before classification. Experiments on MNIST demonstrate successful extrapolation to out-of-training-range transformations, offering greater flexibility than standard networks and equivariant networks, though scaling to more complex datasets remains an open challenge.

Latent Fourier Transform

This paper proposes LatentFT, a framework that applies the Discrete Fourier Transform (DFT) to latent time-series representations produced by a diffusion autoencoder, decomposing musical patterns by timescale. During training, a correlated log-scale frequency mask is randomly applied so that the decoder learns to reconstruct audio from partial spectra. At inference time, users specify frequency masks to selectively preserve or blend musical elements across different timescales. LatentFT consistently outperforms baselines including ILVR, Guidance, Codec Filtering, and RAVE on conditional generation and music blending tasks, with its superior audio quality and blending capability statistically confirmed by a listening test involving 29 musicians.
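
The frequency-selection step reduces to masking a DFT taken along the time axis of the latent sequence. A minimal sketch (latent shapes, the cutoff, and the blending recipe are illustrative assumptions; the diffusion autoencoder itself is not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 512, 64
Z = rng.normal(size=(T, d))              # latent time series from the autoencoder

Zf = np.fft.rfft(Z, axis=0)              # DFT along time
freqs = np.fft.rfftfreq(T)               # normalized frequencies in [0, 0.5]

cutoff = 0.02                            # keep slow, "structural" timescales
mask = (freqs <= cutoff).astype(float)[:, None]
Z_slow = np.fft.irfft(Zf * mask, n=T, axis=0)

# blending: slow timescales of one piece + fast timescales of another
Z2f = np.fft.rfft(rng.normal(size=(T, d)), axis=0)
Z_blend = Z_slow + np.fft.irfft(Z2f * (1 - mask), n=T, axis=0)
```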

LPWM: Latent Particle World Models for Object-Centric Stochastic Dynamics

LPWM is the first self-supervised object-centric world model that scales to real-world multi-object datasets. Its core innovation is learning independent per-particle latent action distributions (\(z_c^m\)) for each particle, encoding all frames in parallel via a causal spatiotemporal Transformer, supporting diverse conditioning signals (actions, language, image goals, multi-view), achieving state-of-the-art video prediction, and demonstrating imitation learning capability (89% success rate on OGBench task3).

Learning Adaptive Distribution Alignment with Neural Characteristic Function for Graph Domain Adaptation

ADAlign is proposed as a framework that leverages neural characteristic functions to adaptively align source/target graph distributions in the spectral domain — eliminating the need for manual selection of alignment criteria by automatically identifying the most prominent distributional discrepancies in each transfer scenario. It achieves state-of-the-art performance across 16 transfer tasks on 10 datasets while reducing memory consumption and training time.

Learning on a Razor's Edge: Identifiability and Singularity of Polynomial Neural Networks

Using tools from algebraic geometry, this paper systematically analyzes MLPs and CNNs with polynomial activations: it proves finite identifiability for MLPs and unique identifiability for CNNs, reveals that sparse subnetworks correspond to singular points of the neuromanifold, and provides a geometric explanation of the sparsity bias in MLPs via the notion of "critical exposure"—a property that CNNs do not possess.

Learning Structure-Semantic Evolution Trajectories for Graph Domain Adaptation

This paper proposes DiffGDA—the first method to introduce diffusion models into graph domain adaptation (GDA). It formulates the continuous-time joint structure-semantic evolution from source graphs to target graphs using stochastic differential equations (SDEs), and employs a density-ratio-based domain-aware guidance network to steer the diffusion trajectory toward the target domain. Theoretical convergence to the optimal adaptation path is proven, and DiffGDA comprehensively outperforms state-of-the-art methods across 14 transfer tasks on 8 real-world datasets.

LipNeXt: Scaling up Lipschitz-based Certified Robustness to Billion-parameter Models

This paper proposes LipNeXt—the first unconstrained, convolution-free 1-Lipschitz architecture—which learns orthogonal matrices via manifold optimization and achieves spatial mixing through a theoretically motivated Spatial Shift Module (derived from the paper's Theorem 1). LipNeXt scales to billion-parameter models and establishes new state-of-the-art certified robust accuracy (CRA) on CIFAR-10/100, Tiny-ImageNet, and ImageNet, with a +8% CRA gain on ImageNet at \(\varepsilon=1\).

Lipschitz Bandits with Stochastic Delayed Feedback

This paper provides the first systematic study of Lipschitz bandits over continuous arm spaces under stochastic delayed feedback. For bounded delays, it proposes the Delayed Zooming algorithm, which employs a lazy update mechanism to maintain the suboptimality gap bound \(\Delta(x) \leq 6r_t(x)\). For unbounded delays, it proposes DLPP, a phased pruning strategy whose regret is tied to the delay quantile \(Q(p)\). Instance-dependent lower bounds are established to prove that DLPP is nearly optimal.

Missing Mass for Differentially Private Domain Discovery

This paper revisits the differentially private domain discovery problem through the lens of missing mass, providing the first near-optimal \(\ell_1\) missing mass upper bounds for the simple and scalable Weighted Gaussian Mechanism (WGM) on Zipfian data, as well as distribution-free \(\ell_\infty\) missing mass guarantees. WGM is further applied as a domain discovery preprocessing step for private top-\(k\) and \(k\)-hitting set problems over unknown domains, with theoretical results validated on six real-world datasets.

Neural Force Field: Few-shot Learning of Generalized Physical Reasoning

This paper proposes Neural Force Field (NFF), which models object interactions as continuous force fields. A neural operator learns the force field function, and an ODE integrator decodes trajectories from it. NFF achieves few-shot state-of-the-art on three benchmarks—I-PHYRE (100 trajectories), N-body (200 trajectories), and PHYRE (0.012M samples, 267× fewer than prior SOTA)—reducing cross-scenario RMSE by 32–64% and achieving near-human performance on planning tasks.

Neuro-Symbolic Decoding of Neural Activity

This paper proposes NEURONA, a neuro-symbolic framework for fMRI decoding and concept grounding. By decomposing visual scenes into symbolic programs (logical combinations of concepts), NEURONA substantially outperforms both end-to-end neural decoders and linear models on fMRI question-answering tasks.

Noisy-Pair Robust Representation Alignment for Positive-Unlabeled Learning

This paper proposes NcPU, a non-contrastive PU learning framework that applies a sqrt transformation to the standard non-contrastive loss (NoiSNCL) so that clean-pair gradients dominate training, and introduces PhantomGate to provide conservative negative supervision with a regret rollback mechanism. Both modules iterate in a mutually beneficial manner under an EM framework. Without relying on auxiliary negative samples or pre-estimated class priors, NcPU narrows the gap with supervised learning from 14.26% to <1.4% on CIFAR-100, and achieves SOTA on xBD disaster damage assessment as well.

On the Impact of the Utility in Semivalue-based Data Valuation

This paper introduces a geometric representation termed spatial signature to unify the modeling of utility selection in data valuation as a directional rotation problem on the unit circle. It further proposes a robustness metric \(R_p\) and demonstrates that the Banzhaf value exhibits the highest ranking stability across different utility functions.

On the Lipschitz Continuity of Set Aggregation Functions and Neural Networks for Sets

This paper systematically investigates the Lipschitz continuity of three commonly used set aggregation functions (sum, mean, max) and attention mechanisms under three multiset distance functions, derives upper bounds on the Lipschitz constants of set neural networks, and connects these results to perturbation stability and generalization under distribution shift.

Out of the Shadows: Exploring a Latent Space for Neural Network Verification

By interpreting zonotopes as "shadows" (projections) of high-dimensional hypercubes, this paper identifies that the input set and output enclosure share a common latent space. Building on this insight, it proposes a specification-driven input refinement method that back-propagates unsafe output constraints into the input space to prune subproblems, reducing branch-and-bound subproblem counts by 60–65%. All operations are matrix-based, enabling efficient GPU acceleration. The method achieves competitive performance with top-tier tools such as α-β-CROWN across eight VNN-COMP'24 benchmarks.

Oversmoothing, Oversquashing, Heterophily, Long-Range, and More: Demystifying Common Beliefs in Graph Machine Learning

This paper systematically examines nine common beliefs in graph machine learning concerning oversmoothing, oversquashing, homophily/heterophily, and long-range dependencies. Through concise counterexamples, each belief is refuted. Notably, "oversquashing" is decomposed into two independent concepts—computational bottleneck and topological bottleneck—thereby clarifying widespread conceptual confusion in the field.

OwlEye: Zero-Shot Learner for Cross-Domain Graph Data Anomaly Detection

This paper proposes OwlEye, a framework that aligns heterogeneous graph embeddings into a shared space via pairwise-distance-statistics-based cross-domain feature alignment, extracts attribute-level and structure-level normal patterns from multiple graphs into an extensible dictionary, and detects anomalous nodes in unseen graphs under fully zero-shot conditions through a truncated attention-based reconstruction mechanism. OwlEye achieves an average AUPRC of 36.17% across 8 datasets, surpassing the strongest baseline ARC by approximately 5.4 percentage points.

Predicting Kernel Regression Learning Curves from Only Raw Data Statistics

This paper proposes the Hermite Eigenstructure Ansatz (HEA), which analytically predicts the learning curves (test error vs. sample size) of rotation-invariant kernels on real image datasets (CIFAR-5m, SVHN, ImageNet) using only two statistics: the data covariance matrix and the Hermite decomposition of the target function. The paper proves that HEA holds for Gaussian data and empirically demonstrates that MLPs in the feature-learning regime also learn Hermite polynomials in the order predicted by HEA.

Probabilistic Kernel Function for Fast Angle Testing

This paper studies the angle testing problem in high-dimensional Euclidean space and proposes two deterministically constructed probabilistic kernel functions, \(K_S^1\) and \(K_S^2\), based on reference angles, for angle comparison and angle threshold judgment, respectively. Theoretical guarantees are obtained without relying on Gaussian asymptotic assumptions. Applied to approximate nearest neighbor search (ANNS), the method achieves 2.5×–3× QPS speedup on HNSW graphs.

Refine Now, Query Fast: A Decoupled Refinement Paradigm for Implicit Neural Fields

This paper proposes the Decoupled Representation Refinement (DRR) paradigm, which employs a deep refiner network to offline-refine the embedding structure and cache the results, so that the inference stage requires only fast interpolation and a lightweight decoder. On ensemble simulation surrogate modeling tasks, DRR-Net achieves state-of-the-art reconstruction accuracy at less than 1/27 of the inference cost.

Revisiting Sharpness-Aware Minimization: A More Faithful and Effective Implementation

This paper proposes a new intuitive interpretation of SAM's underlying mechanism — that the gradient at the perturbed point approximates the direction toward the local maximum — and reveals its imprecision as well as the multi-step degradation problem. It then introduces XSAM, which achieves more faithful and effective sharpness-aware minimization by explicitly estimating the direction of the maximum.

Scalable Random Wavelet Features: Efficient Non-Stationary Kernel Approximation with Convergence Guarantees

This paper proposes Random Wavelet Features (RWF), a scalable non-stationary kernel approximation framework constructed by randomly sampling from a family of wavelets. RWF preserves the linear-time complexity of random feature methods while offering guarantees of positive definiteness, unbiasedness, and uniform convergence.

SEED: Towards More Accurate Semantic Evaluation for Visual Brain Decoding

This paper proposes SEED (Semantic Evaluation for Visual Brain Decoding), a composite evaluation metric combining three complementary measures — Object F1, Cap-Sim, and EffNet — which substantially outperforms all existing metrics in alignment with human evaluation.

Speculative Actions: A Lossless Framework for Faster AI Agents

Inspired by CPU speculative execution and LLM speculative decoding, this paper proposes the Speculative Actions framework: while a slow Actor (large model) computes, a fast Speculator (small model) predicts future actions and pre-executes them; upon a match, the waiting round is skipped, achieving lossless acceleration. The framework achieves 15–30% latency reduction across Chess, e-commerce, and QA scenarios. A confidence-based dynamic branching strategy attains acceleration comparable to three speculative branches while using 40% fewer tokens.
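
The protocol's control flow is simple to sketch sequentially (in the real system the actor and speculator run concurrently). The toy environment, both policies, and the exact-match criterion are illustrative assumptions:

```python
def step(state, action):
    """Toy deterministic environment transition."""
    return state + action + 1

def speculative_actions(state, actor, speculator, horizon=6):
    """Sketch of speculative actions: always commit the actor's action (lossless);
    a speculator hit means the pre-executed step is kept and a wait is skipped."""
    hits = 0
    for _ in range(horizon):
        guess = speculator(state)        # fast draft, pre-executes ahead
        action = actor(state)            # slow, authoritative
        hits += int(guess == action)     # hit: speculative step confirmed
        state = step(state, action)
    return state, hits

actor = lambda s: s % 3
speculator = lambda s: (s % 3) if s % 4 else 2    # imperfect draft policy
print(speculative_actions(0, actor, speculator))  # (final_state, hit_count)
```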

t-SNE Exaggerates Clusters, Provably

This paper provides rigorous theoretical proofs of two fundamental failure modes of t-SNE: (1) the strength of input clusters cannot be inferred from the output, and (2) extreme outliers cannot be faithfully represented — even when the input has no cluster structure or contains extreme outliers, t-SNE may produce perfectly clustered visualizations.

The Counting Power of Transformers

This paper proves that Transformers can express not only (semi-)linear counting properties but all semi-algebraic counting properties (i.e., Boolean combinations of multivariate polynomial inequalities), generalizing all prior results on the counting power of Transformers and deriving novel undecidability conclusions.

The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?

This paper decomposes AI model errors into bias (systematic misalignment) and variance (incoherent behavior), finding that longer reasoning leads to greater incoherence, and that larger models become more incoherent on difficult tasks. This suggests that future superintelligent AI is more likely to exhibit unpredictable, "industrial accident"-style failures than to coherently pursue wrong objectives.

The Invisibility Hypothesis: Promises of AGI and the Future of the Global South

This paper introduces the Invisibility Hypothesis, arguing that as AI systems increasingly serve as the coordination layer for economic and political allocation, they will systematically favor "machine-readable" individuals. Informal workers in the Global South, lacking digital verifiability, face managed exclusion. The central risk shifts from job displacement to relevance loss, and this exclusion is self-reinforcing.

The Price of Robustness: Stable Classifiers Need Overparameterization

This paper establishes stability-generalization bounds for discontinuous classifiers and proves a "law of robustness" for classification: any interpolating classifier with \(p \approx n\) parameters is necessarily unstable, and achieving high stability requires overparameterization on the order of \(p \approx nd\).

ToProVAR: Efficient Visual Autoregressive Modeling via Tri-Dimensional Entropy-Aware Semantic Analysis and Sparsity Optimization

ToProVAR is a framework that employs attention entropy to uniformly analyze sparsity across three dimensions — token, layer, and scale — in VAR models, achieving up to 3.4× speedup with negligible image quality degradation, significantly outperforming FastVAR and SkipVAR.
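
The gating signal is the Shannon entropy of each attention row: peaked (low-entropy) rows indicate computation that can be skipped or sparsified. A minimal sketch of the measurement only; the tri-dimensional thresholding policy across token, layer, and scale is the paper's contribution and is not reproduced:

```python
import numpy as np

def attention_entropy(P, eps=1e-12):
    """Shannon entropy of each attention row (queries x keys)."""
    return -(P * np.log(P + eps)).sum(axis=-1)

rng = np.random.default_rng(0)
scores = 4.0 * rng.normal(size=(16, 16))
P = np.exp(scores - scores.max(axis=-1, keepdims=True))
P /= P.sum(axis=-1, keepdims=True)                 # softmax rows

H = attention_entropy(P)
keep = H > np.median(H)        # illustrative rule: prune the most peaked rows
```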

Towards Sustainable Investment Policies Informed by Opponent Shaping

This paper formally proves the conditions under which the InvestESG simulation environment constitutes a social dilemma, and applies the Advantage Alignment opponent shaping algorithm to guide economic agents toward sustainable investment equilibria.

Training Deep Normalization-Free Spiking Neural Networks with Lateral Inhibition

This paper proposes DeepEISNN, a normalization-free learning framework based on cortical excitatory-inhibitory (E-I) circuits. Through two techniques—E-I Init and E-I Prop—it achieves stable end-to-end training of deep SNNs while balancing performance and biological plausibility.

When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency

CALIPER proposes a detector- and model-agnostic, data-only test that estimates the minimum amount of post-drift data required for safe retraining after abrupt concept drift. It tracks the monotonically non-increasing trend of a weighted-local-regression (WLR) surrogate error as the locality parameter \(\theta\) increases, combined with an effective-sample-size (ESS) gate, without requiring any actual retraining of the downstream model.