📂 Others

🔬 ICLR2026 · 116 paper notes

🔥 Top topics: Adversarial Robustness ×11 · Alignment/RLHF ×6 · Federated Learning ×4 · Continual Learning ×4 · Domain Adaptation ×4

A Brain-Inspired Gating Mechanism Unlocks Robust Computation in Spiking Neural Networks

This paper reintroduces "activity-dependent membrane conductance" from biological neurons into the LIF model to construct a Dynamic Gated Neuron (DGN) that adaptively gates information flow. It proves the DGN's superior noise suppression theoretically and demonstrates high accuracy and noise resistance experimentally on speech and neuromorphic temporal tasks.

A Federated Generalized Expectation-Maximization Algorithm for Mixture Models with an Unknown Number of Components

The proposed FedGEM algorithm constructs uncertainty sets after local EM steps on clients, allowing the server to detect cluster overlaps via set intersections and infer the global cluster count. This marks the first federated clustering approach that operates without a predefined number of clusters while providing probabilistic convergence guarantees.

A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization

A new representer theorem is established for estimating triggering kernels in linear multivariate Hawkes processes within an RKHS framework. It proves that the optimal estimator is represented as a linear combination of equivalent kernels at data points with dual coefficients analytically equal to 1, eliminating the need for dual optimization and enabling scalable non-parametric estimation.

A Scalable Inter-edge Correlation Modeling in CopulaGNN for Link Sign Prediction

CopulaGNN is extended from the node level to the edge level by constructing the correlation matrix as a Gramian matrix of edge embeddings and utilizing the Woodbury identity to reconstruct conditional probability distributions. This approach achieves scalable modeling of statistical dependencies between edges for link sign prediction tasks in signed graphs.

A Single Architecture for Representing Invariance Under Any Space Group

This paper designs a single architecture (Crystal Fourier Transformer) adaptable to any space group invariance. It constructs symmetry-adapted Fourier bases by analytically deriving constraints on Fourier coefficients from group operations, achieving parameter sharing and zero-shot generalization across 230 space groups via a dual graph representation of constraints.

A Study on PAVE Specification for Learnware

Addressing the challenge of identifying useful models from a massive repository without accessing training data in the "Learnware = Model + Specification" paradigm, this paper systematically investigates the PArameter VEctor specification (PAVE). By encoding model capabilities and task requirements via parameter updates induced by fine-tuning, the authors prove its homology with the classic RKME specification from an NTK perspective. Leveraging LoRA-style low-rank approximation, storage and computation are compressed to under 1% of the original model parameters. Identified learnwares can outperform user-fine-tuned pre-trained models in few-shot scenarios.

Accelerated Parallel Tempering via Neural Transports

The rigid "direct state swap" in Parallel Tempering (PT) is replaced with an "accelerated swap": neural transports (Normalizing Flows / Controlled Diffusion / Diffusion Models) are used to push the two states towards each other before performing a Metropolis acceptance check. This enables high-probability exchanges even when adjacent annealed distributions have minimal overlap, significantly increasing the round-trip count between the reference and target distributions while maintaining the asymptotic unbiasedness of MCMC and providing low-variance free energy estimates.

Active Learning for Decision Trees with Provable Guarantees

Provides the first theoretical guarantees for active learning of decision trees: (1) Conducts the first analysis of the disagreement coefficient for decision trees and derives an \(O(\ln^{OPT}(n))\) upper bound; (2) Proposes the first binary active learning algorithm achieving a \((1+\epsilon)\) multiplicative error guarantee; combining these results achieves polylogarithmic label complexity relative to the dataset size.

Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks

This paper proposes adaptive canonicalization: instead of the input alone determining a canonical pose, the input and the current task network jointly select the transformation with the highest confidence. This maintains symmetry invariance while alleviating discretization issues in traditional canonicalization. It achieves results superior to equivariant architectures, data augmentation, and fixed canonicalization in spectral graph networks, molecular/protein graph classification, and rotated point cloud classification.

Adaptive Conformal Guidance for Learning under Uncertainty

The paper embeds split conformal prediction (split CP) directly into the training loop, using the "prediction set size" to quantify the uncertainty of guidance signals (teacher soft labels / pseudo-labels / expert policies), and then adaptively downweights unreliable guidance—a unified framework covering supervised, semi-supervised, and imitation-guided RL.
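
The mechanism above can be illustrated with a minimal sketch of split conformal prediction on class probabilities. The nonconformity score `1 - p(true label)`, the function names, and especially the weighting rule `1 / set size` are assumptions made for illustration; the summary does not state the paper's actual weighting function.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-CP threshold: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points: every label is kept
    return sorted(cal_scores)[k - 1]

def prediction_set(probs, threshold):
    """Labels whose nonconformity score (1 - probability) is within the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= threshold]

def guidance_weight(probs, threshold):
    """Hypothetical rule: a larger prediction set means less trust in the guidance."""
    size = len(prediction_set(probs, threshold))
    return 1.0 / max(size, 1)

# Calibration scores: 1 - p(true label) on held-out examples (made-up values).
cal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
tau = conformal_threshold(cal, alpha=0.25)
confident_teacher = [0.7, 0.2, 0.1]
unsure_teacher = [0.45, 0.45, 0.1]
print(guidance_weight(confident_teacher, tau), guidance_weight(unsure_teacher, tau))
```

A confident teacher output yields a singleton set and full weight, while an ambiguous output yields a larger set and a smaller weight in the guidance loss.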

Aligning Collaborative View Recovery and Tensorial Subspace Learning via Latent Representation for Incomplete Multi-View Clustering

ARSL-IMVC uses a shared latent representation \(H\) as a "bridge" to explicitly align missing view recovery (CVR) and tensorial subspace learning (TSL) within a unified framework so that each improves the other, thereby achieving more robust multi-view clustering when a large fraction of views is missing.

An Information-Theoretic Framework For Optimizing Experimental Design To Distinguish Probabilistic Neural Codes

This paper proposes the information gap, an information-theoretic metric that quantifies how well a given experimental design distinguishes between two probabilistic neural coding hypotheses. The authors derive analytical expressions for the cross-entropy performance difference of decoders under the likelihood and posterior coding hypotheses (essentially the KL divergence between the true posterior and a task-marginalized proxy posterior), then optimize the stimulus prior distribution by maximizing this metric, yielding a theory-driven optimal experimental design.

Any-Subgroup Equivariant Networks via Symmetry Breaking

This paper proposes ASEN (Any-Subgroup Equivariant Network), which utilizes an equivariant backbone network for a large group combined with a "breaking input" whose automorphism group exactly matches the target subgroup. This allows a single network to become equivariant to any permutation subgroup by simply switching the auxiliary input. Utilizing the 2-closure for an efficient approximation algorithm, the model outperforms discrete equivariant models and non-equivariant baselines in symmetry selection for graphs and images, as well as in sequential multi-task and transfer learning.

AnyUp: Universal Feature Upsampling

AnyUp proposes the first encoder-agnostic learnable feature upsampling method. By employing feature-agnostic convolutional layers and a window attention mechanism, it can perform high-quality upsampling for arbitrary visual features across any resolution with only a single training session. It achieves SOTA performance on tasks such as semantic segmentation and depth estimation.

Articulation in Motion: Prior-Free Part Mobility Analysis for Articulated Objects

The Articulation in Motion (AiM) framework is proposed to reconstruct articulated objects from interaction videos and initial state scans without requiring part-number priors. It achieves motion-static decoupling using a dual Gaussian representation (static GS + deformable GS), utilizes sequential RANSAC for prior-free part segmentation and joint estimation, and incorporates an SDMD module to handle newly exposed static regions. On complex 6-part objects (Storage), AiM reaches 79.34% mean IoU, compared with 52.23% for the prior-dependent ArtGS.

Assembling the Mind's Mosaic: Towards EEG Semantic Intent Decoding

This paper proposes SID, a semantic intent decoding framework that redefines "brain signal-to-language" as a process of first deconstructing EEG/SEEG into a set of unordered semantic units, then retrieving them in a continuous semantic space, and finally reconstructing sentences using an LLM. Its implementation, BrainMosaic, significantly outperforms classification-based and end-to-end generative baselines on multilingual EEG and clinical SEEG data across concept-level and sentence-level metrics.

AtC: Aggregate-then-Calibrate for Human-centered Assessment

AtC proposes a two-stage "Aggregate-then-Calibrate" framework: first, it aggregates human pairwise comparisons into a consensus ranking using a heterogeneous Thurstone model that accounts for annotator reliability; then, it aligns scores from any predictive model to this ranking via isotonic projection. This approach simultaneously achieves "reliable human-provided ordering" and "consistent model-provided scale" in the absence of verifiable ground truth.
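
The calibration stage can be sketched with the pool-adjacent-violators algorithm, a standard way to compute an isotonic (order-preserving) least-squares projection. This is only an illustration: the function names are hypothetical, the consensus ranking is taken as given, and the heterogeneous Thurstone aggregation stage is not shown.

```python
def isotonic_projection(values):
    """Closest non-decreasing sequence in least squares (pool adjacent violators)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out

def calibrate_to_ranking(scores, ranking):
    """Adjust model scores so they agree with a ranking (lowest item first)."""
    fitted = isotonic_projection([scores[i] for i in ranking])
    calibrated = [0.0] * len(scores)
    for position, item in enumerate(ranking):
        calibrated[item] = fitted[position]
    return calibrated

# Item 1 is ranked lowest by the annotators, then item 0, then item 2.
print(calibrate_to_ranking([4.0, 1.0, 2.0], ranking=[1, 0, 2]))
```

Scores that already respect the ranking are left unchanged; only violating neighbours are pooled to their mean, which keeps the model's scale while enforcing the human-provided order.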

Bayesian Influence Functions for Hessian-Free Data Attribution

This paper proposes the Local Bayesian Influence Function (BIF), which replaces the infeasible Hessian inverse in classical influence functions with covariance estimated via SGLD sampling. This achieves architecture-agnostic data attribution for models with billions of parameters and reaches SOTA in retraining experiments.

Bayesian Post Training Enhancement of Regression Models with Calibrated Rankings

RANKREFINE++ fuses "regressor predictions" and "expert pairwise rankings" into a strictly log-concave posterior via Bayesian inference. It addresses scale mismatch and curvature dominance in the Bradley-Terry model under large reference sets using temperature calibration and accuracy gating, significantly improving prediction accuracy without retraining the regressor.

Beyond Linear Processing: Dendritic Bilinear Integration in Spiking Neural Networks

This paper introduces a biologically inspired "bilinear dendritic integration" term to the commonly used LIF neuron in Spiking Neural Networks (SNNs). In addition to the linear summation of synaptic inputs, it incorporates an interaction term \(s^T K s\) between pairs of inputs, enabling a single neuron to perform non-linear computations like XOR. Theoretically, it is proven to exploit and propagate input correlation structures across layers. Experimentally, it consistently outperforms LIF and various enhanced neurons across ResNet, VGG, and Transformer architectures on both static and neuromorphic datasets, improving average accuracy from 83.95% to 85.18% with only an approximate 3% increase in energy consumption.
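
The claim that a bilinear term lets a single unit compute XOR can be checked with a static toy calculation. The weights `w`, the interaction matrix `K`, and the threshold below are hand-picked for illustration and are not taken from the paper; membrane dynamics and spiking are omitted.

```python
def bilinear_drive(s, w, K):
    """Input drive w^T s + s^T K s for a binary input vector s."""
    n = len(s)
    linear = sum(w[i] * s[i] for i in range(n))
    pairwise = sum(s[i] * K[i][j] * s[j] for i in range(n) for j in range(n))
    return linear + pairwise

# Each input alone excites the unit; the pairwise term cancels joint activity.
w = [1.0, 1.0]
K = [[0.0, -1.0],
     [-1.0, 0.0]]

def fires(s, threshold=0.5):
    """True when the drive crosses the (assumed) firing threshold."""
    return bilinear_drive(s, w, K) > threshold

for s in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(s, fires(s))
```

With a purely linear drive no threshold separates the two single-input cases from the joint-input case, which is why the interaction term is needed.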

Beyond Uniformity: Regularizing Implicit Neural Representations through a Lipschitz Lens

This work reframes Lipschitz regularization for INRs from a "rigid uniform 1-Lipschitz constraint" into a framework of "estimable, non-uniformly distributable Lipschitz budgets." By deriving a global budget \(K\) from task priors and intelligently allocating it across layers, the method achieves a superior balance between smoothness and expressivity.

Breaking Gradient Temporal Collinearity for Robust Spiking Neural Networks

Addressing the poor robustness of direct encoding Spiking Neural Networks (SNNs), this paper proposes "Gradient Temporal Collinearity" (GTC) as a quantifiable metric to explain why they are less resilient than rate encoding. The authors design STOD—inserting parameterized orthogonal kernels at the input layer for each timestep combined with global orthogonal regularization—to structurally decorrelate gradient directions across timesteps. This achieves significantly higher accuracy under FGSM and PGD attacks on CIFAR/ImageNet/DVS compared to existing SOTA, with nearly zero extra inference overhead.

Buckingham \(\pi\)-Invariant Test-Time Projection for Robust PDE Surrogate Modeling

Utilizing the Buckingham π theorem, this work identifies "OOD shifts caused by different units/scales" as physically equivalent scaling transformations. It proposes a training-free, model-agnostic test-time projection: translating test samples along \(\pi\)-preserving equivalence classes in log space to the nearest training class. This approach reduces the MAE of surrogate models such as FNO/U-Net by up to 91% under extreme OOD conditions.

Building Spatial World Models from Sparse Transitional Episodic Memories

The paper proposes the Episodic Spatial World Model (ESWM), which builds spatial world models from sparse, disconnected episodic memories (one-step transitions). Its latent space spontaneously develops cognitive maps aligned with environmental topology and supports zero-shot exploration and navigation.

Change Point Localization and Inference in Dynamic Multilayer Networks

Addressing the Dynamic Multilayer Random Dot Product Graph (D-MRDPG) where shared latent positions are fixed but connection weights of layers change abruptly over time, this paper proposes a two-stage offline change point localization algorithm combining "Seeded Binary Segmentation + Low-Rank Tensor Refinement." It provides the first consistency guarantee for the number and locations of change points, establishes the limiting distribution of the refined estimator, and constructs data-driven confidence intervals.

Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings

Analyzes the Multi-Resolution Hash Encoding (MHE) of Instant-NGP from a physical system perspective, deriving a closed-form approximation of its Point Spread Function (PSF). It reveals that the effective resolution is determined by the average resolution \(N_{\text{avg}}\) rather than the finest resolution \(N_{\max}\), identifies grid-induced anisotropy, and proposes a zero-overhead Rotated MHE (R-MHE) to eliminate anisotropy by rotating input coordinates per layer.

CircuitNet 3.0: A Multi-Modal Dataset with Task-Oriented Augmentation for AI-Driven Circuit Design

CircuitNet 3.0 is the first large-scale open-source AI4EDA benchmark for timing and power prediction. It leverages 8,659 validated open-source RTL designs, augmented via Verilog AST mutations and task-oriented filtering into 15,863 multi-modal instances across Register-Transfer Level (RTL), Netlist, and Layout stages.

Consistency-Driven Calibration and Matching for Few-Shot Class Incremental Learning

ConCM reformulates the core dilemma of Few-Shot Class Incremental Learning (FSCIL) as a "feature-structure dual consistency" problem. It first corrects few-shot prototype shifts using Memory-aware Prototype Calibration (MPC) inspired by hippocampal associative memory, and then solves for an evolvable embedding structure that simultaneously satisfies geometric optimality and maximum matching via Dynamic Structure Matching (DSM) in each incremental session. This approach achieves SOTA harmonic mean performance on mini-ImageNet, CIFAR100, and CUB200.

Consistent Low-Rank Approximation

The paper proposes and systematically studies the "Consistent Low-Rank Approximation" problem—maintaining a near-optimal rank-\(k\) approximation of a matrix whose rows arrive in a stream while minimizing the total change in the solution (recourse). It proves that \(O(k/\varepsilon \cdot \log(nd))\) recourse is feasible under additive error, \(k^{3/2}/\varepsilon^2 \cdot \text{polylog}\) recourse is feasible under \((1+\varepsilon)\) multiplicative error, and provides a lower bound of \(\Omega(k/\varepsilon \cdot \log(n/k))\).

Deploying Models to Non-participating Clients in Federated Learning without Fine-tuning: A Hypernetwork-based Approach

HyperFedZero utilizes a hypernetwork conditioned on "distribution embeddings" to dynamically generate classifier parameters for new, non-participating clients with intra-domain distribution shifts. It achieves localized personalization with zero fine-tuning and minimal overhead, consistently outperforming existing methods across 7 datasets and 5 models.

Deterministic Bounds and Random Estimates of Metric Tensors on Neuromanifolds

This paper analyzes the spectral properties of the Fisher Information Matrix (FIM) in the kernel space of low-dimensional probability distributions to establish deterministic upper and lower bounds for the metric tensor on the parameter space (neuromanifold). Based on the Hutchinson trace estimator, it introduces a family of unbiased stochastic estimators with bounded variance that require only a single backpropagation for efficient computation.
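
For readers unfamiliar with the building block, here is a generic Hutchinson trace estimator in plain Python. It is a sketch of the classical estimator only, not the paper's FIM-specific family; `matvec` stands in for whatever matrix-vector product is available (in the paper's setting, one obtained from backpropagation), and all names are hypothetical.

```python
import random

def hutchinson_trace(matvec, dim, num_probes=100, seed=0):
    """Unbiased estimate of trace(A) as the average of z^T A z over Rademacher z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(z)  # only products A @ z are needed, never A itself
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / num_probes

# Example with an explicit symmetric matrix whose trace is 5.
A = [[2.0, 1.0],
     [1.0, 3.0]]

def apply_A(z):
    return [sum(row[j] * z[j] for j in range(len(z))) for row in A]

print(hutchinson_trace(apply_A, dim=2, num_probes=2000))
```

For a diagonal matrix every Rademacher probe returns the exact trace; the off-diagonal entries are what introduce variance, which is the quantity the paper bounds.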

Discount Model Search for Quality Diversity Optimization in High-Dimensional Measure Spaces

This paper proposes Discount Model Search (DMS), which utilizes neural networks to fit continuous smooth discount functions instead of the histogram-based discrete representations in CMA-MAE. This addresses the search stagnation caused by distortion in high-dimensional measure spaces and introduces the QDDM paradigm for defining measure spaces directly via image datasets.

Distributed Algorithms for Euclidean Clustering

Constructs \((1+\varepsilon)\)-coresets for Euclidean \((k,z)\)-clustering in distributed environments, achieving optimal communication complexity lower bounds (up to polylog factors) in both the coordinator and blackboard models.

Distributionally Robust Classification for Multi-Source Unsupervised Domain Adaptation

A distributionally robust learning framework is proposed that jointly models the uncertainty of the target covariate distribution and the conditional label distribution. It significantly improves generalization in UDA scenarios with extreme target data scarcity or spurious correlations in the source domain.

DA-AC: Distributions as Actions — A Unified RL Framework for Diverse Action Spaces

DA-AC proposes treating the parameters of action distributions (such as softmax probabilities or Gaussian mean/variance) as the "actions" output by the agent, moving the action sampling process into the environment. This allows a unified deterministic policy gradient framework to handle discrete, continuous, and hybrid action spaces. The method theoretically guarantees strictly lower variance than LR and RP estimators and achieves competitive or SOTA performance across 40+ environments.

Do We Really Need Permutations? Impact of Model Width on Linear Mode Connectivity

Empirical evidence suggests that Linear Mode Connectivity (LMC) between independently trained models can be achieved solely by increasing model width without parameter permutations. The study proposes "Layer-wise Exponentially Weighted Connectivity" (LEWC) to explain the underlying mechanism.

Energy-Efficient Random Variate Generation via Compressed Lookup Tables

This paper proposes the cLUT (compressed lookup table) method: a "geometric frequency" scheme for lossless lookup table compression that compresses a naive table into a "tall and narrow" 2D array. Combined with a sampling step requiring only a few random bits and a single memory lookup, it achieves exact sampling for any finite discrete distribution, with speeds 10–100\(\times\) faster than mainstream Python samplers and 25–50% lower energy consumption than SOTA C implementations.

Ensemble Prediction of Task Affinity for Efficient Multi-Task Learning

ETAP combines white-box gradient affinity analysis with data-driven ensemble prediction. Using a minimal number of training groups, it accurately predicts performance gains in multi-task learning (MTL), enabling efficient task partitioning into optimal groups.

Evaluating GFlowNet from Partial Episodes for Stable and Flexible Policy-Based Training

This paper establishes the theoretical connection between the state flow function and the policy evaluation function in GFlowNet and proposes the Subtrajectory Evaluation Balance (Sub-EB) objective for reliable learning of the evaluation function, enhancing the stability and flexibility of policy-based GFlowNet training.

Exploring State-Space Models for Data-Specific Neural Representations

This paper introduces State-Space Models (SSMs) to "data-specific neural representations" for the first time (overfitting a compact network to a single image/video/3D instance). It theoretically demonstrates that the hidden states of SSMs inherently encode the input signal itself and proposes the Structured State-Space Kernel (S3K). By distilling SSMs into convolutional kernels to support multi-dimensional inputs and downsampling, the method outperforms existing approaches in image, video, and 3D reconstruction.

Exposing Mixture and Annotating Confusion for Active Universal Test-Time Adaptation

This paper proposes a new paradigm of Active Universal Test-Time Adaptation (AUTTA) and introduces the EMAC method to incorporate sparse human annotations during testing. It decouples domain and class shifts using SVD + GMM to expose samples in the "mixed region," employs a reward-driven strategy to select representative samples for annotation, and utilizes a clustering contrastive loss to balance annotations and pseudo-labels, achieving SOTA performance under dual shifts.

Fast and Stable Riemannian Metrics on SPD Manifolds via Cholesky Product Geometry

This work reveals a simple product structure on Cholesky manifolds, leading to two fast and numerically stable SPD metrics (PCM and BWCM). All Riemannian operators have closed-form expressions, and the metrics improve performance, efficiency, and stability in SPD deep learning.

Federated ADMM from Bayesian Duality

The Bayesian duality structure of ADMM is derived from a Variational Bayes (VB) perspective, proving that classic ADMM is a special case of VB on the isotropic Gaussian family. Two new extensions are derived: Newton-like (one-round convergence for quadratic objectives) and Adam-like (+7% accuracy in deep heterogeneous scenarios).

Forget Forgetting: Continual Learning in a World of Abundant Memory

When storage is cheap and the GPU is the bottleneck, the primary challenge of continual learning flips from "preventing forgetting" to "preserving plasticity." This paper proposes a lightweight weight-space method (ranked parameter reset + in-training weight averaging) to recover both stability and plasticity at a cost comparable to naive Replay.

Fractional-Order Spiking Neural Network

This work replaces the first-order ODEs underlying the membrane potential evolution of spiking neurons with Caputo fractional-order ODEs. This endows neurons with an inherent "long memory" characterized by power-law decay, strictly generalizing the classical IF/LIF models, which are recovered at \(\alpha=1\). The approach achieves higher accuracy and stronger noise robustness in both neuromorphic vision and graph learning tasks.

From atom to space: SpatialRead, a Regionalized Readout Function for Spatial Properties of Materials

Focusing on material properties like gas adsorption that "decompose by spatial region rather than by atom," this paper proposes SpatialRead. It inserts "spatial nodes" into voxelized space, transforms the atomic graph into an atom-spatial heterogeneous graph, and adaptively fuses atomic and spatial inductive biases using multi-modal attention. This allows small models trained from scratch to surpass foundation models pre-trained on 120 million samples for these tasks.

From Fields to Random Trees

This paper proposes the SPT method: by uniformly sampling random spanning trees from the underlying graph of a Markov Random Field (MRF) to break loops, the originally NP-hard MAP inference is decomposed into a series of exactly solvable subproblems on trees. After correcting edge weights using effective resistance and merging results, SPT significantly outperforms LBP / TRBP on sparse and locally connected graphs.
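
The tree-sampling step can be sketched with Wilson's algorithm, a standard loop-erased random walk method that draws spanning trees uniformly at random. Using Wilson's algorithm here is an assumption, since the summary does not say which sampler the paper uses; the effective-resistance correction and the per-tree MAP solves are not shown.

```python
import random

def wilson_spanning_tree(adj, seed=0):
    """Sample a uniform spanning tree of a connected undirected graph.

    adj maps each node to a list of its neighbours.
    Returns a list of (node, parent) edges pointing towards the root.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    in_tree = {nodes[0]}  # the root is an arbitrary first node
    parent = {}
    for start in nodes[1:]:
        # Random walk until the tree is hit; overwriting step[u] erases loops.
        step = {}
        u = start
        while u not in in_tree:
            step[u] = rng.choice(adj[u])
            u = step[u]
        # Retrace the loop-erased path and attach it to the tree.
        u = start
        while u not in in_tree:
            in_tree.add(u)
            parent[u] = step[u]
            u = step[u]
    return list(parent.items())

# A 4-cycle: every spanning tree drops exactly one of the four edges.
grid = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(wilson_spanning_tree(grid, seed=1))
```

Each sampled tree is loop-free, so inference on it can be done exactly with dynamic programming, which is the property the decomposition relies on.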

From Movement to Cognitive Maps: RNNs Reveal How Locomotor Development Shapes Hippocampal Spatial Coding

Combining cluster analysis of rat pup locomotor development with shallow RNN predictive learning models, this paper provides the first computational evidence that developmental changes in locomotor statistics (crawling → walking → running → adult) drive the sequential emergence of hippocampal spatial-tuned neurons (place cells, head direction cells, and conjunctive cells). The model quantitatively recovers the developmental timeline of rat hippocampal recordings and predicts a gradual increase in conjunctive place-direction cells during development, which was subsequently verified in experimental data.

Frozen Priors, Fluid Forecasts: Prequential Uncertainty for Low-Data Deployment with Pretrained Generative Models

Targeting low-data deployment scenarios where "only dozens of real samples are available at launch," this paper proposes a "forecast-first" uncertainty quantification (UQ) framework. It uses a unique Dirichlet mixture schedule to fuse the empirical distribution with a frozen pretrained generative model into a time-consistent (martingale) forecast stream. Calibrated intervals for long-term values of operational metrics are then provided via Martingale Posterior (MP) resampling—all without retraining or density evaluation. On GPT-2 / CIFAR-10 / SVHN, it achieves approximately 90% coverage with only 20 samples (while bootstrap achieves only 37%).

GoR: A Unified and Extensible Generative Framework for Ordinal Regression

This paper reformulates ordinal regression (predicting targets with intrinsic order, such as age, aesthetic scores, watching duration) from "discretizing continuous space into fixed bins for classification" into "autoregressively generating a sequence of ordered tokens, accumulating their values for prediction, and determining termination via a dynamic ⟨EOS⟩." Derived from a bias-variance decomposition, the authors propose an error bound and the CoDi vocabulary construction criterion, consistently outperforming SOTA across 15 benchmarks in 5 domains.

Harpoon: Generalised Manifold Guidance for Conditional Tabular Diffusion

This paper extends manifold theory from images to tabular diffusion models, proving that the gradient of any differentiable inference-time loss lies within the tangent space of the data manifold (not limited to squared error). Based on this, it proposes Harpoon, a method that guides unconditional samples along the manifold at inference time to satisfy diverse tabular constraints.

Hilbert-Guided Sparse Local Attention

Hilbert space-filling curves are utilized to reorder 2D image tokens into 1D sequences that preserve spatial proximity. This significantly increases the block sparsity ratio of local attention (improving the empty block ratio from 87.5% to 96.9%). Combined with FlexAttention, it achieves a 4x speedup for window attention and an 18x speedup for sliding attention with minimal accuracy loss.
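
The reordering step can be sketched with the classic index-to-coordinate conversion for a Hilbert curve on a `2**order` by `2**order` grid. Function names are hypothetical, and the sparse attention kernel itself (FlexAttention) is not shown.

```python
def hilbert_d2xy(order, d):
    """Map position d along the Hilbert curve to grid coordinates (x, y)."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate or flip the quadrant built so far
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_token_order(order):
    """Row-major token indices listed in Hilbert-curve order."""
    n = 1 << order
    return [y * n + x for x, y in (hilbert_d2xy(order, d) for d in range(n * n))]

# Reorder a 4x4 token grid; consecutive entries are always spatial neighbours.
print(hilbert_token_order(2))
```

Because every aligned run of four consecutive positions covers a 2x2 patch (and every run of sixteen a 4x4 patch), a local window maps onto few contiguous 1D blocks, which is what raises the share of empty attention blocks.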

Hippoformer: Integrating Hippocampus-inspired Spatial Memory with Transformers

This paper replaces the expensive tensor-product Hebbian memory in TEM with a "Meta-MLP fast weight" relational memory, resulting in mm-TEM, a structured spatial memory that is training-efficient, exhibits spontaneous grid cell emergence, and generalizes to long sequences. Running mm-TEM in parallel with a single-layer Transformer yields Hippoformer, which complements the Transformer's precise short-range memory with structured long-range memory and achieves stronger long-range generalization in 2D/3D prediction tasks.

HippoTune: A Hippocampal Associative Loop–Inspired Fine-Tuning Method for Continual Learning

HippoTune upgrades "single-step prompt pool retrieval" to an intra-layer iterative latent space retrieval cycle mimicking the hippocampal EC–DG–CA3–CA1 loop. Through several rounds of "query–retrieval–feedback," it deeply activates memories of previous tasks, improving the accuracy of buffer-free PEFT-CL by 5–8% with approximately half the FLOPs.

Homeostatic Adaptation of Optimal Population Codes under Metabolic Stress

This paper supplements the classic "Optimal Population Coding" theory with two overlooked biological constraints—firing rate homeostasis and a direct ATP-linked energy budget. It is the first to mathematically predict the "flattening of tuning curves" observed in the mouse visual cortex under metabolic stress while unifying previously contradictory models as special cases of its own framework.

How NOT to benchmark your SITE metric: Beyond Static Leaderboards and Towards Realistic Evaluation

This paper empirically exposes three fundamental flaws in the standard benchmarks used in the "Source Independent Transferability Estimation (SITE)" field: unrealistic model spaces, leaderboards exploitable by static rankings, and score scales unrelated to real accuracy differences. It demonstrates that a static heuristic ranking, which ignores data entirely, outperforms all sophisticated SITE metrics. The authors therefore provide best practices for constructing more realistic benchmarks and introduce a new benchmark suite.

IC-Custom: Diverse Image Customization via In-Context Learning

TBD after in-depth reading

Identity-Free Deferral For Unseen Experts

This paper points out that existing "Learning to Defer" (L2D) methods fail when facing unseen experts with out-of-distribution (OOD) capability profiles because they learn "identity shortcuts" by processing class-indexed signals in fixed coordinates. The authors propose Identity-Free Deferral (IFD), which structurally enforces permutation invariance using a "role-indexed" low-dimensional state and pairs it with an uncertainty-aware training objective that requires no query-time expert labels. IFD is significantly more stable for unseen and especially OOD experts on medical imaging and ImageNet-16H human annotations.

Improving Set Function Approximation with Quasi-Arithmetic Neural Networks

The authors propose QUANN (Quasi-Arithmetic Neural Networks), which utilizes invertible neural networks to implement learnable Kolmogorov means as pooling operations. This represents the first machine learning realization of generalized central tendency measures. QUANN serves as a universal approximator for mean-decomposable set functions, and its learned embeddings exhibit significantly stronger transferability across tasks.

Internal Evaluation of Density-Based Clusterings with Noise

This paper proposes DISCO, an internal evaluation metric for density-based clustering results with noise. It uses density-connectivity instead of Euclidean compactness to evaluate clusters of arbitrary shapes and explicitly evaluates whether noise labels represent "true noise" or "points that should be in a cluster but were discarded."

It's All Just Vectorization: einx, a Universal Notation for Tensor Operations

This paper elevates "vectorization" to a unified meta-concept, pointing out that almost all NumPy-style tensor operations can be decomposed into "a few base operations + their respective vectorizations." Based on this, it designs a declarative, bracketed universal tensor notation called einx, analogous to loop notation, which compresses a vast and inconsistent array of tensor APIs into a small set of base operations.

Latent Fourier Transform

The LatentFT framework is proposed to apply Discrete Fourier Transform (DFT) on the latent time-series representation of a diffusion autoencoder to separate musical patterns by timescale. During training, a stochastic correlated log-frequency mask is used to enable the decoder to reconstruct from partial spectra. During inference, users selectively retain or mix musical elements at different timescales by specifying frequency masks. LatentFT significantly outperforms baselines like ILVR, Guidance, Codec Filtering, and RAVE in conditional generation and music fusion tasks, with a listening test of 29 musicians confirming its superior audio quality and fusion capabilities.

LPWM: Latent Particle World Models for Object-Centric Stochastic Dynamics

LPWM is the first self-supervised object-centric world model capable of scaling to real-world multi-object datasets. The core innovation is learning independent latent action distributions for each particle (per-particle latent actions). By utilizing a causal spatio-temporal Transformer to encode all frames in parallel, it supports diverse conditional generation (actions, language, image goals, multi-view). It achieves SOTA in video prediction and demonstrates imitation learning capabilities (89% success rate on OGBench task3).

Layerwise Federated Learning for Heterogeneous Quantum Clients using Quorus

Targeting Quantum Federated Learning (QFL) scenarios where different clients can only support different circuit depths, Quorus employs layerwise loss and reverse distillation to enable collaborative training across quantum models of varying depths. It proposes four quantum classifier designs (Layerwise/Ancilla/Blocking/Funnel) with distinct trade-offs in shots, qubits, mid-circuit measurement, and Hilbert space, achieving an average test accuracy improvement of 12.4% over the SOTA.

Learning Adaptive Distribution Alignment with Neural Characteristic Function for Graph Domain Adaptation

The ADAlign framework is proposed to adaptively align source and target graph distributions in the spectral domain using neural characteristic functions. This eliminates the need for manual selection of alignment criteria by automatically identifying the most significant distribution discrepancies in each transfer scenario. It achieves SOTA on 16 transfer tasks across 10 datasets while reducing memory and training time.

Learning Distributions over Permutations and Rankings with Factorized Representations

By replacing permutations with "factorized representations" (Lehmer code / Fisher-Yates / Insertion vectors) that correspond one-to-one with the symmetric group, the model can use standard masked language modeling or autoregressive cross-entropy to learn arbitrary permutation distributions. These models always generate valid permutations during sampling and can trade off expressivity for inference speed (NFE) without retraining.
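
To make the one-to-one correspondence concrete, here is a minimal sketch of one of the named representations, the Lehmer code. It is a textbook encoding, not the paper's implementation, and the function names are illustrative. Position i of the code can take any value in {0, ..., n-1-i}, so every in-range code decodes to a valid permutation.

```python
from itertools import product
from typing import List, Sequence


def lehmer_encode(perm: Sequence[int]) -> List[int]:
    """code[i] = number of later entries that are smaller than perm[i]."""
    n = len(perm)
    return [sum(perm[j] < perm[i] for j in range(i + 1, n)) for i in range(n)]


def lehmer_decode(code: Sequence[int]) -> List[int]:
    """Invert lehmer_encode: take the code[i]-th smallest unused element."""
    remaining = list(range(len(code)))
    return [remaining.pop(c) for c in code]


perm = [2, 0, 3, 1]
code = lehmer_encode(perm)
decoded = lehmer_decode(code)

# Enumerate every in-range code for n = 3 and decode each one.
all_decoded = [
    tuple(lehmer_decode(c)) for c in product(range(3), range(2), range(1))
]
```

Because each code position has its own independent range, a model can predict the positions with ordinary categorical outputs and never emit an invalid permutation.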

Learning in Prophet Inequalities with Noisy Observations

In a prophet inequality setting with unknown distributions and "linear contextual rewards" observed with noise, the authors achieve optimal competitive ratios of \(1-1/e\) (i.i.d.) and \(1/2\) (non-i.i.d.) without offline samples by employing a "learn-as-you-stop" LCB threshold policy.

Learning on a Razor's Edge: Identifiability and Singularity of Polynomial Neural Networks

This paper utilizes algebraic geometry tools to systematically analyze MLPs and CNNs with polynomial activations: it proves the finite identifiability of MLPs and the unique identifiability of CNNs, reveals that sparse subnetworks correspond to singularities on the neuromanifold, and provides a geometric explanation for the sparse bias in MLPs through the lens of "critical exposure"—a property that CNNs do not possess.

Learning Structure-Semantic Evolution Trajectories for Graph Domain Adaptation

This paper proposes DiffGDA—the first method to introduce diffusion models into Graph Domain Adaptation (GDA). It models the joint continuous-time structure-semantic evolution from source graphs to target graphs using Stochastic Differential Equations (SDEs), driven by a density-ratio-based domain-aware guidance network toward the target domain. Theoretically proven to converge to the optimal adaptation path, it outperforms SOTA methods across 14 transfer tasks on 8 real-world datasets.

Learning Survival Distributions with Individually Calibrated Asymmetric Laplace Distribution

This paper proposes ICALD, which reinterprets the pinball loss of quantile regression as the negative log-likelihood (NLL) of the Asymmetric Laplace Distribution (ALD). This allows a parametric framework to simultaneously capture the smoothness of parametric methods and the flexibility of non-parametric methods. It theoretically proves that the resulting survival model is "Probably Approximately Individually Calibrated" (PAIC) and outperforms 12 baselines across accuracy, concordance, and especially fine-grained calibration.
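
The link between pinball loss and the ALD can be checked numerically. The sketch below assumes the common ALD parameterization with density τ(1−τ)/σ · exp(−ρ_τ((y−μ)/σ)), where ρ_τ is the pinball loss; the paper's exact parameterization may differ, and the function names are illustrative.

```python
import math


def pinball_loss(y: float, q: float, tau: float) -> float:
    """Quantile (pinball) loss for predicting the tau-quantile q of y."""
    u = y - q
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_nll(y: float, mu: float, sigma: float, tau: float) -> float:
    """Negative log-likelihood under the assumed ALD parameterization."""
    # pinball((y - mu) / sigma) equals pinball(y - mu) / sigma for sigma > 0.
    return -math.log(tau * (1.0 - tau) / sigma) + pinball_loss(y, mu, tau) / sigma


tau, mu = 0.9, 1.0
ys = [-2.0, 0.5, 1.0, 3.0]
# With sigma fixed to 1, the NLL and the pinball loss differ by a constant.
gaps = [ald_nll(y, mu, 1.0, tau) - pinball_loss(y, mu, tau) for y in ys]
```

With σ fixed, minimizing the NLL over μ is therefore the same problem as quantile regression at level τ; letting σ vary is what adds a learnable scale.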

MaRS: Memory-Adaptive Routing for Reliable Capacity Expansion and Knowledge Retention

MaRS attaches a slot memory router to a frozen large-scale backbone, employing statistical hypothesis testing (SGSE) to decide "when to expand" and a two-stage contrastive-distillation process (DCDA) to determine "how to merge." It balances plasticity and stability without replaying original data, offering formalized guarantees for both expansion and forgetting.

Measuring Uncertainty Calibration

Addressing the finite-sample estimation problem of \(L_1\) calibration error for binary classifiers, this work proposes the first non-asymptotic, distribution-free certifiable upper bounds under two structural assumptions: bounded variation and bounded derivatives. The latter is constructively guaranteed by applying minor perturbations to classifier outputs. Experiments demonstrate that the calibration error upper bound can be controlled at approximately 0.02 with \(10^7\) samples.

Mitigating Spurious Correlation via Distributionally Robust Learning with Hierarchical Ambiguity Sets

A hierarchical DRO framework is proposed to capture both group proportion shifts and intra-group distributional shifts. By defining intra-group ambiguity sets using the \(W_\infty\) distance in semantic space, the method achieves SOTA performance on standard benchmarks. Furthermore, it maintains strong robustness in newly designed minority distribution shift settings where existing methods fail.

Mixed-Curvature Tree-Sliced Wasserstein Distance

The authors extend the Tree-Sliced Wasserstein framework to Mixed-Curvature Spaces (MCS), which are formed by the Cartesian product of Euclidean, spherical, and hyperbolic components. By utilizing "geodesic trees growing across subspaces" as the projection domain, they derive MCTSW—a distribution distance that preserves geometric and topological structures while providing a closed-form solution and remaining parallelizable.

Neural Dynamics Self-Attention for Spiking Transformers

This paper analyzes the bottlenecks of Spiking Self-Attention (SSA) from two perspectives: "lack of local modeling capability" and "high storage overhead of attention matrices." It proposes LRF-Dyn: first, it re-incorporates local bias into SSA using a Local Receptive Field (LRF) to improve accuracy; then, it rewrites the attention calculation into a recursive form that only requires storing membrane potentials by leveraging "charge-fire-reset" neuronal dynamics. This significantly reduces inference memory while pushing the accuracy of Spiking Transformers close to that of ANNs.

Neural Force Field: Few-shot Learning of Generalized Physical Reasoning

This paper proposes Neural Force Field (NFF), which models object interactions as continuous force fields. By learning force functions via neural operators and decoding trajectories with an ODE integrator, it achieves few-shot SOTA on I-PHYRE (100 trajectories), N-body (200 trajectories), and PHYRE (0.012M data, 267x less than previous SOTA). It reduces cross-scenario RMSE by 32-64% and achieves near-human performance in planning tasks.

Noise-Aware Generalization: Robustness to In-Domain Noise and Out-of-Domain Generalization

This paper is the first to formalize the Noise-Aware Generalization (NAG) problem, which simultaneously pursues in-domain robustness and out-of-domain generalization under label noise. It proposes the DL4ND method, which detects noisy labels through cross-domain comparison and achieves an improvement of up to 12.5% across 7 datasets.

Noisy-Pair Robust Representation Alignment for Positive-Unlabeled Learning

The authors propose NcPU, a non-contrastive PU learning framework. By applying a square root transformation to the standard non-contrastive loss (NoiSNCL), gradients are dominated by clean pairs. Combined with PhantomGate for conservative negative supervision and regret-based recovery, the two modules interact iteratively within an EM framework. Without relying on auxiliary negative samples or estimated class priors, the gap between PU and supervised learning on CIFAR-100 is reduced from 14.26% to <1.4%, achieving SOTA on the xBD disaster damage assessment dataset as well.

Non-Clashing Teaching in Graphs: Algorithms, Complexity, and Bounds

This paper investigates the non-clashing teaching of closed neighborhood concept classes in graphs. It provides matching algorithmic upper and lower bounds (\(2^{\mathcal{O}(|E|)}\) tight bound for N-NCTD⁺), FPT algorithms parameterized by treedepth and vertex cover (including the first FPT result for negative labels), and combinatorial upper bounds for planar and unit square graphs, comprehensively advancing the computational and combinatorial understanding of non-clashing teaching.

On the Impact of the Utility in Semivalue-based Data Valuation

This paper introduces a geometric representation termed "spatial signature" to unify the problem of utility selection in data valuation as a directional rotation on the unit circle. It proposes a quantitative robustness metric \(R_p\), revealing that the Banzhaf value exhibits the highest ranking stability across different utilities.

On the Lipschitz Continuity of Set Aggregation Functions and Neural Networks for Sets

This paper systematically investigates the Lipschitz continuity of three common set aggregation functions (sum, mean, max) and attention mechanisms under three multiset distance functions, derives Lipschitz upper bounds for set neural networks, and analyzes perturbation stability and generalization under distribution shifts.

Online Pseudo-Zeroth-Order Training of Neuromorphic Spiking Neural Networks

This paper proposes OPZO (Online Pseudo-Zeroth-Order training), which completes spatial credit assignment in spiking neural networks using only a single noisy forward propagation plus top-down direct feedback. It avoids the weight symmetry and multi-phase execution problems of spatial backpropagation, while suppressing the massive variance of zeroth-order methods through a "pseudo-zeroth-order" formulation and momentum feedback connections. It eventually approaches the accuracy of spatial BP on neuromorphic and static datasets with lower estimated on-chip training overhead.

OSIRIS: Bridging Analog Circuit Design and Machine Learning with Scalable Dataset Generation

OSIRIS is a scalable dataset generation pipeline for analog integrated circuit back-end layouts. By systematically enumerating transistor finger arrangements and component position perturbations, it automatically produces a large-scale dataset of DRC-clean, LVS-verified layouts with parasitic-aware performance annotations. It releases a dataset of 87,100 layout variants alongside a reinforcement learning-based layout optimization baseline.

Out of the Shadows: Exploring a Latent Space for Neural Network Verification

By treating a zonotope as a "projection (shadow)" of a high-dimensional hypercube, the paper observes that the input set and the output enclosure share the same latent space. Based on this, a specification-driven input refinement method is proposed that back-propagates unsafe constraints from the output to shrink the input set, reducing the number of branch-and-bound subproblems by 60-65%. All operations are matrix-based to achieve efficient GPU acceleration, reaching performance comparable to top-tier tools like \(\alpha\)-\(\beta\)-CROWN across eight VNN-COMP'24 benchmarks.
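
The "shadow of a hypercube" view can be illustrated with the standard zonotope definition Z = {c + G e : e ∈ [-1, 1]^m}. The sketch below is not the paper's verifier. It shows only how latent hypercube points map to the zonotope and how a per-coordinate enclosure follows from the generator matrix; `c`, `G`, and the function names are made-up illustrations.

```python
from itertools import product
from typing import List, Sequence, Tuple


def zonotope_point(
    c: Sequence[float], G: Sequence[Sequence[float]], e: Sequence[float]
) -> List[float]:
    """Image c + G e of a latent point e in the hypercube [-1, 1]^m."""
    return [ci + sum(g * ej for g, ej in zip(row, e)) for ci, row in zip(c, G)]


def zonotope_bounds(
    c: Sequence[float], G: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Per-coordinate interval enclosure: c_i -/+ sum_j |G_ij|."""
    radius = [sum(abs(g) for g in row) for row in G]
    lower = [ci - r for ci, r in zip(c, radius)]
    upper = [ci + r for ci, r in zip(c, radius)]
    return lower, upper


c = [1.0, 0.0]                           # center in output space (2-D)
G = [[1.0, 0.5, 0.0], [0.0, 1.0, -2.0]]  # 3 generators, so the latent cube is 3-D
lower, upper = zonotope_bounds(c, G)
# Images of the 8 hypercube corners; all lie inside the interval enclosure.
corners = [zonotope_point(c, G, e) for e in product((-1.0, 1.0), repeat=3)]
```

Everything here reduces to matrix-vector products and absolute-value row sums, which is consistent with the note that the operations map well onto GPUs.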

Oversmoothing, Oversquashing, Heterophily, Long-Range, and More: Demystifying Common Beliefs in Graph Machine Learning

This paper systematically examines nine common myths in the field of graph machine learning regarding oversmoothing, oversquashing, homophily/heterophily, and long-range dependencies. By refuting each myth with concise counter-examples, the authors decouple "oversquashing" into two independent concepts—computational bottleneck and topological bottleneck—resolving widespread conceptual confusion in the literature.

P3D: Highly Scalable 3D Neural Surrogates for Physics Simulations with Global Context

P3D utilizes a hybrid CNN-Transformer backbone, crop-based pre-training, and an optional global context network to scale neural surrogate models for 3D PDE and turbulence simulations to the \(512^3\) level, achieving superior accuracy, speed, and memory efficiency across both deterministic prediction and probabilistic generation tasks.

Permutation-Consistent Variational Encoding for Incomplete Multi-View Multi-Label Classification

Addressing the "dual missingness" of views and labels in multi-view multi-label classification (iM3C), this paper proposes the PCVE framework. Under an Information Bottleneck objective, it utilizes cross-view variational encoders to learn shared semantic distributions for each view, further aligned via a "permutation consistency" regularization. The method consistently outperforms 9 strong baselines even under 50% view and 50% label missingness.

PlanetAlign: A Comprehensive Python Library for Benchmarking Network Alignment

PlanetAlign is proposed as a PyTorch-based network alignment benchmark library that integrates 18 datasets across 6 domains, 14 methods covering three major categories (consistency, embedding, and Optimal Transport), and standardized evaluation workflows. Through large-scale systematic experiments, it reveals the comprehensive lead of OT-based methods (PARROT/JOENA) in effectiveness and the differentiated performance of various methods in scalability and robustness.

Predicting Kernel Regression Learning Curves from Only Raw Data Statistics

The authors propose the Hermite Eigenstructure Ansatz (HEA), which enables analytical prediction of learning curves (test error vs. sample size) for rotation-invariant kernels on real image datasets (CIFAR-5m, SVHN, ImageNet) using only two statistics: the data covariance matrix and the Hermite decomposition of the target function. The paper proves this ansatz holds for Gaussian data and demonstrates that MLPs in the feature-learning regime learn Hermite polynomials in the order predicted by HEA.

Prior-Free Tabular Test-Time Adaptation

PFT3A addresses test-time adaptation for tabular data under the stringent setting of no access to source data and no knowledge of any source domain priors. By employing three modules (Class Prior Estimation, Robust Feature Learning, and Representative Subspace Exploration), it simultaneously mitigates label shift and feature shift, consistently outperforming existing SOTA methods across five TableShift datasets and three backbones.

PriorGuide: Test-Time Prior Adaptation for Simulation-Based Inference

PriorGuide enables a pre-trained diffusion-based amortized simulation-based inference model to adopt a new prior distribution at test-time without retraining. By transforming the prior adaptation into a guidance term added to the diffusion score and employing Gaussian mixture approximations for a closed-form solution, it allows for flexible injection of expert knowledge or prior sensitivity analysis.

Probabilistic Kernel Function for Fast Angle Testing

This paper investigates the angle testing problem in high-dimensional Euclidean space and proposes two deterministic probabilistic kernel functions, \(K_S^1\) and \(K_S^2\), based on reference angles. These are used for angle comparison and angle thresholding, respectively. Theoretical guarantees are obtained without requiring the asymptotic assumption of Gaussian distributions. The methods are applied to Approximate Nearest Neighbor Search (ANNS), achieving 2.5×–3× QPS acceleration on HNSW graphs.

PU-Bench: A Unified Benchmark for Rigorously Reproducible PU Learning

PU-Bench is the first unified open-source PU (Positive-Unlabeled) learning benchmark. Utilizing a configurable data generator, a unified training pipeline, and a standardized evaluation suite, it re-evaluates 18 representative methods across 8 datasets with 2,880 controlled experiments. It reveals conclusions previously obscured by inconsistent experimental settings, such as "no universal winner," the continued competitiveness of the simple nnPU baseline, and a clear trade-off between performance and efficiency.

QUEST: A Robust Attention Formulation Using Query-Modulated Spherical Attention

QUEST normalizes the key vectors in standard scaled dot-product attention to a hypersphere while maintaining the norm degrees of freedom for the queries (i.e., \(A=\mathrm{softmax}(Q\bar{K}^\top)\)). With a modification of less than one line, it simultaneously eliminates training instability caused by attention logit explosion and enables the model to learn more dispersed and robust attention. It consistently outperforms standard attention and QKNorm across multiple tasks including ImageNet classification, segmentation, and adversarial attacks.
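
A minimal pure-Python sketch of the stated formula \(A=\mathrm{softmax}(Q\bar{K}^\top)\) follows. It assumes non-zero keys and omits any temperature or scaling factor the paper may use; the helper names are illustrative.

```python
import math
from typing import List, Sequence


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Project a non-zero vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def quest_attention_weights(
    Q: Sequence[Sequence[float]], K: Sequence[Sequence[float]]
) -> List[List[float]]:
    """softmax(Q K_bar^T): keys are unit-normalized, queries keep their norm."""
    K_bar = [l2_normalize(k) for k in K]
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) for k in K_bar]) for q in Q
    ]


Q = [[1.0, 0.0], [0.0, 2.0]]
K = [[3.0, 4.0], [1.0, 0.0]]
A = quest_attention_weights(Q, K)
# Inflating a key's norm leaves the weights unchanged; only its direction matters.
A_scaled = quest_attention_weights(Q, [[300.0, 400.0], [1.0, 0.0]])
```

Since each logit is bounded in magnitude by the query norm, a key with a very large norm can no longer blow up the logits on its own, while the query norm still controls how peaked each row is.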

RADAR: Learning to Route with Asymmetry-aware Distance Representations

RADAR equips existing neural VRP solvers with a pair of "asymmetry-aware" components—using truncated SVD to decompose asymmetric distance matrices into "departure/arrival" dual-role node embeddings for initialization, and replacing the softmax in the encoder's attention with Sinkhorn normalization (balancing rows and columns). This allows solvers originally limited to symmetric Euclidean distances to generalize stably on real-world asymmetric road networks (e.g., one-way streets, directional congestion), consistently outperforming strong baselines like MatNet, ICAM, and RRNCO across 17 synthetic and 3 real-world VRP variants.
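
The "balancing rows and columns" step can be sketched with plain Sinkhorn iterations on a square score matrix. This is a generic illustration with a hypothetical function name and iteration count, not RADAR's encoder code.

```python
import math
from typing import List, Sequence


def sinkhorn(scores: Sequence[Sequence[float]], n_iters: int = 100) -> List[List[float]]:
    """Alternately normalize rows and columns of exp(scores)."""
    P = [[math.exp(s) for s in row] for row in scores]
    n_rows, n_cols = len(P), len(P[0])
    for _ in range(n_iters):
        # Row step: every row sums to 1 (softmax alone would stop here).
        row_sums = [sum(row) for row in P]
        P = [[P[i][j] / row_sums[i] for j in range(n_cols)] for i in range(n_rows)]
        # Column step: every column sums to 1.
        col_sums = [sum(P[i][j] for i in range(n_rows)) for j in range(n_cols)]
        P = [[P[i][j] / col_sums[j] for j in range(n_cols)] for i in range(n_rows)]
    return P


# Asymmetric scores: entry (i, j) differs from entry (j, i).
scores = [[0.0, 1.0, 2.0], [1.0, 0.5, 0.0], [2.0, 0.0, 1.0]]
P = sinkhorn(scores)
row_totals = [sum(row) for row in P]
col_totals = [sum(P[i][j] for i in range(3)) for j in range(3)]
```

Unlike a row-wise softmax, the result constrains both how much attention each node gives and how much it receives, which is one plausible reading of why it suits directed, asymmetric distances.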

Random Anchors with Low-rank Decorrelated Learning: A Minimalist Pipeline for Class-Incremental Medical Image Classification

For medical image class-incremental learning, this paper proposes RA-LDL: using "frozen random anchors + first-session low-rank residuals" to calibrate pre-trained features for better separability, combined with a set of "decorrelated" analytic classifiers constructed via closed-form ridge regression. The entire pipeline requires gradient training only in the first session, with subsequent tasks updated via recursively accumulated statistics. Despite its minimalist structure, it outperforms various complex SOTA methods across four medical datasets.

Refine Now, Query Fast: A Decoupled Refinement Paradigm for Implicit Neural Fields

This paper proposes the Decoupled Representation Refinement (DRR) paradigm, which refines embedding structures and caches the results using a deep refiner network during an offline stage. This allows the inference stage to require only fast interpolation and a lightweight decoder, achieving SOTA reconstruction accuracy on ensemble simulation surrogate modeling tasks with less than 1/27 of the inference cost.

Regulating Internal Alignment Flows for Robust Learning Under Spurious Correlations

This paper proposes Alignment-Gated Suppression (AGS): it computes a "class-conditional, confidence-weighted" alignment energy for each neuron during training. By applying multiplicative decay to connections at the quantile tail—those contributing most strongly to the ground-truth class (likely shortcuts)—it simultaneously improves average and worst-group accuracy without group labels and with < 5% additional overhead.

Revisiting Sharpness-Aware Minimization: A More Faithful and Effective Implementation

This paper proposes a new intuitive explanation for the underlying mechanism of SAM—interpreting the gradient at the perturbed point as an approximation of the direction toward the local maximum—and reveals its imprecision and the multi-step degradation issue. Consequently, XSAM is proposed to achieve more faithful and effective sharpness-aware minimization by explicitly searching for the maximum direction.

Robust Equation Structure Learning with Adaptive Refinement (RESTART)

RESTART fully integrates the scientific discovery closed loop of "Hypothesis—Experiment—Analysis" into symbolic regression. It employs a Transformer to provide a strong initial equation, explicitly models "unexplained" components of the current equation as boosting-style "exploration functions" for short-term targeted feedback, and distills successful refinements into a reusable code-based structure library for long-term knowledge. This approach outperforms existing SOTA on LLM-SRBench with lower error and higher recovery rates, approaching ground-truth forms on OOD data.

Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute

This paper proposes Proteína-Complexa (Complexa), unifying the long-separated "generative modeling" and "hallucination sequence optimization" paradigms in protein binder design into a single framework. First, a large-scale synthetic dataset, Teddymer, derived from domain interactions in AFDB, is used to pretrain an all-atom flow-matching generative base model. During inference, test-time scaling algorithms (Best-of-N, beam search, FKS, MCTS) are adapted with structure predictor confidence as rewards to "search" for strong binders, significantly outperforming hallucination methods like BindCraft under normalized compute budgets.

Scaling Direct Feedback Learning with Jacobian Alignment Guarantees

Addressing the collapse of Direct Feedback Alignment (DFA) in deep convolutional networks and Transformers, this paper proposes GrAPE. By using forward-mode JVP to estimate rank-1 Jacobians and applying a local cosine alignment loss to "correct" random feedback matrices toward the true gradient direction—supplemented by periodic sparse single-batch Backpropagation (BP) calibration—the authors successfully scale DFA-like methods to VGG-16, ResNet, and Transformers for the first time, closing a significant portion of the performance gap with BP while maintaining layer-wise parallel updates.

SmellNet: A Large-scale Dataset for Real-world Smell Recognition

SmellNet establishes a machine olfaction benchmark using low-cost portable gas sensors to collect real-world temporal signals from 50 natural ingredients and 43 categories of odor mixtures. It also proposes SCENTFORMER, which integrates temporal differencing, sliding windows, and GC-MS chemical priors.

Soft Quality-Diversity Optimization

The authors propose Soft QD Score as a new optimization objective for quality-diversity that eliminates the need for behavior space discretization. Based on this, they derive a differentiable algorithm, SQUAD, which exhibits superior scalability in high-dimensional behavior spaces and maintains competitive performance with SOTA on standard benchmarks.

SONIC: Spectral Oriented Neural Invariant Convolutions

SONIC transfers the core concepts of State Space Models (SSMs) to the multi-dimensional frequency domain. By defining a set of direction-selective spectral transfer functions with 6 continuous parameters (amplitude, direction, damping, oscillation, etc.) and mixing across channels via low-rank matrices \(B\) and \(C\), it achieves a convolutional replacement operator with inherent global receptive fields and resolution invariance. It matches nnU-Net on 3D medical segmentation with nearly two orders of magnitude fewer parameters and remains competitive on ImageNet.

Spurious Correlation-Aware Embedding Regularization for Worst-Group Robustness

SCER provides the first theoretical decomposition of "worst-group error = classifier dependence on spurious directions − dependence on core directions." Based on this, a regularization term is added directly in the embedding space to suppress the alignment of classifier weights with "spurious directions" and enhance alignment with "core directions," achieving SOTA worst-group accuracy across six benchmarks: Waterbirds, CelebA, MetaShift, ColorMNIST, CivilComments, and MultiNLI.

Stable and Scalable Deep Predictive Coding Networks with Meta-Prediction Errors

This paper diagnoses two root causes of instability in training deep Predictive Coding Networks (PCNs) using Dynamical Mean-Field Theory (DMFT)—prediction error imbalance and prediction error explosion/vanishing (EVPE). It proposes Meta-PCN: linearizing nonlinear inference via a "prediction error of the error" (meta-PE) loss and suppressing weight spectral norms near 1 via variance normalization. Meta-PCN outperforms backpropagation in 29 out of 30 configurations on CIFAR-10/100 and TinyImageNet using purely local rules.

t-SNE Exaggerates Clusters, Provably

This work provides a rigorous theoretical proof that t-SNE suffers from two fundamental failure modes: (1) inability to infer the strength of input clustering from the output, and (2) inability to faithfully represent extreme outliers. Even when the input lacks cluster structure or contains extreme outliers, t-SNE can produce visualizations with perfect clustering.

TabStruct: Measuring Structural Fidelity of Tabular Data

This paper proposes the TabStruct evaluation framework and the global utility metric to measure the structural fidelity of tabular data generators without requiring ground-truth causal graphs. By systematically comparing 13 generators across 29 datasets, the study reveals that diffusion models significantly outperform other methods in maintaining global structural integrity.

The Counting Power of Transformers

It is proved that Transformers can capture not only (semi-)linear counting properties but also all semi-algebraic counting properties (i.e., Boolean combinations of multivariate polynomial inequalities). This generalizes previous results regarding the counting capabilities of Transformers and derives new undecidability conclusions.

The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?

By decomposing AI model errors into bias (systematic misalignment) and variance (incoherent behavior), this study finds that longer reasoning leads to higher incoherence and that larger models become more incoherent on difficult tasks. This suggests that future super-intelligent AI is more likely to manifest "industrial accident" style unpredictable failures rather than consistently pursuing incorrect goals.

Towards Sustainable Investment Policies Informed by Opponent Shaping

This paper formally proves the conditions under which the InvestESG simulation environment constitutes a social dilemma and applies the Advantage Alignment opponent shaping algorithm to guide economic agents toward a sustainable investment equilibrium.

Training Deep Normalization-Free Spiking Neural Networks with Lateral Inhibition

The paper proposes DeepEISNN, a normalization-free learning framework based on cortical excitatory-inhibitory (E-I) circuits. By implementing E-I Init and E-I Prop, it achieves stable end-to-end training of deep SNNs, balancing performance and biological plausibility.

Using maximal information auxiliary variables to improve synthetic data generation based on TabPFN foundation models

This paper identifies that direct use of TabPFN for tabular synthetic data generation fails on weakly correlated variables. It proposes Maximal Information Auxiliary Variables (MIAV): by rank-matching random noise to real variables as auxiliary inputs, TabPFN only needs to learn the univariate relationship between \(X_j\) and \(M_j\), enabling stable and efficient generation of synthetic data that preserves marginal distributions and association structures.
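
One plausible reading of "rank-matching random noise to real variables" is sketched below: draw noise, then reorder it so that its ranks coincide with the ranks of the real column \(X_j\). The Gaussian noise, the tie-free input, and the function name `rank_matched_noise` are assumptions made for illustration; the paper's construction of \(M_j\) may differ.

```python
import random
from typing import List, Sequence


def rank_matched_noise(x: Sequence[float], rng: random.Random) -> List[float]:
    """Return noise m such that m[i] has the same rank as x[i] (no ties assumed)."""
    noise = sorted(rng.gauss(0.0, 1.0) for _ in x)
    # Indices of x ordered from its smallest to its largest value.
    order = sorted(range(len(x)), key=lambda i: x[i])
    m = [0.0] * len(x)
    for rank, i in enumerate(order):
        m[i] = noise[rank]  # the rank-th smallest noise goes where x has that rank
    return m


rng = random.Random(0)
x = [3.2, -1.0, 0.5, 7.1, 2.4]
m = rank_matched_noise(x, rng)
```

Under this reading the auxiliary variable is a monotone function of the real variable, so the pair shares maximal rank information while the noise values themselves reveal nothing beyond the ordering.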

What happens when generative AI models train recursively on each others' outputs?

This paper formalizes the question of whether "multiple generative AI models will consume each other's generated content in the future" as a data-mediated interaction training problem. Theory and LLM experiments demonstrate that mixing an appropriate amount of real data with synthetic data from other models can yield cross-task transfer, but excessive reliance on synthetic data damages original tasks and leads to the gradual homogenization of model outputs.

When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency

CALIPER proposes a detector- and model-agnostic, data-only test that tracks the monotonicity of proxy errors from weighted local regression with respect to the locality parameter \(\theta\). This method estimates the minimum data volume required for safe retraining after abrupt concept drift without the need to actually retrain downstream models.