📐 Learning Theory

🧠 NeurIPS2025 · 25 paper notes

📌 Same area in other venues: 🔬 ICLR2026 (293) · 🧪 ICML2026 (45) · 🤖 AAAI2026 (3) · 🧪 ICML2025 (16)

🔥 Top topics: Domain Adaptation ×3

A High-Dimensional Statistical Method for Optimizing Transfer Quantities in Multi-Source Transfer Learning: This paper proposes a theoretical framework based on Kullback-Leibler (KL) divergence and high-dimensional statistical analysis to determine the optimal number of samples to transfer from each source task in multi-source transfer learning. The framework avoids the negative transfer caused by naively using all source data, and the resulting algorithm, OTQMS, surpasses the state of the art by 1.0–1.5% on DomainNet and Office-Home while reducing sample usage by 47.85% and training time by 35.19%.
Adaptive Data Analysis for Growing Data: This paper establishes the first generalization bounds for adaptive analysis over dynamically growing data, permitting analysts to schedule queries adaptively based on current dataset size, and achieving increasingly tight guarantees as data accumulates via time-varying empirical accuracy bounds and differential privacy mechanisms.
Computable Universal Online Learning: This paper introduces computability constraints into the universal online learning framework, proving that "mathematically learnable" does not imply "learnable by a computer program," and provides precise characterizations of computable learning under both agnostic and proper variants.
Conformal Online Learning of Deep Koopman Linear Embeddings: This paper proposes the COLoKe framework, which reinterprets conformal prediction as a model consistency diagnostic tool. Parameter updates are triggered only when the Koopman model's prediction error exceeds a dynamically calibrated threshold, enabling efficient online Koopman linear embedding learning for nonlinear dynamical systems.
Diffusion Transformers for Imputation: Statistical Efficiency and Uncertainty Quantification: This paper analyzes the sample complexity and uncertainty quantification performance of conditional diffusion Transformers (DiT) for time series imputation from a statistical learning perspective, and proposes a mixed-masking training strategy to improve imputation quality.
Efficient Kernelized Learning in Polyhedral Games Beyond Full-Information: From Colonel Blotto to Congestion Games: This paper proposes a kernelization-based framework for designing computationally efficient no-regret learning algorithms for polyhedral games (Colonel Blotto, graphic matroid congestion games, and network congestion games) under partial-information feedback, significantly improving the runtime complexity for learning coarse correlated equilibria (CCE).
Finite-Time Analysis of Stochastic Nonconvex Nonsmooth Optimization on the Riemannian Manifolds: This paper proposes the Riemannian Online to NonConvex (RO2NC) algorithm and its zeroth-order variant ZO-RO2NC, establishing for the first time a finite-time sample complexity guarantee of \(O(\delta^{-1}\epsilon^{-3})\) for fully nonsmooth nonconvex stochastic optimization on Riemannian manifolds, matching the optimal result in Euclidean space.
How Many Domains Suffice for Domain Generalization? A Tight Characterization via the Domain Shattering Dimension: This paper introduces the Domain Shattering Dimension (Gdim), a novel combinatorial measure that tightly characterizes the number of domains required for domain generalization (i.e., the domain sample complexity), and establishes its relationship to the classical VC dimension as \(\Theta(d \log(1/\alpha))\).
Improved Approximation Algorithms for Chromatic and Pseudometric-Weighted Correlation Clustering: For two important generalizations of Correlation Clustering—Chromatic CC and pseudometric-weighted CC—this paper achieves a 2.15-approximation and a tight 10/3-approximation, respectively, via LP relaxation and carefully designed rounding functions, significantly improving upon the previous best results of 2.5 and 6.
Infrequent Exploration in Linear Bandits: This paper proposes the INFEX framework, which executes a baseline algorithm (e.g., LinUCB/LinTS) at designated exploration steps according to a given schedule and selects arms greedily at all other time steps. The paper proves that as long as the number of exploration steps grows faster than logarithmically in the horizon, i.e., is \(\omega(\log T)\), INFEX achieves the same poly-logarithmic regret as full-time exploration while substantially reducing computational overhead (80%–99% of time steps are greedy).
Keep It on a Leash: Controllable Pseudo-label Generation Towards Realistic Long-Tailed Semi-Supervised Learning: This paper proposes Controllable Pseudo-label Generation (CPG), a framework that progressively incorporates reliable pseudo-labels into the labeled set via a controllable self-reinforcing optimization cycle. By training a Bayes-optimal classifier on a distribution of known composition, CPG achieves accuracy gains of up to 15.97% in the Realistic LTSSL setting where the unlabeled data distribution is entirely unknown.
Kernel Conditional Tests from Learning-Theoretic Bounds: This paper proposes a unified framework for converting confidence bounds of learning algorithms into conditional hypothesis tests. Built upon kernel ridge regression, the framework yields conditional two-sample tests with finite-sample guarantees and, for the first time, supports non-i.i.d. data and online sampling scenarios.
Learning-Augmented Online Bipartite Fractional Matching: This paper proposes two learning-augmented algorithms (LAB and PAW) for online bipartite fractional matching. Given a potentially inaccurate advice matching, both algorithms are the first to Pareto-dominate the naïve CoinFlip strategy across the entire robustness spectrum.
Learning-Augmented Streaming Algorithms for Correlation Clustering: This paper proposes the first learning-augmented streaming algorithms for Correlation Clustering. By leveraging pairwise distance predictions, the proposed methods achieve a better-than-3 approximation ratio on complete graphs (\(\tilde{O}(n)\) space) and an \(O(\log|E^-|)\) approximation ratio on general graphs (\(\tilde{O}(n)\) space), significantly improving the space–approximation tradeoff over existing prediction-free algorithms.
Non-Clairvoyant Scheduling with Progress Bars: This paper introduces a "progress bar" information model as an interpolation framework between clairvoyant and non-clairvoyant scheduling. It designs scheduling algorithms with optimal consistency–robustness tradeoffs for both adversarial and stochastic progress bars, while advancing the theoretical frontier of learning-augmented scheduling.
On Agnostic PAC Learning in the Small Error Regime: In the small error regime of agnostic PAC learning (\(\tau \approx d/m\)), this paper constructs a computationally efficient learner based on ERM aggregation that achieves an error upper bound of \(c \cdot \tau + O(\sqrt{\tau d/m} + d/m)\) with \(c \leq 2.1\), matching known lower bounds and advancing the precise complexity characterization of agnostic learning.
Optimism Without Regularization: Constant Regret in Zero-Sum Games: This paper provides the first proof that Optimistic Fictitious Play without regularization achieves \(O(1)\) constant regret in \(2\times2\) zero-sum games, matching the optimal rate of regularized Optimistic FTRL. It further establishes an \(\Omega(\sqrt{T})\) regret lower bound for Alternating Fictitious Play, separating the capabilities of optimism and alternation in the unregularized setting.
Prediction-Powered Semi-Supervised Learning with Online Power Tuning: This paper extends the Prediction-Powered Inference (PPI) framework to the training phase of semi-supervised learning. It proposes an unbiased gradient estimator and designs an online AdaGrad algorithm to dynamically tune the interpolation parameter \(\lambda\) between pseudo-labels and true labels, achieving convergence rates matching the optimal fixed \(\lambda\) while maintaining unbiasedness.
Product Distribution Learning with Imperfect Advice: This paper studies the problem of learning product distributions over the Boolean hypercube given an imperfect advice distribution, and proposes an efficient algorithm that achieves sub-linear dependence on dimension \(d\) in sample complexity when the advice is of sufficient quality.
Reliably Detecting Model Failures in Deployment Without Labels: This paper proposes D3M (Disagreement-Driven Deterioration Monitoring), a three-stage model monitoring algorithm based on variational Bayesian posterior sampling, which reliably detects model performance degradation in label-free, training-data-free deployment settings while maintaining low false positive rates under non-degrading distribution shifts.
Revisiting Agnostic Boosting: This paper proposes a new agnostic boosting algorithm that substantially improves the sample complexity of prior work under very general assumptions, and establishes nearly matching lower bounds, thereby resolving the sample complexity of agnostic boosting up to logarithmic factors.
Sample-Adaptivity Tradeoff in On-Demand Sampling: This paper systematically studies the tradeoff between sample complexity and adaptive rounds in on-demand sampling. In the realizable setting, it proves that the optimal sample complexity of \(r\)-round algorithms is \(dk^{\Theta(1/r)}/\varepsilon\). In the agnostic setting, it proposes the LazyHedge algorithm that achieves near-optimal sample complexity in only \(\widetilde{O}(\sqrt{k})\) rounds, and introduces the OODS abstract framework to establish nearly tight round complexity lower bounds.
The Parameterized Complexity of Computing the VC-Dimension: This paper systematically investigates the parameterized complexity of computing the VC dimension, establishing that the naive exhaustive algorithm is asymptotically optimal under ETH, presenting an FPT 1-additive approximation algorithm parameterized by maximum degree, an exact \(2^{O(\text{tw} \cdot \log \text{tw})} \cdot |V|\) algorithm parameterized by treewidth, and a complete characterization of the tractability landscape across all standard structural parameters.
The Structural Complexity of Matrix-Vector Multiplication: This paper proves that for Boolean matrices \(\mathbf{M} \in \{0,1\}^{m \times n}\) with corrupted VC-dimension \(d\), matrix-vector multiplication can be performed in \(\widetilde{O}(nm^{1-1/d}+m)\) time. This is the first truly sub-quadratic upper bound for structured matrices, refuting the applicability of the OMv conjecture on structured inputs, and yields the first high-accuracy sub-quadratic algorithms for dynamic Laplacian solving, effective resistance, triangle detection, and related problems.
Transfer Learning for Benign Overfitting in High-Dimensional Linear Regression: This paper proposes a two-step Transfer MNI (TM) method that enhances generalization of benign overfitting in overparameterized high-dimensional linear regression via a "preserve target signal + transfer source knowledge in the null space" mechanism. Non-asymptotic excess risk bounds are derived under both model shift and covariate shift, and a "free lunch" covariate shift regime is identified.
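
As a rough illustration of the control flow described in the INFEX entry above (run a baseline algorithm at scheduled exploration steps, act greedily at all other steps), here is a minimal Python sketch. It is not the paper's implementation: the names `square_schedule` and `run_infex`, the callback interface, and the choice of exploring at perfect-square time steps are all assumptions made for illustration. The square schedule yields about \(\sqrt{T}\) exploration steps, which is one example of a count that is \(\omega(\log T)\). Whether statistics are updated at greedy steps is also an assumption here.

```python
from typing import Callable, List, Set, Tuple


def square_schedule(horizon: int) -> Set[int]:
    """Hypothetical schedule: explore at t = 1, 4, 9, ... up to the horizon."""
    steps = set()
    k = 1
    while k * k <= horizon:
        steps.add(k * k)
        k += 1
    return steps


def run_infex(
    horizon: int,
    schedule: Set[int],
    baseline_select: Callable[[int], int],  # e.g. a LinUCB/LinTS arm choice
    greedy_select: Callable[[int], int],    # arm with the best current estimate
    observe: Callable[[int, int], None],    # assumed: update statistics every step
) -> List[Tuple[int, str, int]]:
    """Dispatch each time step to the baseline or to the greedy rule."""
    log = []
    for t in range(1, horizon + 1):
        if t in schedule:
            mode, arm = "explore", baseline_select(t)
        else:
            mode, arm = "greedy", greedy_select(t)
        observe(t, arm)
        log.append((t, mode, arm))
    return log


if __name__ == "__main__":
    schedule = square_schedule(100)
    # Placeholder callbacks: the baseline always picks arm 0, greedy picks arm 1.
    log = run_infex(100, schedule, lambda t: 0, lambda t: 1, lambda t, arm: None)
    greedy_steps = sum(1 for _, mode, _ in log if mode == "greedy")
    print(f"greedy fraction: {greedy_steps / len(log):.2f}")
```

With a horizon of 100 this placeholder schedule explores on 10 steps and is greedy on the other 90, which falls inside the 80%–99% greedy range quoted in the entry; the schedules actually studied in the paper may differ.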