📐 Learning Theory¶

🧪 ICML2025 · 16 paper notes

📌 Same area in other venues: 🔬 ICLR2026 (293) · 🧪 ICML2026 (45) · 🤖 AAAI2026 (3) · 🧠 NeurIPS2025 (25)

Avoiding Catastrophe in Online Learning by Asking for Help: A novel theoretical framework for online learning is proposed to address catastrophic (irreversible) errors: payoffs are defined as catastrophe avoidance probabilities, and the objective function is the product of payoffs (overall catastrophe avoidance probability). By introducing an instructor querying mechanism and a Local Generalization assumption, the paper establishes an impossibility result (catastrophe is inevitable without query) and a possibility result (if the policy class is learnable, both regret and query rate vanish simultaneously), elevating sublinear regret in standard online learning to subconstant regret.
Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update: This paper proposes Hvt-UCB, a one-pass Huber regression algorithm based on Online Mirror Descent for linear bandits with heavy-tailed noise. It reduces the per-round computational complexity from \(\mathcal{O}(t\log T)\) to \(\mathcal{O}(1)\) while maintaining an optimal and instance-dependent regret bound.
Improved and Oracle-Efficient Online \(\ell_1\)-Multicalibration: Proposes to reduce online \(\ell_1\)-multicalibration to a newly defined Online Linear-Product Optimization (OLPO) problem, achieving improved and oracle-efficient online multicalibration error bounds of \(\widetilde{O}(T^{-1/3})\) and \(\widetilde{O}(T^{-1/4})\), respectively.
Improved Generalization Bounds for Transductive Learning by Transductive Local Complexity and Its Applications: This paper proposes the Transductive Local Complexity (TLC) framework, extending classical local Rademacher complexity to the transductive learning setting. It achieves excess risk bounds that are almost consistent with inductive learning (differing only by logarithmic factors) and resolves a decade-old open problem.
Learning-Augmented Algorithms for MTS with Bandit Access to Multiple Predictors: In Metrical Task Systems (MTS), when the algorithm can only access \(\ell\) heuristics in a bandit fashion (querying only one heuristic per step and requiring \(m\) consecutive queries to observe the state), this paper presents an algorithm with \(O(\text{OPT}^{2/3})\) regret and proves that this bound is tight.
Learning-Augmented Hierarchical Clustering: This paper investigates leveraging side information from a splitting oracle to bypass the approximation hardness barriers of hierarchical clustering, achieving an \(O(1)\)-approximation for the Dasgupta objective and a \((1-o(1))\)-approximation for the Moseley-Wang objective, with extensions to streaming and parallel computing settings.
Maximum Coverage in Turnstile Streams with Applications to Fingerprinting Measures: This work presents the first single-pass streaming algorithm for the maximum coverage problem under the turnstile streaming model (supporting arbitrary insertions/deletions) with \(\tilde{O}(d/\varepsilon^3)\) space and \(\tilde{O}(1)\) update time. The algorithm is further extended to the privacy fingerprinting scenario, achieving up to a 210× speedup compared to prior methods in experiments.
Multiple-Policy Evaluation via Density Estimation: This paper proposes the CAESAR algorithm, which simultaneously evaluates \(K\) policies through a two-phase approach (coarse estimation of the visitation distribution + density ratio estimation under the optimal sampling distribution). This achieves a non-asymptotic, instance-dependent sample complexity. The core technique is "coarse estimation"—obtaining a constant-factor distribution approximation using only \(O(1/\epsilon)\) samples.
Near-Optimal Consistency-Robustness Trade-Offs for Learning-Augmented Online Knapsack Problems: A family of online knapsack algorithms based on simple predictions (point or interval predictions of the critical value) is proposed. These algorithms achieve near-Pareto optimal trade-offs between consistency and robustness, alongside a general reduction from fractional to integer solutions.
Near Optimal Best Arm Identification for Clustered Bandits: In the multi-agent clustered multi-armed bandit setting, this paper proposes two algorithms, Cl-BAI and BAI-Cl, which exploit the clustering structure to significantly reduce the sample complexity of best arm identification. It is further proved that BAI-Cl++ achieves minimax optimality when \(M\) is a constant.
On Fine-Grained Distinct Element Estimation: Proposes using the pairwise collisions \(C\) as a fine-grained complexity parameter for the distributed distinct element counting problem, designs a protocol whose communication decreases significantly as \(C\) decreases, breaking the prior worst-case lower bound of \(\Omega(\alpha/\varepsilon^2)\), and provides matching lower bounds across all parameter regimes.
Positional Attention: Expressivity and Learnability of Algorithmic Computation: Proposes Positional Transformer—a Transformer variant where attention weights are determined solely by positional encodings and are independent of input data. The authors prove that it maintains equivalent expressivity to the MPC parallel computation model (costing only \(O(\log n)\) additional depth), and demonstrates significantly better out-of-distribution (OOD) generalization capability on algorithmic tasks.
Principled Algorithms for Optimizing Generalized Metrics in Binary Classification: This paper proposes METRO, a principled algorithm for optimizing generalized classification metrics (such as \(F_\beta\), Jaccard, weighted accuracy, etc.). Based on \(H\)-consistency bounds and surrogate loss theory, it reformulates metric optimization as a generalized cost-sensitive learning problem, providing finite-sample generalization guarantees.
Provably Efficient Algorithm for Best Scoring Rule Identification in Online Principal-Agent Information Acquisition: This paper studies the Best Scoring Rule Identification (BSRI) problem under the principal-agent online information acquisition framework, proposing two algorithms, OIAFC (fixed confidence) and OIAFB (fixed budget). It establishes the first instance-dependent sample complexity upper bound of \(\widetilde{O}(MH_\Delta)\) and significantly improves the instance-independent sample complexity from the existing limit of \(\widetilde{O}(C_O^3 K^6 \epsilon^{-3})\) to \(\widetilde{O}(MK\epsilon^{-2})\).
Sparse-Pivot: Dynamic Correlation Clustering for Node Insertions: This paper proposes the Sparse-Pivot algorithm, which achieves a \((20+\varepsilon)\)-approximation for the correlation clustering problem under dynamic node insertions with amortized \(O_\varepsilon(\log^{O(1)} n)\) database operations. It significantly improves the approximation factor of Cohen-Addad et al. (ICML 2024) and comprehensively outperforms benchmarks in experiments.
Theoretical Performance Guarantees for Partial Domain Adaptation via Partial Optimal Transport: This paper derives generalization bounds for Partial Domain Adaptation (PDA) based on the theory of partial optimal transport, proves the rationality of using partial Wasserstein distance as the domain alignment term alongside the proposed theoretically-driven weighting scheme, and subsequently develops a practical algorithm named WARMPOT.