📂 Others
🧪 ICML2026 · 27 paper notes
📌 Same area in other venues: 📷 CVPR2026 (44) · 🔬 ICLR2026 (74) · 🤖 AAAI2026 (124) · 🧠 NeurIPS2025 (141) · 📹 ICCV2025 (48)
🔥 Top topics: Diffusion Models ×2 · Continual Learning ×2
- Active Tabular Augmentation via Policy-Guided Diffusion Inpainting
This paper formalizes the "fidelity-utility gap" in tabular augmentation (where generators optimize for distribution matching, but augmentation value comes from low-density regions), and proposes the TAP algorithm, which uses diffusion inpainting for manifold-constrained proposals, policy-guided utility-aligned selection, and conservative windowed submission with hard constraint gating. On 7 real tabular datasets, TAP improves classification accuracy by up to 15.6% and reduces regression RMSE by 32% compared to baselines.
- Adaptive Multi-Round Allocation with Stochastic Arrivals
This work formalizes network recruitment as a budget-constrained sequential control problem. It proves that the single-round optimal allocation is greedy, reduces multi-round planning to \(O(b^5\log b)\) complexity via a population-level surrogate value function, and provides robustness guarantees under model misspecification by decomposing errors into frontier, population, and approximation types.
- AI Cap-and-Trade: Efficiency Incentives for Accessibility and Sustainability
Drawing inspiration from carbon cap-and-trade, the authors propose a quota-trading market for AI inference FLOPs (AI Allowance). Using KKT conditions, they prove that under reasonable parameters, this mechanism strictly reduces FLOP usage across companies, thereby simultaneously mitigating both the energy consumption of large models and the market exclusion of smaller companies.
- Cascaded Flow Matching for Heterogeneous Tabular Data with Mixed-Type Features
TabCascade decomposes each table row into two cascaded stages, "low-resolution (categorical + discretized numerical)" and "high-resolution (continuous numerical)": first, CDTD learns the low-res joint distribution; then, flow matching generates numerical details conditioned on the low-res output, with data-dependent coupling and a learnable nonlinear time schedule to tighten transport cost. It natively handles mixed-type features, such as those with missing values or zero-inflation, achieving a 51.9% improvement in detection score over SOTA on 12 datasets.
- Complexity as Advantage: A Regret-Based Perspective on Emergent Structure
This paper proposes Complexity-as-Advantage (CAA): redefining "complexity" as the regret dispersion among a family of resource-bounded observers on the same process. It is shown that, under the log-loss + Markov framework, this is equivalent to the sum of conditional mutual information atoms (recovering excess entropy), and from a coding perspective, to the variance of excess description length (MDL). Thus, Kolmogorov complexity, Bennett logical depth, and excess entropy are unified into a computable, empirically estimable scalar spectrum.
- Decision Tree Learning on Product Spaces
This work extends the theoretical guarantees of "top-down greedy decision tree heuristics" from Blanc et al. (ITCS'20) from the uniform distribution to arbitrary product distributions, providing an upper bound of \(\exp(\Delta_\mathrm{opt} D_\mathrm{opt}\log(e/\epsilon))\) (strictly tighter than ITCS'20 in the full binary tree case). The algorithm is completely parameter-free: it can be run without prior knowledge of the optimal tree size or depth.
- Estimating Correlation Clustering Cost in Node-Arrival Stream
This paper studies the problem of approximating the cost of correlation clustering in the "node-arrival" data stream model. The authors propose the C4Approx algorithm, which achieves a \((O(1), n^{1-\alpha})\)-approximation using \(O(n^{(3+\alpha)/4}\log n)\) words of sublinear space and a constant number of passes. Two matching lower bounds are provided, showing that both multi-pass and additive error are unavoidable. On real data, storing only 2% of nodes suffices to match the performance of Pivot.
- From Generalist to Specialist Representation
This paper provides the first fully nonparametric (no intervention, no functional constraints) two-layer hierarchical identifiability proof: the temporal-task structure is identifiable via CI tests from the collider perspective, and task-relevant latents can be separated from generalist representations using sparsity regularization.
- From Human-Level AI Tales to AI Leveling Human Scales
This paper uses LLMs as population extrapolators, calibrating 18 ability dimensions on a "world population accuracy" logarithmic scale \(L=-\log_B p_W\). It finds that the true base for Volume / Attention dimensions is \(B \gg 10\), while for Comprehension \(B \approx 1\), revealing that current AI-human comparisons are fundamentally misaligned.
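The scale \(L=-\log_B p_W\) is easy to evaluate by hand. The sketch below only illustrates the formula as quoted in the summary; the function name `population_level` and the sample values are assumptions made for illustration, not the paper's code or data.

```python
import math


def population_level(p_w: float, base: float) -> float:
    """Evaluate L = -log_B(p_W) for a population accuracy p_W in (0, 1].

    Illustrative helper (hypothetical name): `base` is the logarithm base B
    and must be > 1 for the scale to be finite and positive.
    """
    if not 0.0 < p_w <= 1.0:
        raise ValueError("p_w must lie in (0, 1]")
    if base <= 1.0:
        raise ValueError("base must be greater than 1")
    return -math.log(p_w) / math.log(base)


# Illustrative value: a task solved by 1 in 1000 people.
p_w = 0.001
levels = {base: population_level(p_w, base) for base in (2, 10, 100)}
```

The same population accuracy lands on very different levels depending on \(B\), which is why the choice of base per dimension matters when comparing scores.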
- GEM-FI: Gated Evidential Mixtures with Fisher Modulation
This paper addresses two key issues in evidential deep learning (EDL): overconfidence on out-of-distribution (OOD) samples and the inability of single-head models to capture multimodal epistemic uncertainty. It proposes a three-part solution—GEM-Core/MIX/FI: using learned feature energy to gate evidence, employing a mixture of evidential heads to approximate ensemble behavior in a single inference pass, and introducing Fisher information regularization to stabilize mixture weights. On OOD detection tasks such as CIFAR-10→SVHN/CIFAR-100, the method outperforms DAEDL while maintaining single-pass inference.
- DynaDiff: Generative Adaptation of Dynamics to Environmental Shifts via Weight-space Diffusion
DynaDiff reframes the meta-learning problem of "training a predictor for a new environment" as a conditional sampling task of "directly generating the full network weights using a diffusion model." Leveraging weight graphs, function-consistency loss, and a dynamics-aware prompter, it achieves an average RMSE reduction of 10.78% over strong baselines across four PDE systems.
- HEDP: A Hybrid Energy-Distance Prompt-based Framework for Domain Incremental Learning
Drawing on the physical intuition of Helmholtz free energy, HEDP trains each domain's prompt parameters to form an "energy curve compressed to boundary \(\Theta\) and aligned to midline \(\Delta\)." During inference, an energy factor and a distance factor jointly weight each domain prompt. This approach improves performance on unknown domains by 1.76 / 3.12 / 2.57 percentage points on the CDDB / DomainNet / CORe50 DIL benchmarks, respectively.
- Local and Mixing-Based Algorithms for Gaussian Graphical Model Selection from Glauber Dynamics
This work is the first to study the problem of learning Gaussian graphical model structure from a single trajectory of Gaussian Glauber dynamics. Two complementary algorithms are proposed: LET-GL (local edge testing based on i,i,j,i windows, perfectly parallelizable) and BTR-GL (under the Dobrushin condition, uses burn-in/thinning to "decorrelate" the trajectory into approximately i.i.d. samples, which are then fed to existing i.i.d. learners). The paper provides finite-sample recovery guarantees, information-theoretic lower bounds, and an independently valuable total variation mixing bound for the random-scan Gaussian Gibbs sampler.
- Local Hessian Spectral Filtering for Robust Intrinsic Dimension Estimation
This paper proposes LHSD, which applies a Hill-type spectral filter to the log-density Hessian of a score model, retaining only near-zero eigenvalues to count the dimension of the tangent space. Stochastic Lanczos Quadrature reduces the computational cost from \(\mathcal{O}(D^3)\) to \(\mathcal{O}(D)\), enabling stable estimation of local intrinsic dimension in 3072-dimensional image spaces, and is used to diagnose memorization of training samples in diffusion models.
- Matroid Algorithms Under Size-Sensitive Independence Oracles
The authors propose a size-sensitive matroid oracle model where the query cost grows linearly with the size of the queried set, and prove that under this model the optimal query cost is \(\tilde{\Theta}(n^2)\) for each of three tasks: finding a basis, estimating rank, and estimating the partition number. For matroids with bounded circuit size \(c\), they provide an \(\mathcal{O}(n^{2-1/c}\log n)\) algorithm for maximum weight basis, which beats the quadratic bound on this restricted class.
- Mitigating Label Shift in Tabular In-Context Learning via Test-Time Posterior Adjustment
For TabPFN-type "tabular foundation models" that feed the training set directly as in-context input to attention, this work finds that such models severely overfit the majority class in the training set and proposes a posterior correction. The authors introduce DistPFN: a one-line posterior reweighting \(\tilde{p}(y) \propto \hat{p}(y)^2 / p_{\mathrm{train}}(y)\), which lifts TabPFN-v2 accuracy under strong label shift (\(\beta=5\)) from 72.7% to 76.9% on 253 OpenML datasets, without retraining, estimating test priors, or modifying the architecture.
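The reweighting rule quoted above can be sketched in a few lines. This is a minimal illustration of the formula as stated in the summary, not the authors' implementation; the function name `reweight_posterior`, the `eps` guard, and the example probabilities are assumptions.

```python
def reweight_posterior(p_hat, p_train, eps=1e-12):
    """Apply p_tilde(y) ∝ p_hat(y)^2 / p_train(y) and renormalize.

    p_hat:   model posterior over classes for one test row.
    p_train: empirical class frequencies in the in-context training set.
    eps:     assumed guard against division by zero for unseen classes.
    """
    scores = [p * p / max(q, eps) for p, q in zip(p_hat, p_train)]
    total = sum(scores)
    return [s / total for s in scores]


# Illustrative numbers: a 90/10 imbalanced training set and a posterior
# that leans toward the majority class.
p_train = [0.9, 0.1]
p_hat = [0.6, 0.4]
adjusted = reweight_posterior(p_hat, p_train)
```

In this toy case the unnormalized scores are 0.36/0.9 = 0.4 and 0.16/0.1 = 1.6, so the adjusted posterior is roughly [0.2, 0.8] and the prediction flips toward the minority class.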
- Mixture Prototype Flow Matching for Open-Set Supervised Anomaly Detection
MPFM replaces the traditional "unimodal Gaussian prototype" in OSAD with a learnable Gaussian mixture prototype space, directly regresses a GMM-form velocity field via flow matching, and adds a mutual information maximization regularizer to prevent prototype collapse. On 9 industrial/medical AD datasets, under the 10/1 anomaly sample setting, it outperforms all SOTA methods including DRA, AHL, and DPDL.
- Networked Information Aggregation for Binary Classification
This work extends the Kearns-Roth-Ryu 2026 result ("sequentially passing prediction columns among linear regression agents on a DAG nearly achieves global optimum") to binary classification: each agent observes only a subset of feature columns and sequentially forwards its logit to downstream agents. Under the \(M\)-coverage condition, the procedure attains the global logistic regression optimum up to an excess BCE loss of \(O(M/\sqrt{D})\); a matching hard instance proves an \(\Omega(k/D)\) lower bound, characterizing network depth as the fundamental bottleneck for information aggregation.
- New Bounds for Kernel Sums via Fast Spherical Embeddings
By accelerating the Bartal-Recht-Schulman 2011 "randomized Nash device" spherical embedding theorem using iterative Fastfood transforms (time \(\widetilde{O}(d + \Lambda^2 + \varepsilon^{-2})\)), and using it as a preprocessing step for Gaussian KDE to compress the diameter to \(\widetilde{O}(1/\sqrt{\varepsilon})\), this work obtains a new Gaussian KDE query time bound \(\widetilde{O}(d + \varepsilon \Delta_\sigma^2 + 1/\varepsilon^3)\), which outperforms RFF / FJLT+RFF / Fastfood in the regime of small \(\varepsilon\) and moderate diameter.
- NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search
An asinh-linked GLM surrogate compresses the multi-agent MCTS joint-action space \(d^n\) into a low-dimensional nonlinear bandit. Using "first-order difference + second-order mixed difference" as the NonUCT proposal rule, only a small candidate set \(\mathcal{C}(s)\) is maintained at each node. It is proven to achieve \(\widetilde{O}(T^{3/4})\) local regret (independent of \(d^n\)). On MatGame/SMAC/SMACv2, both sample efficiency and final performance surpass strong baselines such as MAZero.
- Polaris: Coupled Orbital Polar Embeddings for Hierarchical Concept Learning
Polaris decomposes concept representations into two decoupled signals—"direction (semantics)" and "orbital potential (hierarchy)"—and learns both on the unit hypersphere: tangent space projection plus exponential mapping ensures manifold closure, anisotropic spherical SVGD prevents equatorial concentration, and vMF KL divergence implements the asymmetric "parent should have higher entropy than child" constraint. On taxonomy expansion tasks, Polaris improves top-K recall by up to 19 points and reduces mean rank by 60%.
- Possibilistic Predictive Uncertainty for Deep Learning
This paper replaces the Bayesian probability framework with possibility theory and proposes DAPPr, a method that projects the possibilistic posterior in parameter space onto the prediction space via a supremum and fits it with a learnable Dirichlet possibility function. The resulting epistemic uncertainty model requires only 10 lines of code, can directly replace cross-entropy, and outperforms the EDL family in OOD detection.
- Provably Data-driven Multiple Hyper-parameter Tuning with Structured Loss Function
This work employs "real algebraic geometry + first-order logic quantifier elimination" to provide the first provable generalization bound for multi-dimensional hyperparameter tuning, extending the Balcan 2025 framework—which was limited to one-dimensional scalar hyperparameters—to arbitrary \(p\) dimensions, bilevel validation losses, approximate inner optimization, and other practical scenarios. It also provides the first matching lower bound.
- Realizable Bayes-Consistency for General Metric Losses
This paper provides a sharp characterization for the open problem of "when does a hypothesis class \(\mathcal{H}\) admit a distribution-free, strongly universally Bayes-consistent learning algorithm under general (possibly unbounded) metric losses" in the realizable case—the necessary and sufficient condition is that \(\mathcal{H}\) does not contain a new type of "unbounded gap Littlestone tree" combinatorial obstruction.
- Position: Reliable AI Needs to Externalize Implicit Knowledge: A Human-AI Collaboration Perspective
This ICML position paper argues that all current AI reliability methods (RAG / Self-Consistency / RLHF / Agent Memory) can only verify explicit knowledge, while the true power of AI comes from the 80-95% of "implicit knowledge" in training data that has never been formally recorded by humans. The author proposes Knowledge Objects (KOs) as infrastructure—externalizing AI's implicit reasoning into structured artifacts that humans can inspect, verify, and endorse, enabling the cost of a single human verification to compound across the community over time.
- Scaling Continual Learning to 300+ Tasks with Bi-Level Routing Mixture-of-Experts
The authors propose CaRE: inserting a bi-level routing MoE (BR-MoE) into each ViT block—first, a "class recognizer" selects the Top-M relevant task routers based on entropy, then each router activates its Top-K task experts and adds a shared EMA expert. This enables retention of old knowledge and continual absorption of new classes even with 300+ tasks, filling the gap in "long-sequence CIL" (and releasing the 1000-class OmniBenchmark-1K benchmark).
- Singular Bayesian Neural Networks
This work directly parameterizes the weight matrix as \(W=AB^\top\) instead of applying a mean-field distribution to \(W\) itself, thereby inducing a low-rank posterior that is singular with respect to the Lebesgue measure. The number of parameters is reduced from \(O(mn)\) to \(O(r(m+n))\), and the PAC-Bayes complexity is tightened from \(\sqrt{mn}\) to \(\sqrt{r(m+n)}\). On MLP/LSTM/Transformer architectures, the method achieves OOD detection performance surpassing 5-member Deep Ensembles with \(33\times\) fewer parameters.
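The parameter-count and complexity scalings quoted above can be checked with simple arithmetic. The sketch below counts only the entries of \(W\) versus those of the factors \(A\) and \(B\) for a single layer; the layer shape and rank are illustrative assumptions, and the example is not meant to reproduce the paper's \(33\times\) figure.

```python
import math


def dense_param_count(m: int, n: int) -> int:
    """Entries of a full m x n weight matrix W."""
    return m * n


def low_rank_param_count(m: int, n: int, r: int) -> int:
    """Entries of the factors A (m x r) and B (n x r) in W = A B^T."""
    return r * (m + n)


# Illustrative layer shape and rank (assumed, not taken from the paper).
m, n, r = 512, 512, 16

dense = dense_param_count(m, n)            # O(mn)
low_rank = low_rank_param_count(m, n, r)   # O(r(m+n))
compression = dense / low_rank

# The complexity terms scale as sqrt(mn) versus sqrt(r(m+n)).
dense_term = math.sqrt(dense)
low_rank_term = math.sqrt(low_rank)
```

For this shape the factorization stores 16,384 entries instead of 262,144, and the square-root term shrinks from 512 to 128. The saving disappears once \(r\) approaches \(mn/(m+n)\).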