📐 Optimization & Theory

🤖 AAAI2026 · 24 paper notes

A Distributed Asynchronous Generalized Momentum Algorithm Without Delay Bounds

This paper proposes a totally asynchronous Generalized Momentum (GM) distributed optimization algorithm that guarantees linear convergence without assuming any upper bound on communication or computation delays. On a Fashion-MNIST classification task, the proposed method requires 71% fewer iterations than gradient descent, 41% fewer than Heavy Ball, and 19% fewer than Nesterov accelerated gradient.
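The generalized momentum family subsumes the baselines it is compared against. A minimal synchronous sketch (the paper's contribution is the asynchronous, delay-unbounded analysis, which this toy loop does not capture; the update template and parameter names are a common convention, not the paper's exact notation):

```python
import numpy as np

def generalized_momentum(grad, x0, alpha=0.1, beta=0.9, nu=0.9, steps=200):
    """One common 'generalized momentum' template: nu = 0 recovers Heavy
    Ball, nu = beta recovers Nesterov, and beta = nu = 0 recovers plain
    gradient descent."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(steps):
        momentum = x - x_prev
        g = grad(x + nu * momentum)          # gradient at an extrapolated point
        x_prev, x = x, x + beta * momentum - alpha * g
    return x

# Minimal check on the strongly convex quadratic f(x) = 0.5 * ||x||^2
x_star = generalized_momentum(lambda x: x, np.ones(3), alpha=0.3, beta=0.5, nu=0.5)
```

Setting the two momentum parameters independently is what lets a single convergence analysis cover all three special cases at once.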

A Unified Convergence Analysis for Semi-Decentralized Learning: Sampled-to-Sampled vs. Sampled-to-All Communication

This paper presents a unified convergence analysis framework to systematically compare, for the first time, two server-to-device communication primitives in semi-decentralized federated learning — S2S (returning the aggregated model only to sampled devices) and S2A (broadcasting to all devices). The analysis reveals distinct regimes in which S2S is superior under high inter-component heterogeneity and S2A is superior under low heterogeneity, and provides practical guidelines for system configuration.

BeeRNA: Tertiary Structure-Based RNA Inverse Folding Using Artificial Bee Colony

This paper proposes BeeRNA, which applies the Artificial Bee Colony (ABC) optimization algorithm to the RNA tertiary structure inverse folding problem. Through a two-stage fitness evaluation combining base-pair distance pre-screening and RMSD scoring, BeeRNA outperforms deep learning methods gRNAde and RiboDiffusion on short-to-medium-length RNAs (<100 nt).
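The two-stage evaluation is the key cost-saving device: only candidates that pass a cheap structural pre-screen pay for the expensive score. A generic sketch in that spirit (function names, the threshold semantics, and the toy objective are illustrative assumptions, not BeeRNA's actual scoring code):

```python
def two_stage_fitness(candidates, cheap_score, expensive_score, threshold):
    """Two-stage evaluation: a cheap pre-screen (e.g. a base-pair distance
    proxy) filters candidates; only survivors are given the expensive score
    (e.g. RMSD to the target structure), which becomes the final fitness.
    Failing candidates receive an infinite penalty instead."""
    results = {}
    for c in candidates:
        if cheap_score(c) > threshold:            # fails pre-screen
            results[c] = float("inf")
        else:
            results[c] = expensive_score(c)       # expensive scoring only here
    return results

# Toy usage: minimize |x - 7|, with a coarse pre-screen on the same distance
cands = list(range(20))
fit = two_stage_fitness(cands, lambda x: abs(x - 7) % 3, lambda x: abs(x - 7), 1)
best = min(fit, key=fit.get)
```

In an ABC loop, this fitness would be called inside the employed-, onlooker-, and scout-bee phases; the pre-screen keeps the population-wide evaluation budget dominated by the cheap test.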

Beyond the Mean: Fisher-Orthogonal Projection for Natural Gradient Descent in Large Batch Training

This paper proposes Fisher-Orthogonal Projection (FOP), which restores variance information in ultra-large-batch training by projecting the difference between sub-batch gradients onto the component Fisher-orthogonal to the mean gradient. This keeps the second-order optimizer KFAC effective at batch sizes where it would otherwise degrade, yielding up to 7.5× speedup.
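The projection itself is a one-liner once an inner product is fixed. A sketch under a diagonal Fisher approximation (the paper works with KFAC's factored Fisher; the diagonal form here is a simplifying assumption):

```python
import numpy as np

def fisher_orthogonal_correction(g1, g2, fisher_diag, eps=1e-8):
    """Split the batch into two sub-batches with gradients g1, g2. The
    sub-batch gradient difference is projected onto the component
    Fisher-orthogonal to the mean gradient, so the extra term carries
    variance information without disturbing the mean descent direction."""
    g_mean = 0.5 * (g1 + g2)             # mean gradient across sub-batches
    d = 0.5 * (g1 - g2)                  # sub-batch gradient difference
    # Fisher inner product <u, v>_F = u^T F v with F = diag(fisher_diag)
    fdot = lambda u, v: float(np.sum(u * fisher_diag * v))
    coef = fdot(d, g_mean) / (fdot(g_mean, g_mean) + eps)
    d_orth = d - coef * g_mean           # Fisher-orthogonal component of d
    return g_mean, d_orth

g1 = np.array([1.0, 2.0, 0.5])
g2 = np.array([0.5, 1.0, 1.5])
F = np.array([2.0, 1.0, 0.5])
g_mean, d_orth = fisher_orthogonal_correction(g1, g2, F)
```

By construction `d_orth` has (near-)zero Fisher inner product with the mean gradient, which is exactly the property that makes it safe to add to the update.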

Bridging Synthetic and Real Routing Problems via LLM-Guided Instance Generation and Progressive Adaptation

This paper proposes EvoReal, a framework that employs LLM-driven evolutionary search to generate synthetic VRP instances structurally aligned with real-world distributions, and then adapts pretrained neural solvers to real benchmarks via a two-stage progressive fine-tuning strategy. EvoReal substantially outperforms existing neural solvers on TSPLib (1.05% gap) and CVRPLib (2.71% gap).

Co-Layout: LLM-driven Co-optimization for Interior Layout

This paper proposes Co-Layout, a framework that leverages LLMs to extract structured constraints from natural language descriptions, then jointly optimizes room layout and furniture placement via a grid-based integer programming (IP) formulation augmented with a coarse-to-fine solving strategy, substantially outperforming existing two-stage approaches.

Convex Clustering Redefined: Robust Learning with the Median of Means Estimator

This paper proposes COMET (Convex Clustering with Median of Means Estimator), which integrates the Median of Means (MoM) estimator into the convex clustering framework. Through random binning with median aggregation, truncated distance regularization, and ADAM optimization, COMET achieves robust clustering under noise and outliers without requiring the number of clusters \(k\) to be specified in advance. Weak consistency is established theoretically, and experiments on synthetic and multiple real-world datasets demonstrate substantial improvements over six baselines, including k-means, MoM k-means, and convex clustering.
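The Median of Means estimator at the core of this construction is simple to state: randomly partition the samples into bins, average within each bin, and take the median of the bin means. A minimal sketch (scalar case only; COMET applies the idea inside the convex clustering objective):

```python
import numpy as np

def median_of_means(x, n_bins, rng=None):
    """Median-of-Means estimator: outliers can corrupt at most a few bins,
    so the median of the bin means stays stable even when the plain mean
    is blown up."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    idx = rng.permutation(len(x))                 # random binning
    bins = np.array_split(x[idx], n_bins)
    return float(np.median([b.mean() for b in bins]))

rng = np.random.default_rng(0)
clean = rng.normal(5.0, 1.0, size=1000)
corrupted = np.concatenate([clean, np.full(30, 1e6)])   # 30 gross outliers
est = median_of_means(corrupted, n_bins=100, rng=0)
```

With 30 outliers spread over 100 bins, at most 30 bin means are corrupted, so the median is still decided by clean bins; the plain mean of the same data is pushed into the tens of thousands.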

Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment

This work provides the first theoretical analysis of the minimum cost required to steer an LLM policy toward an attacker's target by flipping preference labels during RLHF/DPO alignment. The problem is formalized as a convex optimization problem, upper and lower bounds on the cost are derived, and a post-processing method called PCM (Poisoning Cost Minimization) is proposed to substantially reduce the number of label flips while preserving the poisoning effect.

Data Heterogeneity and Forgotten Labels in Split Federated Learning

This paper systematically investigates catastrophic forgetting (CF) caused by data heterogeneity in Split Federated Learning, with particular focus on intra-round forgetting induced by the order in which the server processes clients, and proposes Hydra, a multi-head method that partitions the final layers of the server-side model segment (part-2) into group-specific heads trained separately before aggregation, reducing the performance gap (PG) across labels by up to 75.4%.
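The structural idea can be sketched in a few lines. A toy round (in the real method a "head" is the trailing layers of the server-side model and each client runs a full split-learning exchange; here a head is just a weight vector and a client update is a precomputed gradient, both illustrative assumptions):

```python
import numpy as np

def hydra_round(head_init, client_grads, groups, lr=0.1):
    """Multi-head round sketch: each group of clients sequentially updates
    its own head, so one group's processing order cannot overwrite another
    group's progress within the round; heads are averaged only at round end."""
    heads = [head_init.copy() for _ in groups]
    for h, group in zip(heads, groups):
        for cid in group:                    # server-side processing order
            h -= lr * client_grads[cid]      # update this group's head only
    return np.mean(heads, axis=0)            # aggregate heads after the round

w0 = np.zeros(4)
grads = {0: np.ones(4), 1: -np.ones(4), 2: 2 * np.ones(4), 3: -2 * np.ones(4)}
w = hydra_round(w0, grads, groups=[[0, 2], [1, 3]])
```

Grouping confines intra-round interference to within a group, which is what lets the label-wise performance gap shrink.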

ECPv2: Fast, Efficient, and Scalable Global Optimization of Lipschitz Functions

This paper proposes ECPv2, which introduces three innovations—adaptive lower bound, Worst-\(m\) memory, and fixed random projection—to reduce the per-run complexity of Lipschitz global optimization from \(\Omega(n^2 d)\) to \(\Omega(n(m+d)\log n)\), while maintaining an \(O(n^{-1/d})\) regret convergence rate that matches the minimax lower bound.
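The workhorse of Lipschitz global optimization is a bound check before each expensive evaluation. A simplified sketch of such a test with a Worst-\(m\) memory (an assumption about the mechanism, reduced to a single acceptance check; ECPv2 additionally uses an adaptive lower bound and random projections):

```python
import numpy as np

def could_improve(x, pts, vals, L, best, m=None):
    """A candidate x is worth evaluating only if the Lipschitz lower bound
    max_i f(x_i) - L * ||x - x_i|| does not already rule out improving on
    the incumbent `best`. Restricting the check to a memory of m points
    trades a slightly weaker bound for O(m) cost per test instead of O(n)."""
    pts, vals = np.asarray(pts), np.asarray(vals)
    if m is not None and len(vals) > m:
        keep = np.argsort(vals)[-m:]          # keep only m points in memory
        pts, vals = pts[keep], vals[keep]
    lower = np.max(vals - L * np.linalg.norm(pts - x, axis=1))
    return bool(lower < best)

pts = [[0.0, 0.0], [1.0, 0.0]]
vals = [1.0, 0.5]
far = could_improve(np.array([5.0, 0.0]), pts, vals, L=1.0, best=0.5)
near_bad = could_improve(np.array([0.05, 0.0]), pts, vals, L=1.0, best=0.5)
```

A point close to a known bad evaluation is rejected without spending a function call (`near_bad` is false), while a far-away point survives the screen (`far` is true); shrinking the memory from all \(n\) points to \(m\) is one of the levers behind the improved per-run complexity.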

Efficient and Reliable Hitting-Set Computations for the Implicit Hitting Set Approach

To address numerical instability arising from the reliance on commercial IP solvers in the hitting-set component of the implicit hitting set (IHS) framework, this paper proposes alternative approaches based on pseudo-Boolean (PB) reasoning and stochastic local search (SLS), as well as hybrid strategies. The work realizes the first certifiable IHS computation and demonstrates effective trade-offs between efficiency and reliability across 1,786 benchmark instances.
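For orientation, the hitting-set subproblem itself is easy to state: given a collection of unsatisfiable cores, find a small set of soft constraints intersecting every core. A greedy baseline sketch (a simple heuristic for illustration, not the paper's PB reasoning or SLS machinery):

```python
def greedy_hitting_set(cores):
    """Greedy hitting set: repeatedly pick the element occurring in the
    largest number of not-yet-hit cores. In IHS, each core is a set of soft
    constraints of which at least one must be violated, so the hitting set's
    cost lower-bounds the optimum."""
    cores = [set(c) for c in cores]
    hs = set()
    unhit = list(cores)
    while unhit:
        counts = {}
        for c in unhit:
            for e in c:
                counts[e] = counts.get(e, 0) + 1
        best = max(counts, key=counts.get)   # element hitting the most cores
        hs.add(best)
        unhit = [c for c in unhit if best not in c]
    return hs

hs = greedy_hitting_set([{1, 2}, {2, 3}, {3, 4}])
```

IHS needs *minimum-cost* hitting sets for its lower bound, which is why the exact IP solver is hard to displace; the paper's PB and SLS alternatives target that exact computation with certifiable reasoning rather than a greedy approximation like this one.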

FedP²EFT: Federated Learning to Personalize PEFT for Multilingual LLMs

This paper proposes FedP²EFT, which collaboratively trains a Personalization Strategy Generator (PSG) via federated learning to automatically generate personalized LoRA rank structures for each client, substantially outperforming manually designed PEFT configurations and existing FL personalization methods in multilingual LLM fine-tuning.

FedPM: Federated Learning Using Second-order Optimization with Preconditioned Mixing of Local Parameters

This paper proposes FedPM (Federated Preconditioned Mixing), a novel federated learning method that replaces the conventional simple parameter averaging on the server side with "preconditioned mixing," addressing the local preconditioner drift problem inherent in existing second-order federated optimization methods. FedPM is theoretically shown to achieve superlinear convergence for strongly convex objectives and empirically outperforms existing methods under heterogeneous data settings.

GHOST: Solving the Traveling Salesman Problem on Graphs of Convex Sets

This paper proposes GHOST, a hierarchical optimal search algorithm for solving the Traveling Salesman Problem on Graphs of Convex Sets (GCS). By combining combinatorial path search with convex trajectory optimization, and employing a novel abstract path expansion algorithm to compute admissible lower bounds that guide best-first search, GHOST achieves optimality guarantees while outperforming the monolithic mixed-integer convex programming (MICP) baseline by orders of magnitude in runtime.

Instance Generation for Meta-Black-Box Optimization through Latent Space Reverse Engineering

This paper proposes the LSRE framework, which constructs a two-dimensional latent instance space for BBO problems via an autoencoder, and employs enhanced genetic programming to reverse-engineer diverse synthetic optimization problem instances from this space, forming the Diverse-BBO benchmark that substantially improves the generalization performance of MetaBBO methods.

MOTIF: Multi-strategy Optimization via Turn-based Interactive Framework

This paper proposes MOTIF, a framework that models solver design as a multi-strategy optimization problem. Through a turn-based competitive mechanism involving two LLM agents guided by Monte Carlo Tree Search (MCTS), MOTIF jointly optimizes multiple interdependent algorithmic components within combinatorial optimization solvers, consistently outperforming existing methods across TSP, CVRP, BPP, and other combinatorial optimization domains.

On the Learning Dynamics of Two-Layer Linear Networks with Label Noise SGD

This paper theoretically analyzes the learning dynamics of label noise SGD on two-layer overparameterized linear networks, revealing a two-phase behavior: in Phase I, weight norms progressively diminish, enabling the model to escape the lazy regime and enter the rich regime; in Phase II, weights align with the ground-truth interpolator and converge. The theory is further extended to the SAM optimizer.
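The object of study is plain SGD with fresh noise injected into the labels at every step. A toy sketch on a two-layer linear network \(f(x) = b^\top A x\) (dimensions, hyperparameters, and the regression task are illustrative assumptions; the paper's analysis concerns the overparameterized regime):

```python
import numpy as np

def label_noise_sgd(X, y, steps=5000, lr=0.01, sigma=0.5, seed=0):
    """SGD on 0.5 * (b^T A x - y_noisy)^2 where y_noisy resamples fresh
    Gaussian label noise at every step; this injected noise drives the
    implicit regularization that the two-phase analysis studies."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = rng.normal(0, 1.0, size=(d, d))      # first layer
    b = rng.normal(0, 1.0, size=d)           # second layer
    for _ in range(steps):
        i = rng.integers(n)
        y_noisy = y[i] + sigma * rng.normal()    # fresh label noise each step
        r = b @ (A @ X[i]) - y_noisy             # residual
        gA = r * np.outer(b, X[i])
        gb = r * (A @ X[i])
        A -= lr * gA
        b -= lr * gb
    return A, b

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
A, b = label_noise_sgd(X, y)
w_learned = A.T @ b                          # effective linear predictor
```

The paper's claim is about the trajectory, not just the endpoint: norms first shrink (escaping the lazy regime), then the effective predictor `A.T @ b` aligns with the ground-truth interpolator.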

Parametrized Multi-Agent Routing via Deep Attention Models

This paper proposes the Deep FLPO framework, which integrates the algebraic structure of the Maximum Entropy Principle (MEP) with a permutation-invariant encoder-decoder neural network (SPN) to address the NP-hard mixed-integer problem of joint facility location and routing optimization. The framework achieves a 100× speedup in policy inference while matching Gurobi's exact solutions at 1500× faster runtime.

Pareto-Grid-Guided Large Language Models for Fast and High-Quality Heuristics Design in Multi-Objective Combinatorial Optimization

This paper proposes MPaGE, a framework that integrates LLMs with a Pareto Front Grid mechanism and semantic clustering to automatically generate heuristics for multi-objective combinatorial optimization problems (MOCOPs), jointly optimizing solution quality and runtime efficiency. MPaGE achieves significant improvements in HV and IGD over baselines such as EoH and MEoH on Bi-TSP, Tri-TSP, Bi-CVRP, and Bi-KP benchmarks.

PEOAT: Personalization-Guided Evolutionary Question Assembly for One-Shot Adaptive Testing

This paper is the first to propose the One-shot Adaptive Testing (OAT) task, formulating it as a combinatorial optimization problem, and introduces the PEOAT framework—combining personalized initialization, cognitively enhanced evolutionary search, and diversity-preserving selection—to select an optimal question set for each examinee in a single shot without interactive feedback, substantially outperforming conventional CAT methods.

pFed1BS: Personalized Federated Learning with Bidirectional Communication Compression via One-Bit Random Sketching

This paper proposes pFed1BS, a framework that achieves extreme bidirectional communication compression (>99% reduction) in federated learning via one-bit random sketching, while introducing a sign-based regularizer for client model personalization. The framework simultaneously addresses communication bottlenecks and data heterogeneity in non-IID settings.
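One-bit random sketching itself is a standard compressed-sensing-style construction and can be sketched in a few lines (a generic encoder/decoder pair for illustration; pFed1BS's exact sketching matrix, scaling, and personalization regularizer may differ):

```python
import numpy as np

def one_bit_sketch(v, S):
    """Client sends only sign(S v), i.e. one bit per sketch coordinate;
    a crude linear decoder S^T sign(S v) / k preserves the direction of v
    in expectation for Gaussian S."""
    bits = np.sign(S @ v)                  # the entire uplink payload
    recon = S.T @ bits / S.shape[0]        # server-side backprojection
    return bits, recon

rng = np.random.default_rng(0)
d, k = 64, 4096                            # model dim, sketch dim
S = rng.normal(size=(k, d)) / np.sqrt(d)
v = rng.normal(size=d)
bits, recon = one_bit_sketch(v, S)
cos = float(recon @ v) / (np.linalg.norm(recon) * np.linalg.norm(v))
```

Only the direction of `v` survives one-bit quantization (the magnitude is lost), which is why a sign-based regularizer on the client side is a natural companion to this channel.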

SMoFi: Step-wise Momentum Fusion for Split Federated Learning on Heterogeneous Data

This paper proposes SMoFi, a framework that synchronizes the momentum buffers of surrogate models on the server side at every SGD step within Split FL, effectively mitigating gradient divergence caused by non-IID data. SMoFi achieves up to 7.1% accuracy improvement and up to 10.25× convergence speedup.
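The fusion step is cheap to express. A sketch reduced to heavy-ball SGD on plain weight vectors (the real method operates on the server-side surrogate models inside split FL; the update form here is an illustrative assumption):

```python
import numpy as np

def smofi_step(weights, momenta, grads, lr=0.1, beta=0.9):
    """Step-wise momentum fusion: before each server-side SGD step, the
    momentum buffers of all client surrogates are averaged, so every
    surrogate descends along a shared, less client-biased direction."""
    fused = np.mean(momenta, axis=0)         # synchronize buffers this step
    new_w, new_m = [], []
    for w, g in zip(weights, grads):
        m = beta * fused + g                 # each surrogate restarts from the fused buffer
        new_w.append(w - lr * m)
        new_m.append(m)
    return new_w, new_m

ws = [np.zeros(2), np.zeros(2)]
ms = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]   # divergent buffers
gs = [np.array([0.0, 1.0]), np.array([0.0, 1.0])]
ws, ms = smofi_step(ws, ms, gs)
```

With identical gradients, one fusion step collapses the previously divergent buffers onto a common direction, which is the mechanism behind the reduced gradient divergence under non-IID data.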

Tackling Resource-Constrained and Data-Heterogeneity in Federated Learning with Double-Weight Sparse Pack

This paper proposes FedCSPACK, a personalized federated learning method based on cosine-similarity-guided sparse parameter packing and double-weight aggregation. By performing parameter selection and sharing at the pack level, FedCSPACK simultaneously addresses data heterogeneity and client resource constraints, achieving 2–5× faster training, up to 96% reduction in communication overhead, and a 3.34% improvement in model accuracy.