📐 Optimization & Theory

📷 CVPR2026 · 10 paper notes

BlazeFL: Fast and Deterministic Federated Learning Simulation: BlazeFL is a lightweight single-machine federated learning simulation framework built on Python free-threading. By combining shared-memory execution with per-client isolated RNG streams, it achieves up to 3.1× speedup and bit-level reproducibility.
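As a rough illustration of why per-client isolated RNG streams make a threaded simulation reproducible, the sketch below derives one private generator per client from a root seed and a client id. It is not BlazeFL's API: `client_rng`, `local_update`, and `run_round` are hypothetical names, the hash-based seed derivation is an assumption, and the "training" step is a stand-in that only draws random numbers.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor


def client_rng(root_seed: int, client_id: int) -> random.Random:
    """Derive a private generator for one client from the root seed and its id."""
    digest = hashlib.sha256(f"{root_seed}/{client_id}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def local_update(client_id: int, root_seed: int) -> list[float]:
    """Stand-in for local training: draws only from this client's own stream."""
    rng = client_rng(root_seed, client_id)
    return [rng.gauss(0.0, 1.0) for _ in range(4)]


def run_round(root_seed: int, num_clients: int, max_workers: int) -> list[list[float]]:
    """Run all clients in a thread pool and return their updates in client order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, whatever the thread scheduling.
        return list(pool.map(lambda cid: local_update(cid, root_seed), range(num_clients)))
```

Because no client touches a shared generator, `run_round(0, 8, 1)` and `run_round(0, 8, 4)` return identical values: the result depends on the seed and the client ids, not on how many threads ran or in what order they were scheduled.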
Dynamic Momentum Recalibration in Online Gradient Learning: From a signal-processing perspective, this work shows that a fixed momentum coefficient imposes a suboptimal bias-variance tradeoff on the gradient estimate. It proposes the SGDF optimizer, which computes an optimal time-varying gain online under the minimum mean squared error principle to balance noise suppression against signal preservation. SGDF outperforms SGD with momentum and Adam variants across multiple vision tasks.
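To make the contrast between a fixed coefficient and a time-varying gain concrete, here is a minimal scalar sketch. It is not the SGDF update rule from the paper: `ema_step` and `mmse_fuse` are hypothetical names, and the variance inputs are assumed to be given rather than estimated online. The sketch only shows that exponential-moving-average momentum is a filter with constant gain `1 - beta`, whereas a minimum-mean-squared-error fusion of two noisy estimates picks its gain from their variances.

```python
def ema_step(momentum: float, grad: float, beta: float) -> float:
    """Fixed-coefficient momentum, written as a filter with constant gain 1 - beta."""
    return momentum + (1.0 - beta) * (grad - momentum)


def mmse_fuse(
    prior: float, prior_var: float, obs: float, obs_var: float
) -> tuple[float, float]:
    """Fuse a prior estimate with a noisy observation under minimum mean squared error.

    Assumes independent errors and prior_var + obs_var > 0.
    """
    gain = prior_var / (prior_var + obs_var)  # lies in [0, 1]
    fused = prior + gain * (obs - prior)
    fused_var = (1.0 - gain) * prior_var
    return fused, fused_var
```

When the prior (the running estimate) is trusted far more than the new gradient, the gain approaches 0 and the estimate barely moves; when the new gradient is the more reliable of the two, the gain approaches 1. A fixed `beta` cannot make that adjustment from step to step.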
Enhancing Visual Representation with Textual Semantics: Textual Semantics-Powered Prototypes for Heterogeneous Federated Learning: To address the problem that existing federated prototype learning methods destroy inter-class semantic relations, this paper proposes FedTSP, which leverages pre-trained language models to construct textual prototypes that preserve semantic structure, achieving significant performance gains and faster convergence in heterogeneous federated learning.
Fed-ADE: Adaptive Learning Rate for Federated Post-adaptation under Distribution Shift: This paper proposes the Fed-ADE framework, which adaptively adjusts the learning rate for each client at each time step using two lightweight distribution shift signals — uncertainty dynamics estimation and representation dynamics estimation — enabling unsupervised post-deployment adaptation in federated settings.
Enhancing Visual Representation with Textual Semantics: Textual Semantics-Powered Prototypes for Heterogeneous Federated Learning: This paper proposes FedTSP, which leverages pre-trained language models (PLMs) to construct semantically rich prototypes from the text modality, preserving inter-class semantic relationships in heterogeneous federated learning. Learnable prompts are introduced to bridge the modality gap, substantially improving model performance and accelerating convergence.
OTPrune: Distribution-Aligned Visual Token Pruning via Optimal Transport: This work formulates visual token pruning as a distribution alignment problem under optimal transport (OT), minimizing the 2-Wasserstein distance between the full and pruned token sets. It achieves training-free, \(O(mk^2)\)-complexity pruning via Gaussian surrogates, a log-det submodular objective, and greedy Cholesky selection, attaining state-of-the-art accuracy–efficiency trade-offs across 11 multimodal benchmarks.
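The greedy log-det step with incremental Cholesky updates can be sketched as follows. This is a generic greedy log-determinant selection in pure Python, not the OTPrune implementation: `greedy_logdet_select` is a hypothetical name, and the kernel used here (a Gram matrix of the token features plus a small ridge) is an assumption that may differ from the Gaussian surrogate used in the paper.

```python
import math


def greedy_logdet_select(
    features: list[list[float]], k: int, ridge: float = 1e-6
) -> list[int]:
    """Greedily pick up to k rows that approximately maximize log det of their Gram matrix."""
    m = len(features)

    def kernel(i: int, j: int) -> float:
        dot = sum(a * b for a, b in zip(features[i], features[j]))
        return dot + (ridge if i == j else 0.0)

    # residual[i] is the factor by which the determinant grows if item i is added:
    # the squared diagonal entry item i would get in the Cholesky factor.
    residual = [kernel(i, i) for i in range(m)]
    chol_rows: list[list[float]] = [[] for _ in range(m)]  # partial Cholesky rows
    chosen = [False] * m
    selected: list[int] = []

    for _ in range(min(k, m)):
        j = max((i for i in range(m) if not chosen[i]), key=lambda i: residual[i])
        if residual[j] <= 1e-12:  # remaining items add no new directions
            break
        chosen[j] = True
        selected.append(j)
        pivot = math.sqrt(residual[j])
        for i in range(m):
            if chosen[i]:
                continue
            # Extend item i's Cholesky row by one entry and shrink its residual.
            proj = sum(a * b for a, b in zip(chol_rows[j], chol_rows[i]))
            e = (kernel(j, i) - proj) / pivot
            chol_rows[i].append(e)
            residual[i] -= e * e
    return selected


# Token 1 is nearly parallel to token 0, so it is skipped in favor of token 2.
tokens = [[2.0, 0.0], [1.8, 0.2], [0.0, 1.0]]
picked = greedy_logdet_select(tokens, k=2)
```

Each round extends every unselected item's Cholesky row by one entry, which costs O(k) per item, so the incremental updates total O(mk^2) over k rounds, in addition to the cost of evaluating kernel entries.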
SCOPE: Semantic Coreset with Orthogonal Projection Embeddings for Federated Learning: This paper proposes SCOPE, a training-free federated coreset selection framework that leverages a frozen VLM (MobileCLIP-S2) with orthogonal projection embeddings to compute three scalar semantic metrics—representativeness, diversity, and boundary proximity—enabling globally-aware two-stage pruning that reduces communication bandwidth by 128–512× while surpassing full-data training.
SCOPE: Semantic Coreset with Orthogonal Projection Embeddings for Federated Learning: SCOPE employs a training-free vision-language geometric scorer to compress each sample into three scalars (representativeness, diversity, and negative-class boundary proximity), and the server aggregates only these lightweight statistics to form a global consensus. This consensus guides each client to first remove semantically anomalous samples and then eliminate majority-class redundancies, balancing accuracy, robustness, and communication overhead under strongly non-IID and long-tailed federated scenarios.
The Power of Decaying Steps: Enhancing Attack Stability and Transferability for Sign-based Optimizers: This paper reformulates sign-based adversarial attack optimizers as coordinate-wise gradient descent, reveals that non-decaying step sizes are the root cause of non-convergence and instability, and proposes a Monotonically Decreasing Coordinate Step-size (MDCS) strategy. Theoretical analysis proves that MDCS-MI achieves the optimal \(O(1/\sqrt{T})\) convergence rate, with significant improvements in attack transferability and stability demonstrated on image classification and cross-modal retrieval tasks.
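A toy example shows why a non-decaying step prevents a sign-based update from settling. The sketch below is not the MDCS rule or an adversarial attack: `sign_descent` is a hypothetical helper, the objective is a simple quadratic, and the `1 / sqrt(t)` schedule is an assumed stand-in for a monotonically decreasing step size.

```python
import math


def sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def sign_descent(x0, grad, steps, step_size):
    """Coordinate-wise sign descent: each coordinate moves by exactly step_size(t)."""
    x = list(x0)
    for t in range(1, steps + 1):
        alpha = step_size(t)
        g = grad(x)
        x = [xi - alpha * sign(gi) for xi, gi in zip(x, g)]
    return x


# Toy objective f(x) = 0.5 * ||x||^2, whose gradient is x itself.
grad = lambda x: list(x)
start = [1.05, -0.73]

fixed = sign_descent(start, grad, 2000, lambda t: 0.1)
decay = sign_descent(start, grad, 2000, lambda t: 0.1 / math.sqrt(t))
```

With the fixed step, every coordinate moves by 0.1 on each iteration regardless of the gradient magnitude, so it ends up bouncing back and forth across the minimizer. With the decaying step, each coordinate stays within one current step of the minimizer after its first crossing, so the error shrinks as the step does.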
UniFusion: A Unified Image Fusion Framework with Robust Representation and Source-Aware Preservation: This paper proposes UniFusion, a unified image fusion framework that leverages the self-supervised semantic priors of DINOv3 to construct a cross-modal shared feature space, preserves source image information via a reconstruction alignment mechanism, and decouples reconstruction and fusion objectives through a bilevel optimization strategy. The framework achieves state-of-the-art performance across multiple tasks, including infrared-visible, multi-exposure, multi-focus, and medical image fusion.