📐 Optimization & Theory¶
📹 ICCV 2025 · 8 paper notes
- Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
- This paper proposes SimVQ, a method that reparameterizes codebook vectors via a single learnable linear transformation layer (\(\bm{C}\bm{W}\)), converting the disjoint optimization of the codebook into a joint spatial optimization, thereby fundamentally resolving representation collapse in VQ models and achieving near-100% codebook utilization.
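A minimal numpy sketch of the reparameterization idea (shapes and the nearest-neighbor lookup are illustrative; in SimVQ itself `W` is updated by backpropagation while the codebook entries stay fixed):

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 16, 4
C = rng.normal(size=(K, d))  # codebook entries, not trained individually
W = np.eye(d)                # single learnable linear layer; effective codebook is C @ W

def quantize(z, C, W):
    """Return the index and value of the nearest effective code in C @ W."""
    book = C @ W
    idx = int(np.argmin(((book - z) ** 2).sum(axis=1)))
    return idx, book[idx]

z = rng.normal(size=d)
idx, q = quantize(z, C, W)
```

Because `W` is shared by every entry, one gradient step on `W` moves the whole codebook jointly rather than only the codes that were selected, which is the mechanism credited with avoiding dead codes.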
- Class-Wise Federated Averaging for Efficient Personalization
- cwFedAvg extends FedAvg from client-level aggregation to class-level aggregation, constructing a dedicated global model per class and combining them into a personalized local model weighted by each client's class distribution. Coupled with Weight Distribution Regularization (WDR) to strengthen the alignment between class distribution and weight norms, the method achieves substantial personalization gains under non-IID settings while maintaining the same communication overhead as FedAvg.
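A toy sketch of the class-wise aggregation, assuming flat parameter vectors and two clients (the real method aggregates per-class network weights; WDR is omitted here):

```python
import numpy as np

# toy setup: two clients, two classes; each "model" is a flat parameter vector,
# and `dist` is the client's normalized class distribution
clients = [
    {"params": np.array([1.0, 2.0]), "dist": np.array([0.8, 0.2])},
    {"params": np.array([3.0, 4.0]), "dist": np.array([0.1, 0.9])},
]

def classwise_aggregate(clients, n_classes):
    """Build one global model per class, weighting each client by its share of that class."""
    per_class = []
    for c in range(n_classes):
        w = np.array([cl["dist"][c] for cl in clients])
        w = w / w.sum()
        stacked = np.stack([cl["params"] for cl in clients])
        per_class.append((w[:, None] * stacked).sum(axis=0))
    return per_class

def personalize(per_class_globals, dist):
    """Mix the class-wise globals by the client's own class distribution."""
    return sum(p * g for p, g in zip(dist, per_class_globals))

globals_ = classwise_aggregate(clients, n_classes=2)
local = personalize(globals_, clients[0]["dist"])
```

The server still receives one model per client per round, which is why the communication cost matches plain FedAvg.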
- Cooperative Pseudo Labeling for Unsupervised Federated Classification
- FedCoPL is the first work to extend unsupervised federated learning (UFL) to classification tasks. It addresses CLIP's inherent bias and label shift challenges via a cooperative pseudo labeling strategy (global assignment ensuring class balance) and a partial prompt aggregation protocol (aggregating only visual prompts while keeping text prompts local).
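The class-balance constraint can be illustrated with a greedy quota-based assignment; this is only an illustrative stand-in, as the paper's global assignment may be formulated differently:

```python
import numpy as np

def balanced_pseudo_labels(sim, per_class):
    """Assign pseudo labels so that no class exceeds `per_class` samples.

    sim: (n_samples, n_classes) similarity scores (e.g., CLIP image-text logits).
    Greedy: visit (sample, class) pairs in decreasing similarity, respecting quotas.
    """
    n, k = sim.shape
    labels = -np.ones(n, dtype=int)
    counts = np.zeros(k, dtype=int)
    for flat in np.argsort(sim, axis=None)[::-1]:
        i, c = divmod(int(flat), k)
        if labels[i] == -1 and counts[c] < per_class:
            labels[i] = c
            counts[c] += 1
    return labels

# all four samples prefer class 0, but the quota forces a balanced split
sim = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]])
labels = balanced_pseudo_labels(sim, per_class=2)
```

With a naive per-sample argmax, every sample above would collapse onto class 0 — the biased behavior the cooperative strategy is designed to counteract.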
- Federated Continual Instruction Tuning
- This paper introduces the first Federated Continual Instruction Tuning (FCIT) benchmark, covering 2 scenarios, 4 settings, and 12 datasets, and proposes the DISCO framework, which addresses data heterogeneity and catastrophic forgetting via Dynamic Knowledge Organization (DKO) and Subspace Selective Activation (SSA).
- Federated Prompt-Tuning with Heterogeneous and Incomplete Multimodal Client Data
- This paper proposes FED-PRIME, a federated prompt-tuning framework for multimodal settings with missing modalities. It maintains two sets of learnable prompts — inter-client and intra-client — to capture cross-client alignable missing patterns and client-specific missing patterns, respectively, and employs a clustering-alignment mechanism for server-side aggregation. FED-PRIME substantially outperforms existing baselines across diverse missing-data configurations.
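A simplified sketch of the server-side step, where pattern-keyed grouping stands in for the paper's clustering-alignment mechanism (an assumption made here for brevity):

```python
import numpy as np

def aggregate_inter_prompts(client_prompts, missing_patterns):
    """Group clients by their modality-missing pattern and average the
    inter-client prompts within each group.

    NOTE: keying groups by an exact pattern string is a simplification of the
    paper's clustering-alignment mechanism.
    """
    groups = {}
    for prompt, pattern in zip(client_prompts, missing_patterns):
        groups.setdefault(pattern, []).append(prompt)
    return {pat: np.mean(ps, axis=0) for pat, ps in groups.items()}

prompts = [np.ones(4), 3 * np.ones(4), np.zeros(4)]
patterns = ["image-only", "image-only", "text-only"]
agg = aggregate_inter_prompts(prompts, patterns)
```

Intra-client prompts, by contrast, would never leave the client, mirroring the inter/intra split described above.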
- Learning Interpretable Queries for Explainable Image Classification with Information Pursuit
- This paper parameterizes the query dictionary of Information Pursuit (IP) as learnable vectors in the CLIP semantic embedding space, and learns a task-sufficient interpretable query dictionary via an alternating optimization algorithm, substantially closing the performance gap between interpretable classifiers and black-box classifiers.
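The query-answer mechanism can be sketched as cosine similarity between learnable query vectors and a CLIP image embedding (dimensions are toy values; the alternating optimization that learns the queries is omitted):

```python
import numpy as np

def answer_queries(clip_image_emb, query_vectors):
    """Each interpretable query is a learnable vector in the CLIP embedding
    space; its 'answer' for an image is the cosine similarity between them."""
    q = query_vectors / np.linalg.norm(query_vectors, axis=1, keepdims=True)
    x = clip_image_emb / np.linalg.norm(clip_image_emb)
    return q @ x

rng = np.random.default_rng(0)
queries = rng.normal(size=(8, 16))  # 8 learnable queries in a 16-dim toy space
image = rng.normal(size=16)
answers = answer_queries(image, queries)
```

IP then classifies from the sequence of such answers rather than from the raw embedding, which is what makes the resulting predictions interpretable.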
- Memory-Efficient 4-bit Preconditioned Stochastic Optimization
- This paper proposes a 4-bit quantization scheme based on Cholesky decomposition and error feedback, compressing the preconditioner matrices of the Shampoo optimizer to 4-bit precision. The approach substantially reduces GPU memory consumption while preserving training performance close to 32-bit Shampoo, with convergence guarantees provided for both smooth and non-smooth settings.
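A minimal numpy sketch of the two ingredients — quantizing the Cholesky factor instead of the full preconditioner, and carrying the quantization residual forward as error feedback (the exact quantizer and scaling in the paper may differ):

```python
import numpy as np

def quantize4(x, scale):
    """Symmetric 4-bit quantization: integer levels in [-7, 7], scaled back to floats."""
    q = np.clip(np.round(x / scale * 7), -7, 7)
    return q / 7 * scale

def compress_preconditioner(P, err):
    """Store a 4-bit Cholesky factor instead of the full-precision preconditioner.

    The factor is triangular (half the entries) and the reconstruction
    L @ L.T stays positive semi-definite; `err` carries the quantization
    residual into the next step (error feedback).
    """
    L = np.linalg.cholesky(P)
    target = L + err
    scale = np.abs(target).max() + 1e-12
    deq = quantize4(target, scale)
    return deq, target - deq  # dequantized factor, residual for next round

A = np.array([[2.0, 0.5], [0.5, 1.0]])
P = A @ A.T + np.eye(2)  # a positive-definite toy preconditioner
L4, err = compress_preconditioner(P, np.zeros_like(P))
P_hat = L4 @ L4.T
```

Factorizing before quantizing is the key design choice: quantizing `P` directly could break positive-definiteness, while `L4 @ L4.T` cannot.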
- Zeroth-Order Fine-Tuning of LLMs in Random Subspaces
- This paper proposes SubZero (random Subspace Zeroth-order optimization), which estimates gradients via per-layer low-rank perturbations confined to random subspaces, significantly reducing the variance and angular error of zeroth-order gradient estimates and enabling LLM fine-tuning with memory costs close to those of inference.
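The core estimator can be sketched as a two-point finite difference restricted to a random subspace; this flattens a single parameter vector for clarity, whereas SubZero applies low-rank perturbations layer-wise:

```python
import numpy as np

def subspace_zo_grad(f, w, rank, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate restricted to a random subspace.

    Instead of perturbing all d coordinates, draw an orthonormal basis
    P (d x rank) and perturb only within span(P), lowering the estimator's variance.
    """
    rng = rng or np.random.default_rng(0)
    d = w.size
    P, _ = np.linalg.qr(rng.normal(size=(d, rank)))
    z = rng.normal(size=rank)
    u = P @ z                                    # perturbation direction in the subspace
    g = (f(w + eps * u) - f(w - eps * u)) / (2 * eps)
    return g * u                                 # directional estimate mapped back to R^d

f = lambda w: 0.5 * (w ** 2).sum()  # toy loss; its true gradient is w itself
w = np.arange(1.0, 5.0)
g_hat = subspace_zo_grad(f, w, rank=2)
```

Only two forward evaluations of `f` are needed per estimate and no activations are stored for backpropagation, which is why the memory footprint stays close to inference.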