
📐 Optimization & Theory

📷 CVPR2025 · 11 paper notes

📌 Same area in other venues: 📷 CVPR2026 (22) · 🔬 ICLR2026 (222) · 🧪 ICML2026 (88) · 🤖 AAAI2026 (21) · 🧠 NeurIPS2025 (126) · 📹 ICCV2025 (7)

🔥 Top topics: Federated Learning ×4 · Adversarial Robustness ×2

Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression: This paper proposes GETA, a framework for automatic joint structured pruning and quantization-aware training. It combines three components: a Quantization-Aware Dependency Graph (QADG) that constructs a generic pruning search space, a partially projected SGD that guarantees layer-wise bit-width constraints, and an interpretable joint learning strategy. GETA achieves competitive or state-of-the-art compression performance on both CNNs and Transformers.
Conformal Prediction for Zero-Shot Models: This paper applies conformal prediction to zero-shot models such as CLIP, providing theoretically guaranteed uncertainty quantification in the form of calibrated prediction sets.
Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World: GlobustVP is the first method to apply convex relaxation to vanishing point estimation in a Manhattan World. It formulates the joint estimation of vanishing point locations and line-to-VP associations as a QCQP and relaxes it into an SDP, yielding a globally optimal and efficient solver (~50 ms/image) that is robust to up to 70% outliers.
Federated Learning with Domain Shift Eraser: This paper proposes FDSE, which decomposes each network layer into a domain-free feature extractor (DFE, aggregated globally to enhance consensus) and a domain-specific shift eraser (DSE, aggregated in a personalized manner to retain local characteristics). Combined with BN consistency regularization, it achieves 76.77% on DomainNet (outperforming Ditto by 1.6%) and 91.58% on Office-Caltech10 (outperforming FedBN by 4.6%).
How to Merge Your Multimodal Models Over Time?: This paper proposes the TIME (Temporal Integration of Model Expertise) framework to systematically study the progressive merging of multimodal expert models over time. By defining a search space across three axes (initialization strategy, deployment strategy, and merging technique), the work uncovers key design principles for temporal model merging on the FoMo-in-Flux benchmark.
Mind the Gap: Confidence Discrepancy Can Guide Federated Semi-Supervised Learning: This paper proposes TABASCO, a two-stage two-dimensional sample selection framework to address federated semi-supervised learning under joint label noise and long-tailed distributions. It utilizes two complementary metrics, Weighted JSD (WJSD) and Adaptive Centroid Distance (ACD), to identify clean samples. After GMM clustering, the remaining noisy data is leveraged in a semi-supervised manner, achieving 85.53% accuracy on CIFAR-10 (0.1 imbalance + 0.4 noise).
Model Poisoning Attacks to Federated Learning via Multi-Round Consistency: This work identifies that existing model poisoning attacks in federated learning cancel each other out due to cross-round directional inconsistency. It proposes PoisonedFL, which achieves a multi-round consistent attack through a fixed random direction vector, dynamic magnitude adjustment, and a hypothesis testing mechanism, bypassing 8 SOTA defenses without requiring any real client information.
SCOPE: Semantic Coreset with Orthogonal Projection Embeddings for Federated Learning: SCOPE is a semantic coreset selection framework for federated learning. Each client uses a zero-shot VLM (MobileCLIP-S2) to extract three scalar metrics (representation score, diversity score, and margin proximity). The server aggregates these into a global consensus that guides a two-stage pruning process on the clients (anomaly filtering followed by redundancy elimination). This achieves a 128-512× uplink bandwidth reduction and a 7.72× speedup while maintaining competitive accuracy.
Stop Walking in Circles! Bailing Out Early in Projected Gradient Descent: This paper finds that the PGD attack exhibits cyclic behavior on the \(L_\infty\) ball for robust samples. Detecting these cycles via hashing (PGD_CD) enables early stopping, reducing the number of iterations by up to 96% while yielding identical robustness evaluation results.
Test-Time Augmentation Improves Efficiency in Conformal Prediction: This paper finds that test-time augmentation (TTA) can systematically improve the efficiency of conformal prediction. Augmentation weights are learned on a calibration set to optimize how augmented predictions are aggregated, which reduces the prediction set size by 10-17% on ImageNet with ResNet-50 while strictly preserving the coverage guarantee.
Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory: This paper proposes the MCT (Matching Convexified Trajectory) method. By replacing SGD expert trajectories with a linear convex combination trajectory from random initialization to the optimal point, MCT simultaneously addresses the three major challenges of the traditional MTT method: trajectory instability, slow convergence, and high storage consumption.
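
The two conformal prediction entries above both build on the standard split conformal procedure, which can be sketched roughly as follows. This is a minimal illustration and not the method of either paper: the function names `conformal_threshold` and `prediction_sets` are hypothetical, the nonconformity score (one minus the probability of the true class) is one common choice, and the class probabilities are assumed to come from any classifier, for example a softmax over a zero-shot model's similarity scores.

```python
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Score threshold from a held-out calibration set (hypothetical helper).

    cal_probs: (n, num_classes) class probabilities.
    cal_labels: (n,) integer true labels.
    alpha: target miscoverage rate, so the target coverage is 1 - alpha.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = cal_labels.shape[0]
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return np.inf
    return float(np.sort(scores)[k - 1])


def prediction_sets(test_probs, threshold):
    """Boolean mask of shape (m, num_classes); True means the class is in the set."""
    test_probs = np.asarray(test_probs, dtype=float)
    return (1.0 - test_probs) <= threshold
```

In this sketch, efficiency corresponds to the average number of `True` entries per row: smaller sets at the same coverage level are better, which is the quantity the test-time augmentation entry reports reducing.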
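
The cycle-detection idea from the "Stop Walking in Circles!" entry above can be illustrated with a small NumPy sketch. It is not the authors' PGD_CD implementation: `pgd_with_cycle_detection`, `grad_fn`, and `is_adversarial` are hypothetical names, and the sketch assumes a deterministic signed-gradient update with no momentum, so that a repeated iterate implies the attack will keep revisiting the same points.

```python
import hashlib

import numpy as np


def _key(arr):
    # Hash the raw bytes of an iterate so that exact repeats can be detected.
    return hashlib.sha1(arr.tobytes()).hexdigest()


def pgd_with_cycle_detection(x, grad_fn, is_adversarial, eps, step, max_iters=100):
    """L_inf PGD that stops early once an iterate repeats (hypothetical sketch).

    grad_fn(z): gradient of the attack loss with respect to the input z.
    is_adversarial(z): True if z fools the model.
    Returns (final_iterate, iterations_used, stop_reason).
    """
    x = np.asarray(x, dtype=np.float64)
    x_adv = x.copy()
    seen = {_key(x_adv)}
    for it in range(1, max_iters + 1):
        # Signed-gradient ascent step, then projection onto the eps-ball around x.
        x_adv = x_adv + step * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)
        if is_adversarial(x_adv):
            return x_adv, it, "success"
        key = _key(x_adv)
        if key in seen:
            # The deterministic update has returned to a visited point: bail out.
            return x_adv, it, "cycle"
        seen.add(key)
    return x_adv, max_iters, "max_iters"
```

A toy usage example, assuming a one-dimensional loss whose maximum lies between two grid points so that the iterate oscillates:

```python
x_adv, iters, reason = pgd_with_cycle_detection(
    np.array([0.0]),
    grad_fn=lambda z: -(z - 0.25),   # toy gradient, not from a real model
    is_adversarial=lambda z: False,  # a "robust" sample: the attack never succeeds
    eps=1.0,
    step=0.5,
)
```

Here the loop ends with `reason == "cycle"` well before `max_iters`, which is the saving the entry describes for robust samples.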