
📐 Optimization & Theory

📷 CVPR2025 · 11 paper notes


🔥 Top topics: Federated Learning ×4 · Adversarial Robustness ×2

Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression

This paper proposes GETA, a framework for automatic joint structured pruning and quantization-aware training. It combines three components: a Quantization-Aware Dependency Graph (QADG) that constructs a generic pruning search space, a partially projected SGD that guarantees layer-wise bit-width constraints, and an interpretable joint learning strategy. GETA achieves competitive or state-of-the-art compression performance on both CNNs and Transformers.

Conformal Prediction for Zero-Shot Models

This paper applies conformal prediction to zero-shot models such as CLIP, providing uncertainty quantification with theoretical guarantees and calibrated prediction sets.
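
To make "calibrated prediction sets" concrete, here is a minimal sketch of generic split conformal prediction. It is not the paper's method: the nonconformity score `1 - p(true class)`, the toy probabilities, and the function names are all illustrative assumptions.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every class must be kept.
        return float("inf")
    return sorted(cal_scores)[k - 1]

def prediction_set(probs, threshold):
    """Keep each class whose nonconformity score 1 - p is within the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= threshold]

# Hypothetical probabilities a zero-shot classifier gave to the TRUE class
# on 7 labeled calibration images (made-up numbers).
cal_true_probs = [0.9, 0.5, 0.7, 0.8, 0.6, 0.4, 0.3]
cal_scores = [1.0 - p for p in cal_true_probs]
q_hat = conformal_threshold(cal_scores, alpha=0.25)

# Hypothetical class probabilities for one test image.
test_probs = [0.5, 0.45, 0.05]
print(prediction_set(test_probs, q_hat))
```

Under the usual exchangeability assumption between calibration and test data, sets built this way contain the true label with probability at least `1 - alpha`, regardless of how well the underlying model is calibrated.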

Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World

GlobustVP introduces convex relaxation techniques to the Manhattan World vanishing point estimation problem for the first time. It formulates the joint estimation of vanishing point locations and line-to-VP associations as a QCQP and relaxes it into an SDP, yielding a globally optimal and highly efficient solver (~50ms/image) that is robust to up to 70% outliers.
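
For readers unfamiliar with the QCQP-to-SDP step, the textbook lifting argument can be written as follows. This is the generic relaxation pattern only; the symbols \(C\), \(A_i\), \(b_i\) are placeholders and not the paper's actual cost or constraint matrices.

```latex
% Generic QCQP (nonconvex in general)
\min_{x \in \mathbb{R}^n} \; x^\top C x
\quad \text{s.t.} \quad x^\top A_i x = b_i, \quad i = 1, \dots, m

% Lift with X = x x^\top, so that x^\top C x = \operatorname{tr}(C X)
\min_{X \in \mathbb{S}^n} \; \operatorname{tr}(C X)
\quad \text{s.t.} \quad \operatorname{tr}(A_i X) = b_i, \quad X \succeq 0, \quad \operatorname{rank}(X) = 1

% Dropping the rank constraint gives a convex SDP
\min_{X \in \mathbb{S}^n} \; \operatorname{tr}(C X)
\quad \text{s.t.} \quad \operatorname{tr}(A_i X) = b_i, \quad X \succeq 0
```

When the SDP solution happens to have rank one, it factors as \(X = x x^\top\) and \(x\) is a global optimum of the original QCQP.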

Federated Learning with Domain Shift Eraser

This paper proposes FDSE, which decomposes each network layer into a domain-free feature extractor (DFE), aggregated globally to enhance consensus, and a domain-specific shift eraser (DSE), aggregated in a personalized manner to retain local characteristics. Combined with BN consistency regularization, it achieves 76.77% on DomainNet (outperforming Ditto by 1.6%) and 91.58% on Office-Caltech10 (outperforming FedBN by 4.6%).

How to Merge Your Multimodal Models Over Time?

This paper proposes the TIME (Temporal Integration of Model Expertise) framework to systematically study the progressive merging of multimodal expert models over time. By defining a search space across three axes (initialization strategy, deployment strategy, and merging technique), the work uncovers key design principles for temporal model merging on the FoMo-in-Flux benchmark.

Mind the Gap: Confidence Discrepancy Can Guide Federated Semi-Supervised Learning

This paper proposes TABASCO, a two-stage, two-dimensional sample selection framework that addresses federated semi-supervised learning under joint label noise and long-tailed distributions. It uses two complementary metrics, Weighted JSD (WJSD) and Adaptive Centroid Distance (ACD), to identify clean samples. After GMM clustering, the remaining noisy data is leveraged in a semi-supervised manner, achieving 85.53% accuracy on CIFAR-10 (imbalance 0.1, noise 0.4).

Model Poisoning Attacks to Federated Learning via Multi-Round Consistency

This work observes that the malicious updates produced by existing model poisoning attacks in federated learning cancel each other out across rounds, because their directions are inconsistent from one round to the next. It proposes PoisonedFL, which achieves a multi-round consistent attack through a fixed random direction vector, dynamic magnitude adjustment, and a hypothesis testing mechanism, bypassing eight state-of-the-art defenses without requiring any real client information.
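
The cancellation effect can be illustrated with a toy calculation. The sketch below is an assumption-laden simplification: it uses a fixed random sign vector with a constant magnitude, and it omits the dynamic magnitude adjustment, the hypothesis test, and any defense. All names and sizes are hypothetical.

```python
import math
import random

def norm_of_sum(updates):
    """L2 norm of the coordinate-wise sum of a list of update vectors."""
    dim = len(updates[0])
    total = [sum(u[i] for u in updates) for i in range(dim)]
    return math.sqrt(sum(v * v for v in total))

rng = random.Random(0)
dim, rounds, magnitude = 16, 25, 1.0

# Consistent attack: the same random sign vector is reused in every round.
fixed_signs = [rng.choice([-1.0, 1.0]) for _ in range(dim)]
consistent = [[magnitude * s for s in fixed_signs] for _ in range(rounds)]

# Inconsistent attack: a fresh random sign vector is drawn in every round.
inconsistent = [[magnitude * rng.choice([-1.0, 1.0]) for _ in range(dim)]
                for _ in range(rounds)]

consistent_norm = norm_of_sum(consistent)      # grows linearly with rounds
inconsistent_norm = norm_of_sum(inconsistent)  # partially cancels across rounds
print(consistent_norm, inconsistent_norm)
```

With a fixed direction the accumulated drift has norm `rounds * magnitude * sqrt(dim)`, while independently drawn directions grow only on the order of `sqrt(rounds)`, which is the intuition behind keeping the direction consistent.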

SCOPE: Semantic Coreset with Orthogonal Projection Embeddings for Federated Learning

SCOPE is a semantic coreset selection framework for federated learning. It uses a zero-shot VLM (MobileCLIP-S2) to extract three scalar metrics (representation score, diversity score, and margin proximity), from which the server aggregates a global consensus that guides a two-stage pruning process (anomaly filtering followed by redundancy elimination) on clients. This achieves a 128-512× uplink bandwidth reduction and a 7.72× speedup while maintaining competitive accuracy.

Stop Walking in Circles! Bailing Out Early in Projected Gradient Descent

This paper finds that the PGD attack exhibits cyclic behavior on the \(L_\infty\) ball for robust samples. Detecting these cycles via hashing (PGD_CD) enables early stopping, which reduces iterations by up to 96% while producing identical robustness evaluation results.
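
The idea of hashing iterates to detect a cycle can be sketched on a toy problem. This is not the paper's PGD_CD implementation: the quadratic loss, the step size, and every name below are assumptions chosen so that a deterministic sign-gradient PGD visibly oscillates.

```python
def sign(v):
    return (v > 0) - (v < 0)

def pgd_with_cycle_detection(x0, grad_fn, eps, alpha, max_iters):
    """Sign-gradient ascent projected onto the L-infinity ball around x0.

    Stops as soon as an iterate repeats: deterministic PGD would then
    replay the same states forever, so further steps cannot help.
    Returns (final_x, iterations_used, cycle_found).
    """
    x = list(x0)
    seen = {tuple(x)}  # hash set of visited iterates
    for it in range(1, max_iters + 1):
        g = grad_fn(x)
        x = [min(max(xi + alpha * sign(gi), ci - eps), ci + eps)
             for xi, gi, ci in zip(x, g, x0)]
        key = tuple(x)
        if key in seen:
            return x, it, True
        seen.add(key)
    return x, max_iters, False

# Toy "robust sample": the loss -sum((x - c)^2) peaks strictly inside the
# ball, so the attack never succeeds and the iterates bounce around c.
center = [0.6, -0.3]
grad = lambda x: [-2.0 * (xi - ci) for xi, ci in zip(x, center)]

x, iters, cycled = pgd_with_cycle_detection(
    [0.0, 0.0], grad, eps=1.0, alpha=0.25, max_iters=100)
print(iters, cycled)  # → 4 True
```

In a real attack loop the early exit would only apply to samples that have not yet been misclassified, and exact tuple hashing assumes the attack is deterministic (no random restarts between steps).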

Test-Time Augmentation Improves Efficiency in Conformal Prediction

This paper finds that test-time augmentation (TTA) can systematically improve the efficiency of conformal prediction. By learning augmentation weights on a calibration set to optimize how augmented predictions are aggregated, it reduces the prediction set size by 10-17% on ImageNet with ResNet-50 while strictly preserving the coverage guarantee.
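
The aggregation step can be pictured as a weighted average of class probabilities over augmented views, computed before any conformal scoring. In the sketch below the weights are hard-coded for illustration, whereas the paper learns them; the function name and all numbers are hypothetical.

```python
def aggregate(aug_probs, weights):
    """Weighted average of per-augmentation class-probability vectors."""
    total = sum(weights)
    num_classes = len(aug_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, aug_probs)) / total
            for c in range(num_classes)]

# Hypothetical predictions for one image under two views (original, flipped).
views = [[1.0, 0.0], [0.0, 1.0]]
weights = [3.0, 1.0]  # assumed fixed here; learned on calibration data in the paper

agg = aggregate(views, weights)
print(agg)  # → [0.75, 0.25]
```

The aggregated vector then replaces the single-view probabilities wherever conformal scores are computed, for both calibration and test images, so the usual coverage argument still applies.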

Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory

This paper proposes the MCT (Matching Convexified Trajectory) method. By replacing SGD expert trajectories with a linear convex combination trajectory from random initialization to the optimal point, MCT simultaneously addresses the three major challenges of the traditional MTT method: trajectory instability, slow convergence, and high storage consumption.
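
The storage argument follows from the shape of the trajectory: any point on a straight line between two parameter vectors can be regenerated from the endpoints alone. A minimal sketch, assuming parameters are flattened into plain lists and using hypothetical names (the paper's exact construction may differ):

```python
def convexified_point(theta_init, theta_opt, t):
    """Point at fraction t in [0, 1] on the segment from theta_init to theta_opt."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [(1.0 - t) * a + t * b for a, b in zip(theta_init, theta_opt)]

# Only the two endpoints are stored; intermediate "checkpoints" are
# recomputed on demand instead of being saved after every expert epoch.
theta_init = [0.0, 0.0]  # hypothetical random initialization
theta_opt = [2.0, 4.0]   # hypothetical trained (optimal) parameters

print(convexified_point(theta_init, theta_opt, 0.25))  # → [0.5, 1.0]
```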