📐 Optimization & Theory

📷 CVPR2026 · 22 paper notes

📌 Same area in other venues: 🔬 ICLR2026 (222) · 🧪 ICML2026 (88) · 🤖 AAAI2026 (21) · 🧠 NeurIPS2025 (121) · 📹 ICCV2025 (7)

🔥 Top topics: Federated Learning ×4 · Compression ×2 · Diffusion Models ×2 · Adversarial Robustness ×2

ACE-Merging: Data-Free Model Merging with Adaptive Covariance Estimation

This paper theoretically proves that fine-tuned parameter differences contain input covariance information. Accordingly, it proposes ACE-Merging, which achieves data-free closed-form model merging through a three-step process: adaptive covariance estimation, collective structure priors, and spectral refinement. It achieves an average improvement of 4% on GPT-2 and 5% on RoBERTa-Base compared to previous methods.

BD-Merging: Bias-Aware Dynamic Model Merging with Evidence-Guided Contrastive Learning

The authors propose the BD-Merging framework, which utilizes Dirichlet evidence modeling, Neighborhood Disparity Score (ADS), and disparity-aware contrastive learning to train a debiasing router for adaptive assignment of model merging weights. This significantly improves the robustness and generalization of merged models under test-time distribution shifts and unseen tasks.

Beyond Single Solution: Multi-Hypothesis Collaborative Deep Unfolding Network for Image Compressive Sensing

Addressing the "underdetermined and non-unique" nature of the Compressive Sensing (CS) problem, this paper proposes MHC-DUN: a paradigm shift from reconstructing a single solution in traditional Deep Unfolding Networks (DUNs) to "reconstructing \(T\) hypothesis solutions simultaneously with collaborative optimization." Specifically, AlphaNet predicts pixel-adaptive step sizes for each hypothesis in the gradient descent step, while MHCB captures inter-hypothesis correlations for fusion in the proximal mapping step. The method consistently outperforms current SOTA on Set11/Urban100/CS-MRI (e.g., achieving a 0.45 dB average PSNR gain over USB-Net on Set11).

Conditional Factuality Controlled LLMs with Generalization Certificates via Conformal Sampling

This paper proposes CFC (Conditional Factuality Control), a post-hoc conformal framework that learns feature-conditioned acceptance thresholds via augmented quantile regression. It provides conditional coverage guarantees for LLM/VLM sampled outputs, significantly improving reliability for difficult subgroups while maintaining compact prediction sets.

DABO: Difficulty-Aware Bayesian Optimization with Diffusion-Learned Priors

DABO treats "optimization difficulty" as a first-class conditional variable throughout the entire freeze-thaw hyperparameter optimization (HPO) pipeline. By utilizing a three-level difficulty characterization and a conditional diffusion model to generate 1 million synthetic learning curves with difficulty labels, it trains a difficulty-aware PFN proxy and an adaptive acquisition function. DABO achieves an average regret reduction of 11–18% compared to the current SOTA (ifBO) across 75 tasks, with greater gains observed on harder tasks.

DC-Merge: Improving Model Merging with Directional Consistency

DC-Merge discovers that the key to model merging lies in maintaining directional consistency in singular space between the merged multi-task vector and the original single-task vectors. Through a two-step process of singular value smoothing and projection onto a shared orthogonal subspace, it achieves SOTA results on both Vision and Vision-Language tasks.

Defending Unauthorized Model Merging via Dual-Stage Weight Protection

This paper proposes MergeGuard, an active dual-stage weight protection framework: Stage 1 disperses task-critical weights through L2 regularization, and Stage 2 injects structured perturbations to disrupt merging compatibility. The protected model loses less than 1.5% of its original performance, while models merged from it suffer accuracy degradation of up to 90%.

Dynamic Momentum Recalibration in Online Gradient Learning

This paper reveals the inherent flaws of fixed momentum coefficients in the bias-variance tradeoff from a signal processing perspective. It proposes the SGDF optimizer, which dynamically balances noise suppression and signal preservation in gradient estimation by calculating an optimal time-varying gain online (based on the Minimum Mean Square Error principle), outperforming SGD with momentum and Adam variants across various vision tasks.
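As a rough illustration of what a time-varying MMSE gain looks like, the sketch below fuses a running gradient estimate with each new noisy gradient using the scalar Kalman-style update. It is not the SGDF update rule itself: the function name `mmse_fuse`, the fixed noise variance, and the sample gradients are all assumptions made for this example.

```python
def mmse_fuse(estimate, est_var, observation, obs_var):
    """Fuse a running estimate with a new noisy observation.

    The gain minimizes the mean squared error of the fused value when the
    two errors are independent (the scalar Kalman update).
    """
    gain = est_var / (est_var + obs_var)
    fused = estimate + gain * (observation - estimate)
    fused_var = (1.0 - gain) * est_var
    return fused, fused_var, gain


# Hypothetical stream of noisy gradients for a single parameter.
grads = [1.2, 0.8, 1.1, 0.9]
estimate, est_var = 0.0, 1.0  # vague initial estimate
obs_var = 0.25                # assumed gradient-noise variance
gains = []
for g in grads:
    estimate, est_var, gain = mmse_fuse(estimate, est_var, g, obs_var)
    gains.append(gain)
```

With a fixed noise variance the gain simply shrinks as the estimate becomes more certain. A fixed momentum coefficient corresponds to holding the gain constant regardless of how noisy the current gradient is.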

End-to-End Hyper-Relational Information Extraction for Engineering Diagrams via Dynamically Tokenized Relation Transformer

This work reframes the parsing of engineering diagrams (P&ID, Electrical Diagrams) from a multi-model workflow that detects symbols, lines, and text separately into a single-pass scene graph generation task. By employing a vision backbone with dynamic token pruning and a one-stage Relation Transformer (DTRT), the system outputs, end to end, a Hyper-Relational Knowledge Graph (HKG) containing "entities + connectivity + text qualifiers." On P&ID datasets, it achieves 94.84% SGDET R@2000 with approximately 1/8 the computational cost of two-stage methods.

Enhancing Visual Representation with Textual Semantics: Textual Semantics-Powered Prototypes for Heterogeneous Federated Learning

Addressing the issue where existing Federated Prototypical Learning methods destroy inter-class semantic relations, the proposed FedTSP method utilizes pre-trained language models to construct textual prototypes that preserve semantic structures, significantly improving performance and accelerating convergence in heterogeneous federated learning.

Fed-ADE: Adaptive Learning Rate for Federated Post-adaptation under Distribution Shift

This paper proposes the Fed-ADE framework, which uses two lightweight distribution shift signals (uncertainty dynamics estimation and representation dynamics estimation) to adaptively adjust the learning rate for each client at each time step, achieving unsupervised post-deployment adaptation in federated learning.

Few-for-Many Personalized Federated Learning

The study reformulates Personalized Federated Learning (PFL) as a "few-for-many" (\(K \ll M\)) multi-objective optimization problem, where \(K\) shared models serve \(M\) clients. By employing differentiable Smooth Tchebycheff Set Scalarization (STCH-Set) for joint training, the method consistently outperforms existing approaches across vision, NLP, and medical imaging datasets using as few as 3 models.
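The few-for-many objective can be sketched as follows: each client is scored by its best shared model (a min over \(K\)), and the objective is the worst client score (a max over \(M\)), with both operators replaced by log-sum-exp smoothings so the result is differentiable. This is a minimal sketch assuming uniform preference weights and a zero reference point; the smoothing parameter `mu`, the helper names, and the loss values are hypothetical, and the paper's exact formulation may differ.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def smooth_min(values, mu):
    """Smooth lower bound on min(values); approaches min as mu -> 0."""
    return -mu * logsumexp([-v / mu for v in values])


def smooth_max(values, mu):
    """Smooth upper bound on max(values); approaches max as mu -> 0."""
    return mu * logsumexp([v / mu for v in values])


def stch_set(losses, mu=0.1):
    """losses[i][k] is the loss of client i under shared model k.

    Each client is scored by its (smoothed) best model, and the objective
    is the (smoothed) worst client score.
    """
    per_client = [smooth_min(row, mu) for row in losses]
    return smooth_max(per_client, mu)


# Hypothetical losses for M = 4 clients and K = 2 shared models.
losses = [
    [0.2, 0.9],
    [0.3, 0.8],
    [1.0, 0.1],
    [0.7, 0.4],
]
value = stch_set(losses, mu=0.01)
```

With a small `mu` the value is close to the hard max-min (0.4 here, set by the fourth client's best model); a larger `mu` spreads gradient signal over more clients and models.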

Globscope: Toward a Global View of the Loss Landscape

This work utilizes a reversible autoencoder to compress a set of independently trained networks (each flattened into a parameter vector) into a 2D latent space. A topological analysis (merge tree) is then performed on this latent space by treating "loss" as a scalar field. This provides the first global loss landscape visualization capable of accommodating multiple minima/basins and their connectivity, successfully reproducing theoretical phenomena such as mode connectivity and permutation symmetry (re-basin).

GR-Gauge: Cost-efficient Training Configuration By Gauging the Gradient Redundancy

Training is modeled as a "gradient voting process in both time and sample dimensions." The authors propose gradient redundancy metrics \(GR_T\) and \(GR_S\) as a universal "health gauge" across models. This gauge guides hyperparameter search, early stopping, and state reuse for learning rate and batch size, reducing the total time to reach target accuracy by 80% or more in the best cases, without requiring expensive validation sets.

HFedATM: Hierarchical Federated Domain Generalization via Optimal Transport and Regularized Mean Aggregation

This paper formally defines "Hierarchical Federated Domain Generalization (HFedDG)" for the first time and derives a generalization error bound decomposed into three levels: client, station, and server. It proposes HFedATM—a data-free, plug-and-play method that modifies only the server aggregation step. It utilizes Filter-wise Optimal Transport (FOT) to align convolutional filters across stations and Shrinkage-aware RegMean for closed-form fusion of linear layers. HFedATM consistently improves baselines such as FedAvg, FedProx, FedSR, and FedIIR across vision and NLP benchmarks.

Label-Free Cross-Task LoRA Merging with Null-Space Compression

It is observed that the null-space ratio of the down-projection matrix A during LoRA fine-tuning decreases and strongly correlates with performance. Based on this, NSC Merging is proposed—a label-free, task-agnostic LoRA merging method that achieves SOTA results on 20 heterogeneous vision tasks, 6 NLI tasks, and VLM evaluations.

Learning to Learn Weight Generation via Local Consistency Diffusion

Mc-Di combines the bi-level optimization of meta-learning with diffusion-based weight generation and transforms the diffusion process from learning only "globally optimal weights" to "local consistency diffusion." By reconstructing weights segmentally along multiple intermediate points on the optimization trajectory, the model achieves higher accuracy and lower inference latency in tasks requiring frequent weight updates, such as transfer learning, few-shot learning, domain generalization, and language model fine-tuning.

Mapping Networks

This paper proposes Mapping Networks—a "meta-parameterization" method that utilizes a low-dimensional trainable latent vector \(z\) (coupled with fixed mapping weights modulated by \(z\)) to generate all parameters of a target network. By shifting the training process from a high-dimensional weight space to a low-dimensional latent space, the method achieves or exceeds the accuracy of the original network on tasks such as image classification, deepfake detection, and segmentation with approximately 500× fewer trainable parameters, while significantly suppressing overfitting.
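The idea of training in a latent space can be illustrated with the simplest possible stand-in: a fixed random linear map from a latent vector to the full weight vector, where only the latent is updated. This is a sketch under stated assumptions, not the paper's architecture; the linear map `P`, the dimensions, the toy objective, and the learning rate are all chosen here for illustration.

```python
import random

random.seed(0)
D, d = 6, 2  # target-network parameters vs. trainable latent dimensions

# Fixed (never trained) mapping from latent space to weight space.
P = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(D)]


def generate(z):
    """Produce all D target weights from the d-dimensional latent z."""
    return [sum(P[i][j] * z[j] for j in range(d)) for i in range(D)]


# Toy objective: match weights that are reachable from some latent z_true.
z_true = [0.5, -1.0]
target = generate(z_true)


def loss(z):
    return sum((w - t) ** 2 for w, t in zip(generate(z), target))


def grad(z):
    # Chain rule: dL/dz = P^T (dL/dtheta)
    residual = [2.0 * (w - t) for w, t in zip(generate(z), target)]
    return [sum(P[i][j] * residual[i] for i in range(D)) for j in range(d)]


z = [0.0, 0.0]
initial = loss(z)
for _ in range(200):
    g = grad(z)
    z = [zj - 0.01 * gj for zj, gj in zip(z, g)]
final = loss(z)
```

Only the `d` entries of `z` receive gradient updates, while all `D` weights change through the fixed map, which is the source of the reduction in trainable parameters.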

Model Merging in the Essential Subspace

The ESM framework is proposed to construct an "essential subspace" by performing PCA on activation offsets caused by parameter updates (rather than directly applying SVD to parameters). It utilizes three-level polarization scaling to enhance key parameters and suppress noise, achieving a 3.2% absolute accuracy improvement over Iso-CTS in a 20-task merging scenario with ViT-B/32.

Semi-Supervised Conformal Prediction With Unlabeled Nonconformity Score

The SemiCP framework is proposed to integrate unlabeled data into the conformal prediction calibration process via Nearest Neighbor Matching (NNM) scores. This reduces the average coverage gap by up to 77% when labeled data is extremely scarce, while simultaneously shrinking the size of prediction sets.
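For context, the standard labeled-only split conformal calibration that methods like this build on can be sketched as below. The sketch does not include the unlabeled NNM scores; the function names, the nonconformity score (one minus the predicted probability), and the sample values are assumptions for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few labeled points: every label is kept
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs, threshold):
    """Keep every label whose nonconformity score 1 - p is within the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


# Hypothetical nonconformity scores (1 - probability of the true label)
# from nine labeled calibration examples.
cal_scores = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.40, 0.55, 0.70]
q = conformal_threshold(cal_scores, alpha=0.25)
labels = prediction_set([0.50, 0.48, 0.02], q)
```

The `rank > n` branch shows why scarce labeled data hurts: with too few calibration scores the threshold becomes infinite and the prediction set contains every label.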

The Power of Decaying Steps: Enhancing Attack Stability and Transferability for Sign-based Optimizers

This work reformulates sign-based adversarial attack optimizers as coordinate-wise gradient descent, revealing that non-decaying step sizes are the root cause of non-convergence and instability. It proposes the Monotone Decreasing Coordinate Step (MDCS) strategy and theoretically proves that MDCS-MI achieves an optimal \(O(1/\sqrt{T})\) convergence rate. MDCS significantly enhances attack transferability and stability across image classification and cross-modal retrieval tasks.
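The effect of the step schedule on a sign-based update can be seen on a toy problem. The sketch below compares a constant step with a monotonically decreasing one; the \(1/\sqrt{t+1}\) schedule, the concave toy objective, and all names are assumptions for illustration, and momentum (the MI part) is omitted.

```python
import math


def sign(v):
    return (v > 0) - (v < 0)


def sign_ascent(grad_fn, x0, steps, step_size):
    """Run x <- x + step_size(t) * sign(grad) on every coordinate."""
    x = list(x0)
    for t in range(steps):
        g = grad_fn(x)
        alpha = step_size(t)
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, g)]
    return x


# Toy concave objective -(x - target)^2 summed over coordinates.
target = [0.33, -0.27]


def grad_fn(x):
    return [-2.0 * (xi - ti) for xi, ti in zip(x, target)]


def max_error(x):
    return max(abs(xi - ti) for xi, ti in zip(x, target))


constant = sign_ascent(grad_fn, [0.0, 0.0], 400, lambda t: 0.1)
decaying = sign_ascent(grad_fn, [0.0, 0.0], 400, lambda t: 0.1 / math.sqrt(t + 1))
```

With the constant step each coordinate ends up bouncing between two points on either side of the maximizer, so the error never falls below a floor set by the step size. With the decaying step the overshoot is bounded by the current step, so the error keeps shrinking.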

UniFusion: A Unified Image Fusion Framework with Robust Representation and Source-Aware Preservation

The authors propose UniFusion, a unified image fusion framework that leverages DINOv3 self-supervised semantic priors to construct a cross-modal shared feature space. It preserves source image information via a reconstruction alignment mechanism and decouples reconstruction and fusion objectives using a bilevel optimization strategy, achieving SOTA performance across tasks such as infrared-visible, multi-exposure, multi-focus, and medical image fusion.