🛡️ AI Safety

📷 CVPR2026 · 24 paper notes

A Unified Perspective on Adversarial Membership Manipulation in Vision Models: This work is the first to reveal that membership inference attacks (MIA) against vision models are vulnerable to adversarial membership manipulation: imperceptible perturbations can make non-members pass as members and so deceive auditing. It identifies a gradient norm collapse signature in forged members, and proposes a gradient-geometry-based detection strategy (MFD) and an adversarially robust inference framework (AR-MIA).
All Vehicles Can Lie: Efficient Adversarial Defense in Fully Untrusted-Vehicle Collaborative Perception via Pseudo-Random Bayesian Inference: This paper proposes the Pseudo-Random Bayesian Inference (PRBI) framework for collaborative perception scenarios where all vehicles are untrusted. By leveraging inter-frame temporal consistency as a self-referential signal, PRBI employs pseudo-random grouping combined with Bayesian inference to efficiently identify and exclude malicious vehicles at an average cost of only 2.5 validations per frame, recovering detection accuracy to 79.4%–86.9% of the pre-attack baseline.
ClusterMark: Towards Robust Watermarking for Autoregressive Image Generators with Visual Token Clustering: This paper proposes ClusterMark, a watermarking method based on visual token clustering for autoregressive image generation models. By assigning semantically similar tokens to the same color set (red/green), ClusterMark substantially improves watermark robustness under image perturbations while preserving image quality and enabling fast verification.
ClusterMark: Towards Robust Watermarking for Autoregressive Image Generators with Visual Token Clustering: This paper proposes ClusterMark, a watermarking scheme based on visual token clustering that adapts KGW-style LLM watermarking to autoregressive image generators. By assigning visually similar tokens to the same green/red partition, it significantly improves watermark robustness under image perturbations while preserving image quality.
Computation and Communication Efficient Federated Unlearning via On-server Gradient Conflict Mitigation and Expression: This paper proposes FOUL, a two-stage framework that decouples causal and non-causal features during training and performs on-server gradient conflict matching during unlearning, achieving efficient federated unlearning with low communication overhead without accessing client data.
AdvMark: Decoupling Defense Strategies for Robust Image Watermarking: AdvMark proposes a two-stage decoupled defense framework: Stage 1 Encoder Adversarial Training (EAT) pushes watermarked images into non-attackable regions to resist adversarial attacks; Stage 2 performs direct image optimization to defend against distortion and regeneration attacks while preserving adversarial robustness. Evaluated across 9 watermarking methods × 10 attack types, AdvMark improves distortion/regeneration/adversarial accuracy by 29%/33%/46% respectively, while achieving the best image quality.
Domain-Skewed Federated Learning with Feature Decoupling and Calibration: This paper proposes F²DC, a framework that employs a Domain Feature Decoupler (DFD) and a Domain Feature Corrector (DFC) to decompose local client features in federated learning into domain-robust features and domain-related features. Rather than discarding the latter, F²DC calibrates them to recover entangled class-discriminative information, and combines this with a domain-aware aggregation strategy. The method consistently outperforms state-of-the-art approaches across three multi-domain datasets.
FecalFed: Privacy-Preserving Poultry Disease Detection via Federated Learning: This paper proposes FecalFed, a privacy-preserving federated learning framework that first removes 46.89% duplicate contamination from public poultry fecal datasets via a dual-hash deduplication pipeline and releases a clean benchmark of 8,770 images (poultry-fecal-fl). Under highly non-IID conditions (Dirichlet α=0.5), FedAdam + Swin-Small raises accuracy from the collapsed single-farm result of 64.86% to 90.31%, only 4.79 percentage points below the centralized upper bound of 95.10%. The edge-optimized Swin-Tiny (28M parameters) still achieves 89.74%, making it an efficient and practical option for on-farm deployment.
FedAFD: Multimodal Federated Learning via Adversarial Fusion and Distillation: This paper proposes FedAFD, a framework that simultaneously improves model performance for both heterogeneous clients and the server in multimodal federated learning through a three-stage design comprising bi-level adversarial alignment, granularity-aware feature fusion, and similarity-guided ensemble distillation.
FedDAP: Domain-Aware Prototype Learning for Federated Learning under Domain Shift: This paper proposes FedDAP, a domain-aware prototype federated learning framework that addresses global model performance degradation caused by client-side domain shift in federated learning. FedDAP constructs domain-specific global prototypes and employs a dual prototype alignment strategy comprising intra-domain alignment and cross-domain contrastive learning.
Federated Active Learning Under Extreme Non-IID and Global Class Imbalance: This paper systematically analyzes the impact of global class imbalance and client heterogeneity on query model selection in federated active learning (FAL), derives three core observations, and proposes FairFAL, a class-fair FAL framework that combines adaptive query model selection, prototype-guided pseudo-labeling, and two-stage uncertainty-diversity balanced sampling. FairFAL consistently outperforms all baselines across five benchmark datasets.
Federated Active Learning Under Extreme Non-IID and Global Class Imbalance: This paper systematically investigates the query model selection problem in federated active learning (FAL), identifies class-balanced sampling as the key performance factor, and proposes FairFAL — a framework achieving fair and efficient FAL via adaptive model selection, prototype-guided pseudo-labeling, and uncertainty-diversity balanced sampling.
FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning: This paper proposes FedRE, a framework that achieves a three-way balance among performance, privacy protection, and communication overhead in model-heterogeneous federated learning via "entangled representations"—aggregating all local representations of each client into a single cross-class representation using normalized random weights.
Generative Adversarial Perturbations with Cross-paradigm Transferability on Localized Crowd Counting: This paper proposes CrowdGen, the first cross-paradigm adversarial attack framework targeting both density-map and point-regression crowd counting models. A lightweight UNet generator combined with a multi-task loss (logit suppression, density suppression, GradCAM guidance, and frequency-domain constraint) achieves high transferability (TR up to 1.69) across seven SOTA crowd counting models while maintaining visual imperceptibility (~19 dB PSNR), increasing MAE under attack by about 7× on average.
LogitDynamics: Reliable ViT Error Detection from Layerwise Logit Trajectories: LogitDynamics attaches lightweight classification heads to each layer of a ViT to extract layerwise logit trajectories and top-K competition dynamics, then trains a linear probe to predict model errors, outperforming existing methods in cross-dataset generalization.
Monte Carlo Stochastic Depth for Uncertainty Estimation in Deep Learning: This paper formally connects Stochastic Depth (SD) to the Bayesian variational inference framework, proposes Monte Carlo Stochastic Depth (MCSD) as an uncertainty estimation method, and conducts the first systematic benchmark on modern detectors including YOLO and RT-DETR, demonstrating competitive performance against MC Dropout in terms of calibration and uncertainty ranking.
One-to-More: High-Fidelity Training-Free Anomaly Generation with Attention Control: O2MAG proposes a training-free few-shot anomaly generation method that synthesizes diverse and realistic anomalies from a single reference anomaly image via a tri-branch diffusion process with self-attention grafting (TriAG). It incorporates Anomaly Guidance Optimization (AGO) to align textual semantics and Dual Attention Enhancement (DAE) to ensure complete mask-region filling. The method significantly outperforms existing approaches on downstream anomaly detection benchmarks using MVTec-AD.
ProxyFL: A Proxy-Guided Framework for Federated Semi-Supervised Learning: This paper proposes ProxyFL, a framework that uses classifier weights as unified proxies to simultaneously mitigate external heterogeneity (cross-client distribution discrepancy) and internal heterogeneity (distribution mismatch between labeled and unlabeled data) in federated semi-supervised learning, achieving substantial improvements over existing FSSL methods across multiple benchmarks.
RecoverMark: Robust Watermarking for Localization and Recovery of Manipulated Faces: This paper proposes RecoverMark, a robust watermarking framework that embeds facial content itself as a watermark into the background region, simultaneously achieving tampering localization, original content recovery, and copyright verification while remaining effective under watermark removal attacks.
SubFLOT: Submodel Extraction for Efficient and Personalized Federated Learning via Optimal Transport: This paper proposes SubFLOT, a framework that leverages Optimal Transport (OT) on the server side to align the parameter distributions of a global model with clients' historical models, enabling personalized pruning without access to raw data. Combined with an adaptive regularization mechanism to suppress pruning-induced parameter drift, SubFLOT substantially outperforms existing federated pruning methods across multiple datasets.
TIACam: Text-Anchored Invariant Feature Learning with Auto-Augmentation for Camera-Robust Zero-Watermarking: This paper proposes TIACam, a framework that simulates camera distortions via a learnable auto-augmentor, learns invariant features through text-anchored cross-modal adversarial training, and binds binary messages to features via a zero-watermarking head—achieving camera-robust zero-watermarking without modifying any image pixels. TIACam attains state-of-the-art bit accuracy across three real-world scenarios: screen recapture, print-and-scan, and screenshot.
Towards Highly Transferable Vision-Language Attack via Semantic-Augmented Dynamic Contrastive Interaction: This paper proposes SADCA (Semantic-Augmented Dynamic Contrastive Attack), which iteratively disrupts cross-modal semantic consistency between adversarial images and texts via a dynamic contrastive interaction mechanism and a semantic augmentation module. SADCA significantly improves adversarial transferability against vision-language pre-training (VLP) models, surpassing existing SOTA methods in both cross-model and cross-task attack settings.
Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection: This paper proposes the Tutor-Student Reinforcement Learning (TSRL) framework, which formulates the training process of a deepfake detector as a Markov Decision Process. A "tutor" (PPO agent) dynamically assigns loss weights to individual samples based on their visual features and historical learning dynamics (EMA loss, forgetting count). A "state-change" reward signal guides the "student" (detector) to prioritize high-value samples, substantially improving generalization in cross-dataset and cross-method evaluations.
When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models: This paper proposes the UPA-RFAS framework, which learns a single physical adversarial patch to achieve universal, transferable black-box attacks against VLA robot policies through a combination of feature-space displacement, attention hijacking, and semantic misalignment.
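
The ClusterMark entries above describe adapting KGW-style green/red watermarking so that similar visual tokens share a color. The following is a minimal sketch of that idea, not the paper's implementation. The toy clustering (`t % 8`), the uniform base logits, the bias `delta`, seeding by the previous token's cluster, and all function names are assumptions made for illustration.

```python
import hashlib
import math
import random

VOCAB, NUM_CLUSTERS, GAMMA = 64, 8, 0.5
# Toy stand-in for a learned visual-token clustering: token t is in cluster t % 8.
cluster_of = [t % NUM_CLUSTERS for t in range(VOCAB)]

def green_clusters(prev_token, key):
    """Pick the green clusters for one context, seeded by key + previous cluster (assumption)."""
    seed_src = f"{key}:{cluster_of[prev_token]}".encode()
    seed = int.from_bytes(hashlib.sha256(seed_src).digest()[:8], "big")
    ids = list(range(NUM_CLUSTERS))
    random.Random(seed).shuffle(ids)
    return set(ids[: int(GAMMA * NUM_CLUSTERS)])

def generate(length, key, delta, seed):
    """Sample tokens while adding a bias of delta to the logits of green-cluster tokens."""
    rng = random.Random(seed)
    tokens = [rng.randrange(VOCAB)]
    for _ in range(length - 1):
        green = green_clusters(tokens[-1], key)
        # Uniform base logits (0.0) stand in for the generator's real logits.
        weights = [math.exp(delta) if cluster_of[t] in green else 1.0
                   for t in range(VOCAB)]
        tokens.append(rng.choices(range(VOCAB), weights=weights)[0])
    return tokens

def z_score(tokens, key):
    """KGW-style detection statistic: how far the green-hit count exceeds chance."""
    pairs = list(zip(tokens, tokens[1:]))
    hits = sum(cluster_of[t] in green_clusters(p, key) for p, t in pairs)
    n = len(pairs)
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

marked = generate(200, key="secret", delta=4.0, seed=0)
plain = generate(200, key="secret", delta=0.0, seed=1)
# Simulated perturbation: every token is swapped for another token in the same cluster.
perturbed = [(t + NUM_CLUSTERS) % VOCAB for t in marked]
```

Because the color of a token depends only on its cluster, `z_score(perturbed, "secret")` equals `z_score(marked, "secret")` in this toy setup, which is the robustness property the summaries attribute to clustering. A per-token partition would lose hits under the same swap.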
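
The FedRE entry above mentions aggregating a client's local representations into a single cross-class representation using normalized random weights. A minimal sketch of that aggregation step follows. It assumes each client holds one representation vector per class; the function name `entangle` and the choice to return the weights alongside the mixed vector are illustrative and not taken from the paper.

```python
import random

def entangle(class_reps, rng):
    """Mix per-class representation vectors with random weights that sum to 1.

    class_reps: dict mapping class id -> list of floats (all the same length).
    Returns (mixed_vector, weights), where weights maps class id -> weight.
    """
    classes = sorted(class_reps)
    raw = [rng.random() for _ in classes]
    total = sum(raw)
    weights = {c: r / total for c, r in zip(classes, raw)}
    dim = len(class_reps[classes[0]])
    mixed = [sum(weights[c] * class_reps[c][i] for c in classes)
             for i in range(dim)]
    return mixed, weights

# Hypothetical per-class representations on one client.
class_reps = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
mixed, weights = entangle(class_reps, random.Random(0))
```

Only `mixed` (one vector per client rather than one per class) would need to leave the client in this sketch, which is one way to read the communication and privacy benefits the summary claims.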
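
The Monte Carlo Stochastic Depth entry above treats randomly skipped residual blocks as a source of predictive uncertainty. The sketch below illustrates the general recipe on a toy residual network: keep block dropping active at inference, run several passes, average the softmax outputs, and report the entropy of the average. The network, its weights, the survival probability, and the function names are all assumptions made for illustration; the paper's detector setup is not reproduced here.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward(x, blocks, survival_prob, rng):
    """One stochastic pass: each residual block is kept with probability survival_prob."""
    h = list(x)
    for W in blocks:
        if rng.random() < survival_prob:
            # Residual branch tanh(W h), added back onto the skip connection.
            branch = [math.tanh(sum(w * v for w, v in zip(row, h))) for row in W]
            h = [a + b for a, b in zip(h, branch)]
    return softmax(h)  # the final hidden state doubles as class logits in this toy

def mc_stochastic_depth(x, blocks, survival_prob=0.8, passes=50, seed=0):
    """Average several stochastic passes and return (mean probabilities, predictive entropy)."""
    rng = random.Random(seed)
    samples = [forward(x, blocks, survival_prob, rng) for _ in range(passes)]
    mean = [sum(col) / passes for col in zip(*samples)]
    entropy = -sum(p * math.log(p) for p in mean if p > 0)
    return mean, entropy

# Hypothetical 3-class toy network with four identical residual blocks.
W = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6], [-0.1, 0.2, 0.7]]
blocks = [W, W, W, W]
x = [1.0, 0.5, -0.5]
mean_probs, pred_entropy = mc_stochastic_depth(x, blocks)
```

Setting `survival_prob=1.0` makes every pass identical, so the averaged prediction collapses to a single deterministic forward pass. That gives a quick sanity check that the uncertainty comes only from the sampled depth.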