
🔒 LLM Safety

📹 ICCV2025 · 10 paper notes

📌 Same area in other venues: 📷 CVPR2026 (12) · 🔬 ICLR2026 (185) · 💬 ACL2026 (115) · 🤖 AAAI2026 (41) · 🧠 NeurIPS2025 (81) · 🧪 ICML2025 (41)

🔥 Top topics: Federated Learning ×4 · Adversarial Robustness ×3 · Multimodal/VLM ×2

Adversarial Robust Memory-Based Continual Learner

This paper identifies two compounding challenges that arise when continual learning is combined with adversarial training, namely accelerated forgetting and gradient confusion. It proposes two plug-and-play modules, Anti-Forgettable Logit Calibration (AFLC) and Robustness-Aware Experience Replay (RAER), which together improve adversarial robustness by up to 8.13% on Split-CIFAR10/100 and Split-Tiny-ImageNet.

Asynchronous Event Error-Minimizing Noise for Safeguarding Event Dataset

This paper proposes UEvs, the first unlearnable example generation method for asynchronous event data. It introduces Event Error-Minimizing Noise (E²MN) and an adaptive projection mechanism that prevent unauthorized models from learning from event datasets while preserving utility for legitimate use.
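For context, the image-domain unlearnable-example literature usually obtains error-minimizing noise from a bi-level min-min problem, stated below. How E²MN carries this over to asynchronous events is not described here. The additive form $x_i + \delta_i$ and the constraint set $\mathcal{S}$ are therefore assumptions: $\mathcal{S}$ stands in for the paper's adaptive projection, and in the image setting it is typically $\|\delta_i\|_\infty \le \epsilon$.

```latex
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \; \min_{\delta_i \in \mathcal{S}} \; \mathcal{L}\big(f_\theta(x_i + \delta_i),\, y_i\big)
```

Because the inner minimization drives the training loss toward zero, a model trained on the perturbed data can fit the noise shortcut instead of the underlying content.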

Cooperative Pseudo Labeling for Unsupervised Federated Classification

FedCoPL is the first work to extend unsupervised federated learning (UFL) to classification tasks. It addresses CLIP's inherent bias and label shift challenges via a cooperative pseudo labeling strategy (global assignment ensuring class balance) and a partial prompt aggregation protocol (aggregating only visual prompts while keeping text prompts local).
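The two FedCoPL ingredients can be illustrated with a minimal sketch. Both functions are hypothetical stand-ins, not the authors' code. `balanced_pseudo_labels` replaces the paper's global assignment with a greedy rule that fills the most confident (sample, class) pairs first while capping every class at ceil(n / k) samples. `partial_prompt_aggregation` assumes each client stores prompts under made-up keys `"visual"` and `"text"` and uses equal-weight averaging.

```python
from typing import Dict, List


def balanced_pseudo_labels(scores: List[List[float]]) -> List[int]:
    """Greedy class-balanced pseudo labeling (illustrative stand-in).

    scores[i][c] is the similarity of sample i to class c. Each class
    receives at most ceil(n / k) samples, so a biased model cannot
    dump most samples into one class.
    """
    n, k = len(scores), len(scores[0])
    cap = -(-n // k)  # ceil(n / k)
    pairs = sorted(
        ((scores[i][c], i, c) for i in range(n) for c in range(k)),
        reverse=True,
    )
    labels = [-1] * n
    counts = [0] * k
    for _, i, c in pairs:
        if labels[i] == -1 and counts[c] < cap:
            labels[i] = c
            counts[c] += 1
    return labels


def partial_prompt_aggregation(
    clients: List[Dict[str, List[float]]]
) -> List[Dict[str, List[float]]]:
    """Average only the visual prompt across clients; text prompts stay local.

    The "visual"/"text" keys and equal client weights are assumptions.
    """
    dim = len(clients[0]["visual"])
    avg = [sum(c["visual"][j] for c in clients) / len(clients) for j in range(dim)]
    return [{"visual": list(avg), "text": list(c["text"])} for c in clients]


if __name__ == "__main__":
    # Plain argmax would give [0, 0, 0, 1]; the balance cap forces two per class.
    print(balanced_pseudo_labels([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]))
    # → [0, 0, 1, 1]
```

Keeping text prompts local lets each client retain its own class-name context, while the shared visual prompt carries the knowledge pooled across clients.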

Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling

This paper proposes Gradient-Guided Sampling (GGS), an inner-iteration sampling strategy that uses the gradient direction from the previous inner iteration to guide sampling. By striking a balance between Exploitation (attack strength / loss maxima) and Exploration (cross-model generalization / flat loss landscape), GGS significantly outperforms existing transfer attack methods across diverse architectures including CNNs, ViTs, and MLLMs.
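The inner sampling loop can be sketched on a toy differentiable loss. This is a minimal, hypothetical reading of GGS, not the authors' implementation. It assumes an MI-FGSM outer loop and draws each inner offset with a uniform random magnitude in [0, βε] and the sign of the previous inner gradient. The names `ggs_attack`, `toy_loss`, `toy_grad`, and `beta` are illustrative. A real attack would differentiate a surrogate network and also clip to the valid pixel range.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def sign(v: float) -> float:
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)


def ggs_attack(
    x: Vector,
    grad_fn: Callable[[Vector], Vector],
    eps: float = 0.1,
    steps: int = 10,
    n_samples: int = 5,
    beta: float = 1.5,
    mu: float = 1.0,
    seed: int = 0,
) -> Vector:
    """Maximize the loss behind grad_fn within an L-inf ball of radius eps (sketch)."""
    rng = random.Random(seed)
    alpha = eps / steps
    x_adv = list(x)
    momentum = [0.0] * len(x)
    g_prev = [0.0] * len(x)  # zero at start, so the first sample sits at x_adv
    for _ in range(steps):
        g_sum = [0.0] * len(x)
        for _ in range(n_samples):
            # Assumed rule: random magnitude, direction from the previous
            # inner gradient (biases exploration toward higher loss).
            r = [rng.uniform(0.0, beta * eps) * sign(g) for g in g_prev]
            g_prev = grad_fn([xa + ri for xa, ri in zip(x_adv, r)])
            g_sum = [s + g for s, g in zip(g_sum, g_prev)]
        g_avg = [s / n_samples for s in g_sum]
        l1 = sum(abs(g) for g in g_avg) or 1.0
        # MI-FGSM momentum on the L1-normalized averaged gradient.
        momentum = [mu * m + g / l1 for m, g in zip(momentum, g_avg)]
        x_adv = [xa + alpha * sign(m) for xa, m in zip(x_adv, momentum)]
        # Project back into the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


def toy_loss(v: Vector) -> float:
    return sum(math.sin(3 * vi) for vi in v)


def toy_grad(v: Vector) -> Vector:
    return [3 * math.cos(3 * vi) for vi in v]


if __name__ == "__main__":
    x0 = [0.0, 0.2, -0.1]
    print(toy_loss(x0), toy_loss(ggs_attack(x0, toy_grad)))
```

Taking only the sign of the previous gradient keeps samples on the loss-increasing side (exploitation). The random magnitude still spreads them over a neighborhood (exploration).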

Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation

This paper proposes FUCRT, a federated unlearning method based on class-aware representation transformation. Rather than directly erasing the representations of forget classes, FUCRT transforms them toward the semantically nearest retain classes, and employs dual contrastive learning to align transformation consistency across clients. The method achieves 100% unlearning on four datasets while maintaining or even improving performance on retain classes.
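The "nearest retain class" step can be sketched with class prototypes and cosine similarity. This is a hypothetical illustration, not FUCRT's procedure. Prototypes are plain feature means, and the transformation is approximated by relabeling forget-class samples with their mapped retain class for local fine-tuning. The cross-client dual contrastive learning is not shown, and all function names are made up.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = List[float]


def class_prototypes(feats: Sequence[Vector], labels: Sequence[int]) -> Dict[int, Vector]:
    """Mean feature vector per class."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for f, y in zip(feats, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def nearest_retain_class(protos: Dict[int, Vector], forget: Set[int]) -> Dict[int, int]:
    """Map each forget class to the retain class with the most similar prototype."""
    retain = [c for c in protos if c not in forget]
    return {
        f: max(retain, key=lambda r: cosine(protos[f], protos[r]))
        for f in sorted(forget)
    }


def transformed_targets(labels: Sequence[int], mapping: Dict[int, int]) -> List[int]:
    """Relabel forget-class samples; retain-class labels are unchanged."""
    return [mapping.get(y, y) for y in labels]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.2], [0.95, 0.05]]
    labels = [0, 0, 1, 1, 2, 2]
    mapping = nearest_retain_class(class_prototypes(feats, labels), {2})
    print(mapping, transformed_targets(labels, mapping))
```

Redirecting forget-class inputs to a semantically close retain class, instead of pushing them to arbitrary points, is one way to avoid disturbing the retain-class decision boundaries.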

Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning

This paper proposes Geminio, the first gradient inversion attack (GIA) leveraging vision-language models (VLMs) to enable natural language-guided targeted reconstruction. A malicious server can specify the type of data to steal via natural language queries, precisely locating and reconstructing semantically matching private samples from large-batch gradients, without disrupting normal FL model training.

LATTE: Collaborative Test-Time Adaptation of Vision-Language Models in Federated Learning

This paper proposes Latte, a framework that enables collaborative test-time adaptation of vision-language models (e.g., CLIP) in decentralized federated learning settings. Through a dual-memory mechanism combining local and external memory, Latte achieves cross-client knowledge sharing while preserving client-level personalization.
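A dual-memory classifier can be sketched as follows, under loudly stated assumptions. `DualMemory` is a hypothetical class, not Latte's implementation. Each client keeps a local per-class cache of its lowest-entropy test embeddings, in the spirit of entropy-ranked cache methods for test-time adaptation. It shares only the cache means as prototypes, stores prototypes received from other clients as external memory, and adds the best memory similarity to the zero-shot text logits.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def entropy(probs: Vector) -> float:
    return -sum(p * math.log(p) for p in probs if p > 0)


class DualMemory:
    def __init__(self, num_classes: int, capacity: int = 3) -> None:
        self.capacity = capacity
        # Local memory: per class, (entropy, embedding) pairs, most confident first.
        self.local: Dict[int, List[Tuple[float, Vector]]] = {c: [] for c in range(num_classes)}
        # External memory: prototypes received from other clients.
        self.external: Dict[int, List[Vector]] = {c: [] for c in range(num_classes)}

    def update_local(self, emb: Vector, probs: Vector) -> None:
        """Cache a test embedding under its predicted class, keeping the lowest-entropy ones."""
        c = max(range(len(probs)), key=lambda i: probs[i])
        bucket = self.local[c]
        bucket.append((entropy(probs), normalize(emb)))
        bucket.sort(key=lambda item: item[0])
        del bucket[self.capacity:]

    def local_prototypes(self) -> Dict[int, Vector]:
        """Per-class cache means: the only thing shared with other clients here."""
        protos: Dict[int, Vector] = {}
        for c, bucket in self.local.items():
            if bucket:
                dim = len(bucket[0][1])
                protos[c] = normalize([sum(v[j] for _, v in bucket) / len(bucket) for j in range(dim)])
        return protos

    def receive_external(self, protos: Dict[int, Vector]) -> None:
        for c, v in protos.items():
            self.external[c].append(normalize(v))

    def predict(self, emb: Vector, text_logits: Vector, weight: float = 1.0) -> int:
        """Zero-shot logits plus the best cosine match in local or external memory."""
        e = normalize(emb)
        scores = list(text_logits)
        for c in range(len(scores)):
            memory = [v for _, v in self.local[c]] + self.external[c]
            if memory:
                scores[c] += weight * max(sum(a * b for a, b in zip(e, v)) for v in memory)
        return max(range(len(scores)), key=lambda i: scores[i])
```

Sharing prototypes rather than raw embeddings is a privacy-motivated choice in this sketch. Keeping the local cache separate is what preserves per-client personalization.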

MUNBa: Machine Unlearning via Nash Bargaining

This work formulates Machine Unlearning (MU) as a two-player cooperative bargaining game and derives a closed-form solution via Nash bargaining theory to simultaneously address gradient conflict and gradient dominance between the forgetting and retention objectives, achieving an optimal balance between unlearning and preservation across both classification and generation tasks.
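The two-player bargaining step can be made concrete. Following the Nash bargaining stationarity condition used in Nash-MTL, the weights α satisfy (GᵀG α)ᵢ = 1/αᵢ, where the columns of G are the forgetting gradient g_f and the retention gradient g_r. For two players this solves to α_f = 1/(‖g_f‖·√(1+cos φ)) and α_r = 1/(‖g_r‖·√(1+cos φ)), where φ is the angle between the gradients. This closed form is derived here and MUNBa's own parameterization may differ. Each gradient is effectively normalized before mixing, which counters gradient dominance. The combined direction d = α_f·g_f + α_r·g_r satisfies d·g_f = ‖g_f‖·√(1+cos φ) > 0, and likewise for g_r, whenever cos φ > −1, which counters gradient conflict. The sketch assumes both gradients belong to losses being minimized, so d is applied as a descent step; the function name is illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def nash_bargaining_two_player(g_f: Vector, g_r: Vector) -> Tuple[float, float, Vector]:
    """Weights solving (G^T G alpha)_i = 1 / alpha_i for two gradients, plus the mixed direction."""
    n_f, n_r = math.sqrt(dot(g_f, g_f)), math.sqrt(dot(g_r, g_r))
    if n_f == 0 or n_r == 0:
        raise ValueError("both gradients must be non-zero")
    cos_phi = dot(g_f, g_r) / (n_f * n_r)
    if cos_phi <= -1 + 1e-12:
        raise ValueError("exactly opposed gradients admit no bargaining solution")
    s = math.sqrt(1 + cos_phi)
    alpha_f, alpha_r = 1 / (n_f * s), 1 / (n_r * s)
    d = [alpha_f * x + alpha_r * y for x, y in zip(g_f, g_r)]
    return alpha_f, alpha_r, d


if __name__ == "__main__":
    # Conflicting gradients (cos_phi < 0): d still has positive overlap with both.
    _, _, d = nash_bargaining_two_player([1.0, 0.0], [-1.0, 0.5])
    print(dot(d, [1.0, 0.0]) > 0, dot(d, [-1.0, 0.5]) > 0)  # → True True
```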

SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders

SAUCE leverages sparse autoencoders (SAEs) to identify and selectively suppress features associated with target concepts in VLM intermediate representations, enabling fine-grained concept unlearning without weight updates. Evaluated across 60 concepts, it surpasses the previous SOTA in forgetting quality by 18%.
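Feature-level suppression with an SAE can be sketched on toy vectors. This is a minimal illustration under assumptions, not SAUCE's code. It uses a standard ReLU SAE (f = ReLU(W_enc(h − b_dec) + b_enc), ĥ = W_dec·f + b_dec) and picks concept features by the largest mean-activation gap between concept and non-concept inputs. It edits an activation by scaling those features before decoding and adds back the SAE reconstruction error, so only the selected features change. The SAE weights below are a toy identity, and the names are hypothetical; SAUCE's own selection criterion and intervention may differ.

```python
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class TinySAE:
    """Toy SAE: w_enc is (n_features x d), w_dec is (d x n_features)."""

    def __init__(self, w_enc: Matrix, b_enc: Vector, w_dec: Matrix, b_dec: Vector) -> None:
        self.w_enc, self.b_enc, self.w_dec, self.b_dec = w_enc, b_enc, w_dec, b_dec

    def encode(self, h: Vector) -> Vector:
        centered = [x - b for x, b in zip(h, self.b_dec)]
        return [max(0.0, z + b) for z, b in zip(matvec(self.w_enc, centered), self.b_enc)]

    def decode(self, f: Vector) -> Vector:
        return [z + b for z, b in zip(matvec(self.w_dec, f), self.b_dec)]


def select_concept_features(
    sae: TinySAE, concept_acts: Sequence[Vector], other_acts: Sequence[Vector], top_k: int = 1
) -> List[int]:
    """Indices of features most more active on the concept than elsewhere (assumed criterion)."""
    def mean_codes(acts: Sequence[Vector]) -> Vector:
        codes = [sae.encode(h) for h in acts]
        return [sum(col) / len(codes) for col in zip(*codes)]

    gap = [c - o for c, o in zip(mean_codes(concept_acts), mean_codes(other_acts))]
    return sorted(range(len(gap)), key=lambda i: gap[i], reverse=True)[:top_k]


def suppress_concept(sae: TinySAE, h: Vector, features: List[int], scale: float = 0.0) -> Vector:
    """Inference-time edit of one activation; model weights are never touched."""
    f = sae.encode(h)
    error = [x - r for x, r in zip(h, sae.decode(f))]  # what the SAE fails to reconstruct
    f_edit = [v * scale if i in features else v for i, v in enumerate(f)]
    return [r + e for r, e in zip(sae.decode(f_edit), error)]


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    sae = TinySAE(eye, [0.0, 0.0], eye, [0.0, 0.0])
    feats = select_concept_features(sae, [[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]])
    print(feats, suppress_concept(sae, [1.0, 0.5], feats))  # → [0] [0.0, 0.5]
```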

Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation

This paper presents the first study on preventing video data from being exploited by deep trackers without authorization. It proposes a DiT-based generative framework for producing Temporal Unlearnable Examples (TUE), employing a temporal contrastive loss to induce trackers to rely on perturbation noise for temporal matching rather than learning genuine data structure. The method achieves strong transferability across models, datasets, and tasks.
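A generic InfoNCE-style temporal contrastive term is sketched below. It assumes features of the same object in frames t and t+1 form positive pairs and that the other objects in frame t+1 act as negatives. The DiT noise generator is omitted, and the function name and pairing scheme are illustrative assumptions rather than the paper's exact loss. In an unlearnable-example setting, the noise would be optimized so this matching becomes easy through the noise pattern alone, which gives a tracker a shortcut instead of the genuine data structure.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def temporal_info_nce(
    frame_t: Sequence[Vector], frame_t1: Sequence[Vector], temperature: float = 0.1
) -> float:
    """Mean InfoNCE loss where frame_t[i] should match frame_t1[i]."""
    total = 0.0
    for i, anchor in enumerate(frame_t):
        logits = [_cosine(anchor, p) / temperature for p in frame_t1]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(frame_t)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    print(temporal_info_nce(anchors, [[1.0, 0.0], [0.0, 1.0]]))  # near 0: correct matches
    print(temporal_info_nce(anchors, [[0.0, 1.0], [1.0, 0.0]]))  # about 10: swapped matches
```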