LLM Safety

📹 ICCV2025 · 8 paper notes

Adversarial Robust Memory-Based Continual Learner: This paper identifies two compounding challenges when combining continual learning with adversarial training—accelerated forgetting and gradient confusion—and proposes two plug-and-play modules, Anti-Forgettable Logit Calibration (AFLC) and Robustness-Aware Experience Replay (RAER), achieving up to 8.13% improvement in adversarial robustness on Split-CIFAR10/100 and Split-Tiny-ImageNet.
Asynchronous Event Error-Minimizing Noise for Safeguarding Event Dataset: This paper proposes UEvs, the first unlearnable example generation method for asynchronous event data. It introduces Event Error-Minimizing Noise (E²MN) and an adaptive projection mechanism that prevent unauthorized models from learning from event datasets while preserving utility for legitimate use.
ChartCap: Mitigating Hallucination of Dense Chart Captioning: This work constructs ChartCap, a large-scale dataset of 565K real chart–caption pairs. By adopting type-specific caption schemas that exclude irrelevant information while emphasizing structure and key insights, and by introducing a reference-free Visual Consistency Score (VCS) evaluation metric, the paper effectively mitigates hallucination in VLM-based chart captioning.
Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling: This paper proposes Gradient-Guided Sampling (GGS), an inner-iteration sampling strategy that uses the gradient direction from the previous inner iteration to guide sampling. By striking a balance between Exploitation (attack strength / loss maxima) and Exploration (cross-model generalization / flat loss landscape), GGS significantly outperforms existing transfer attack methods across diverse architectures including CNNs, ViTs, and MLLMs.
Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation: This paper proposes FUCRT, a federated unlearning method based on class-aware representation transformation. Rather than directly erasing the representations of forget classes, FUCRT transforms them toward the semantically nearest retain classes, and employs dual contrastive learning to align transformation consistency across clients. The method guarantees 100% unlearning on four datasets while maintaining or even improving performance on retain classes.
Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning: This paper proposes Geminio, the first gradient inversion attack (GIA) leveraging vision-language models (VLMs) to enable natural language-guided targeted reconstruction. A malicious server can specify the type of data to steal via natural language queries, precisely locating and reconstructing semantically matching private samples from large-batch gradients, without disrupting normal FL model training.
Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis: This paper proposes Oasis, a method that induces MLLMs to autoregressively generate high-quality multimodal instruction-following data using only an input image (without any text prompt). Combined with a fine-grained instruction quality control mechanism, 500K synthesized samples yield an average performance gain of 3.1% for LLaVA-NeXT, surpassing other data synthesis methods.
Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation: This paper presents the first study on preventing video data from being exploited by deep trackers without authorization. It proposes a DiT-based generative framework for producing Temporal Unlearnable Examples (TUE), employing a temporal contrastive loss to induce trackers to rely on perturbation noise for temporal matching rather than learning genuine data structure. The method achieves strong transferability across models, datasets, and tasks.
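
The UEvs entry above builds on error-minimizing noise, so a small illustration of that general idea may help when reading the notes. The sketch below is not the UEvs or TUE method: it assumes a fixed linear classifier on dense feature vectors (not event streams or video), and the names `logistic_loss` and `error_minimizing_noise` as well as the `eps`, `step`, and `iters` values are hypothetical choices made for this example. It shows only the inner step, in which bounded noise is optimized to lower the training loss, the opposite direction of an adversarial attack. Full methods typically alternate this step with model updates.

```python
import numpy as np

def logistic_loss(w, x, y):
    """Mean binary cross-entropy of a linear model; labels y are in {0, 1}."""
    z = x @ w
    # log(1 + exp(z)) - y * z, computed stably with logaddexp
    return float(np.mean(np.logaddexp(0.0, z) - y * z))

def error_minimizing_noise(w, x, y, eps=0.1, step=0.02, iters=20):
    """Per-sample noise inside an L-inf ball of radius eps that lowers the loss.

    Assumption: the model weights w stay fixed; only the noise is optimized.
    """
    delta = np.zeros_like(x)
    for _ in range(iters):
        z = (x + delta) @ w
        p = 1.0 / (1.0 + np.exp(-z))
        # Gradient of each sample's loss with respect to its input: (p - y) * w
        grad = (p - y)[:, None] * w[None, :]
        # Signed gradient *descent* step, then projection back into the eps-ball
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
    return delta

# Toy data: 64 samples, 8 features, random binary labels (all hypothetical)
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))
w = rng.normal(size=8)
y = (rng.random(64) < 0.5).astype(float)

delta = error_minimizing_noise(w, x, y)
clean_loss = logistic_loss(w, x, y)
noisy_loss = logistic_loss(w, x + delta, y)
print(f"clean loss: {clean_loss:.4f}, loss with noise: {noisy_loss:.4f}")
```

Because the perturbed samples already look "solved" to the model, a learner trained on them has little incentive to pick up the genuine features, which is the intuition behind unlearnable examples.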