⚖️ Alignment & RLHF

📹 ICCV2025 · 2 paper notes

Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models

This paper proposes HIMRD, a black-box multimodal jailbreak attack that bypasses unimodal safety mechanisms by distributing malicious semantics across multiple modalities. It uses a heuristic search strategy to identify optimal understanding-enhancing prompts and inducing prompts, achieving average attack success rates of approximately 90% on open-source and 68% on closed-source multimodal large language models.

MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization

This paper proposes MagicID, a framework that constructs hybrid video pair data capturing identity and dynamic preferences, and designs a two-stage Hybrid Preference Optimization (HPO) training strategy. MagicID is the first work to apply Direct Preference Optimization (DPO) to identity-customized video generation, addressing both the identity degradation and the motion weakening caused by conventional self-reconstruction training.