⚖️ Alignment & RLHF
📹 ICCV2025 · 2 paper notes
📌 Same area in other venues: 📷 CVPR2026 (12) · 🔬 ICLR2026 (102) · 💬 ACL2026 (38) · 🧪 ICML2026 (37) · 🤖 AAAI2026 (17) · 🧠 NeurIPS2025 (36)
- Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models
  This paper proposes HIMRD, a black-box multimodal jailbreak attack that bypasses unimodal safety mechanisms by distributing malicious semantics across multiple modalities. A heuristic search strategy identifies the understanding-enhancing prompts and inducing prompts, achieving average attack success rates of approximately 90% on open-source and 68% on closed-source multimodal large language models.
- MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization
  This paper proposes MagicID, a framework that constructs hybrid video pair data capturing identity and dynamic preferences and trains on it with a two-stage Hybrid Preference Optimization (HPO) strategy. The authors present MagicID as the first work to apply Direct Preference Optimization (DPO) to identity-customized video generation, simultaneously addressing the identity degradation and motion weakening caused by conventional self-reconstruction training.