🎨 Image Generation
💬 ACL2026 · 5 paper notes
🔥 Top topics: LLM ×3 · Multimodal/VLM ×2
- ANCHOR: LLM-driven Subject Conditioning for Text-to-Image Synthesis
  - This paper proposes the ANCHOR dataset, which collects 70K+ abstract captions from 5 news outlets to expose T2I model failures in multi-subject handling, contextual reasoning, and fine-grained grounding. It also introduces SAFE, which uses LLMs to extract key subjects and reinforces their representations at the embedding layer to improve image-text consistency.
- From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons
  - This paper proposes FLUID, which efficiently adapts pre-trained autoregressive (AR) LLMs into diffusion-based parallel generation models using strictly causal attention and entropy-aware Elastic Horizons. With only 2.7B adaptation tokens, it achieves reasoning and code generation performance close to strong AR models and superior to existing diffusion baselines.
- MENTOR: Efficient Autoregressive Image Generation with Balanced Multimodal Control
  - MENTOR uses a unified autoregressive decoder and two-stage multimodal training to align reference images and text instructions in the same generation prefix. With only 3M training samples and a budget of approximately 1.5 days on 8 A100 GPUs, it achieves a superior balance between concept preservation and prompt following.
- Multimodal Large Language Models for Multi-Subject In-Context Image Generation
  - This paper proposes MUSIC, which brings the visual reasoning capabilities of Multimodal Large Language Models (MLLMs) to multi-subject in-context image generation. Through automated training data synthesis, visual CoT, and semantic-driven spatial layout planning, it significantly mitigates subject omission, identity confusion, and semantic drift when generating multiple reference subjects simultaneously.
- Think Bright, Diffuse Nice: Enhancing T2I-ICL via Inductive-Bias Hint Instruction and Query Contrastive Decoding
  - This paper proposes TBDN, a training-free framework that uses Hint Instruction to focus LVLMs on the final query and Query Contrastive Decoding to suppress prior-dominated hallucinations. By passing more accurate textual descriptions to diffusion models, it significantly improves text-to-image in-context learning performance on CoBSAT and T2I Fast Mini-ImageNet.