📊 LLM Evaluation

📷 CVPR2025 · 4 paper notes

Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways (EraDiff)

This paper proposes EraDiff. It uses the Chain-Rectifying Optimization (CRO) paradigm to build a progressive diffusion pathway from the "object-containing" image to the "pure background", and the Self-Rectifying Attention (SRA) mechanism to suppress artifacts during sampling. Together, these steer the diffusion model toward the actual erasure intent. EraDiff achieves a state-of-the-art Local FID of 3.799 on OpenImages V5 and significantly outperforms SD2-Inpaint and LaMa in complex real-world scenes.
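For reference, FID (Fréchet Inception Distance) fits Gaussians to Inception features of real images ($\mu_r, \Sigma_r$) and generated images ($\mu_g, \Sigma_g$) and measures the distance between the two fits, so lower is better. The standard definition is below. In object-removal work, Local FID is generally understood to apply this same distance to regions around the edited area. The exact cropping protocol behind the 3.799 figure is not described in this note.

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
$$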

PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation

PosterO structures poster layouts as SVG layout trees. It vectorizes design intents and models hierarchical node representations, which lets LLMs generate high-quality content-aware layouts through intent-aligned in-context learning. It achieves state-of-the-art performance across multiple benchmarks and introduces PStylish7, the first dataset supporting multi-purpose and multi-shape elements.
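The following minimal sketch makes the idea of an SVG layout tree concrete. It serializes a nested layout into SVG text, the kind of string an LLM could read and emit inside an in-context prompt. The `LayoutNode` schema, the element kinds (`group`, `underlay`, `text`) and the mapping to `<g>`/`<rect>` tags are illustrative assumptions, not PosterO's actual format.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class LayoutNode:
    """One node of a poster layout tree (hypothetical schema)."""
    kind: str  # hypothetical element type, e.g. "group", "text", "underlay"
    x: int = 0
    y: int = 0
    w: int = 0
    h: int = 0
    children: list[LayoutNode] = field(default_factory=list)


def to_svg(node: LayoutNode) -> ET.Element:
    """Recursively map a layout node to an SVG element.

    Inner nodes become <g> groups (preserving hierarchy); leaves become
    <rect> boxes whose class attribute records the element type.
    """
    if node.children:
        elem = ET.Element("g", {"class": node.kind})
        for child in node.children:
            elem.append(to_svg(child))
        return elem
    return ET.Element("rect", {
        "class": node.kind,
        "x": str(node.x), "y": str(node.y),
        "width": str(node.w), "height": str(node.h),
    })


def layout_to_svg_text(root: LayoutNode, canvas_w: int, canvas_h: int) -> str:
    """Wrap the tree in an <svg> canvas and return it as a string."""
    svg = ET.Element("svg", {"width": str(canvas_w), "height": str(canvas_h)})
    svg.append(to_svg(root))
    return ET.tostring(svg, encoding="unicode")


# Example: a title text box drawn on top of a decorative underlay.
poster = LayoutNode("group", children=[
    LayoutNode("underlay", 30, 20, 420, 80),
    LayoutNode("text", 40, 30, 400, 60),
])
print(layout_to_svg_text(poster, 600, 800))
```

Grouping maps to `<g>`, so the hierarchy lives in the markup itself. A text-only model can then read nesting and sibling order directly from the string.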

RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives

This paper proposes RoadSocial, a large-scale, diverse VideoQA dataset sourced from social media (13.2K videos, 260K QA pairs) that covers road events from many regions and viewpoints worldwide. The dataset is built with a semi-automatic annotation framework and organized into 12 QA task categories. The paper uses it to systematically evaluate the road event understanding of 18 Video LLMs.
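Benchmarks like this usually report a score for each QA task category as well as an overall average. The sketch below shows one way to aggregate results for a single model. It assumes each answer is scored by normalized exact match and that the overall score is a macro-average over categories. RoadSocial's actual scoring protocol may differ (open-ended answers, for instance, are often judged by an LLM), and the category names here are placeholders.

```python
from collections import defaultdict


def per_category_accuracy(records):
    """Score one model's answers.

    records: iterable of (category, prediction, reference) string triples.
    Returns (accuracy per category, macro-average over categories).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, prediction, reference in records:
        total[category] += 1
        # Assumption: normalized exact match stands in for the real scorer.
        if prediction.strip().lower() == reference.strip().lower():
            correct[category] += 1
    per_category = {c: correct[c] / total[c] for c in total}
    # Macro-average: every category counts equally, regardless of its size.
    macro = sum(per_category.values()) / len(per_category) if per_category else 0.0
    return per_category, macro


# Hypothetical categories and answers for one Video LLM.
records = [
    ("what", "A truck overturns", "a truck overturns"),
    ("what", "heavy rain", "dense fog"),
    ("why", "wet road", "wet road"),
]
scores, overall = per_category_accuracy(records)
print(scores, overall)  # {'what': 0.5, 'why': 1.0} 0.75
```

A micro-average over all three questions would give 2/3 instead. The macro-average keeps a large category from dominating the headline number when comparing many models.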

UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

This paper proposes UniGoal, a unified zero-shot goal-oriented navigation framework. It represents both scenes and goals as graphs and explores with a multi-stage strategy driven by graph matching. This lets a single model perform zero-shot navigation for three goal types (object categories, instance images, and text descriptions), and it outperforms task-specific methods.
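The idea of a uniform graph representation can be sketched as follows. An object-category goal is a one-node graph, a text description becomes a small graph of objects and relations, and the agent compares the goal graph with the scene graph it has built so far. The `Graph` type, the overlap-ratio score and the three stage labels (`explore`, `partial`, `approach`) are hypothetical simplifications. UniGoal's actual graph construction and matching are more involved, and parsing goals from images or text is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Graph:
    """Nodes are object labels; edges are (subject, relation, object) triples."""
    nodes: set[str] = field(default_factory=set)
    edges: set[tuple[str, str, str]] = field(default_factory=set)


def match_ratios(scene: Graph, goal: Graph) -> tuple[float, float]:
    """Fraction of goal nodes and goal edges already present in the scene graph."""
    node_ratio = len(goal.nodes & scene.nodes) / len(goal.nodes) if goal.nodes else 1.0
    edge_ratio = len(goal.edges & scene.edges) / len(goal.edges) if goal.edges else 1.0
    return node_ratio, edge_ratio


def choose_stage(scene: Graph, goal: Graph) -> str:
    """Pick a hypothetical exploration stage from how much of the goal is matched."""
    node_ratio, edge_ratio = match_ratios(scene, goal)
    if node_ratio == 0.0:
        return "explore"   # nothing matched yet: keep exploring the environment
    if node_ratio == 1.0 and edge_ratio == 1.0:
        return "approach"  # full match: head toward the matched goal
    return "partial"       # some overlap: use matched nodes as anchors to search nearby


# A text goal ("a mug on the table") and an object-category goal ("mug").
text_goal = Graph({"mug", "table"}, {("mug", "on", "table")})
category_goal = Graph({"mug"})

scene = Graph({"sofa", "tv"}, {("tv", "near", "sofa")})
print(choose_stage(scene, text_goal))      # explore

scene = Graph({"table", "chair"}, {("chair", "near", "table")})
print(choose_stage(scene, text_goal))      # partial

scene = Graph({"mug", "table", "chair"}, {("mug", "on", "table")})
print(choose_stage(scene, text_goal))      # approach
print(choose_stage(scene, category_goal))  # approach
```

Every goal type reduces to the same `Graph`, so one matching routine serves all three. This mirrors the single-model design the summary describes.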